Hello everyone,
My setup: two FastAPI apps that call gRPC ML services (layout analysis and table detection). I need to scale both services.
Question: for GPU-based ML inference over gRPC, does load balancing through NGINX hurt performance significantly compared with client-side load balancing?
Main concerns:
- Losing the benefits of HTTP/2 multiplexing
- Extra latency from the proxy hop (though probably negligible next to the 2-5 s processing time)
- The need for priority handling for time-critical clients
Current thinking: NGINX seems simpler operationally, but I want to make sure I'm not shooting myself in the foot performance-wise.
Does anyone have experience with gRPC behind NGINX? Is client-side load balancing worth the complexity for this use case?
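
To make the priority-handling concern concrete: one possible approach, independent of whether NGINX or client-side load balancing is chosen, is an in-process priority queue in the FastAPI app that decides which request reaches the GPU service next. The sketch below is only an illustration under assumptions. `PriorityDispatcher`, `TIME_CRITICAL`, `BATCH`, and `fake_infer` are hypothetical names, and `fake_infer` stands in for the real gRPC stub call.

```python
import asyncio
import itertools

# Hypothetical priority levels: a lower number is served first.
TIME_CRITICAL = 0
BATCH = 1


class PriorityDispatcher:
    """Queues requests and hands them to one worker in priority order."""

    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        # Unique tie-breaker: keeps FIFO order within a priority level and
        # ensures payloads and futures are never compared by the heap.
        self._counter = itertools.count()

    async def submit(self, priority, payload):
        """Enqueue a request and return a future that will hold its result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((priority, next(self._counter), payload, future))
        return future

    async def run_worker(self, infer):
        """Serve queued requests one at a time, most urgent first."""
        while True:
            _, _, payload, future = await self._queue.get()
            try:
                future.set_result(await infer(payload))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                self._queue.task_done()


async def demo():
    dispatcher = PriorityDispatcher()
    served = []  # order in which the worker actually handled requests

    async def fake_infer(payload):
        # Assumption: placeholder for the real gRPC call to the ML service.
        await asyncio.sleep(0)
        served.append(payload)
        return payload.upper()

    # Two batch requests arrive before one time-critical request.
    futures = [
        await dispatcher.submit(BATCH, "batch-1"),
        await dispatcher.submit(BATCH, "batch-2"),
        await dispatcher.submit(TIME_CRITICAL, "urgent-1"),
    ]

    worker = asyncio.create_task(dispatcher.run_worker(fake_infer))
    results = await asyncio.gather(*futures)

    # Shut the worker down cleanly once all requests are answered.
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return served, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the time-critical request is handled before the two batch requests even though it was submitted last. It only reorders work that is still waiting; a request already running on the GPU is not preempted.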