Why GPUs Need Batching
Keep the GPU busy by grouping requests.
A GPU Is a Wide Machine
A GPU has thousands of cores built to do the same math on many items at once. Feed it one input and almost all of those cores sit idle. 🐢
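To put a rough number on that, here is a toy sketch. It assumes each input item occupies exactly one core and uses a made-up count of 4096 cores. `busy_fraction` is a hypothetical helper for illustration, not a real GPU API, and real hardware schedules work in a more complicated way.

```python
def busy_fraction(items: int, cores: int = 4096) -> float:
    """Share of cores that have work, in a toy one-item-per-core model.

    `cores=4096` is an assumed figure for illustration, not a real card's spec.
    """
    if items < 0 or cores <= 0:
        raise ValueError("items must be >= 0 and cores must be > 0")
    # Extra items beyond the core count cannot make the machine "more than full".
    return min(items, cores) / cores


if __name__ == "__main__":
    for n in (1, 64, 4096):
        print(f"{n:>5} items -> {busy_fraction(n):.3%} of cores busy")
```

In this model a single input leaves all but one of the 4096 cores without work, and only a full batch of 4096 items keeps every core occupied.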
One Request Wastes It
Serving a single prediction per call barely touches the hardware. Each call pays a fixed cost (launching the work and moving data to and from the card) for one item's worth of math, so your expensive GPU spends more time waiting than computing.
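A back-of-the-envelope model makes the waste visible. The sketch below is not a benchmark: it assumes every call costs a fixed 1000 µs of overhead plus 100 µs per pass, and that one pass can process up to 1024 items side by side. All three numbers and both function names are invented for illustration.

```python
import math


def call_time_us(batch_size: int, overhead_us: int = 1000,
                 pass_us: int = 100, lanes: int = 1024) -> int:
    """Time for one call in a toy model (all defaults are assumed values).

    Every call pays `overhead_us` once, then `pass_us` for each group
    of up to `lanes` items processed in parallel.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    passes = math.ceil(batch_size / lanes)
    return overhead_us + passes * pass_us


def items_per_second(batch_size: int) -> float:
    """Throughput implied by the toy model above."""
    return batch_size * 1_000_000 / call_time_us(batch_size)


if __name__ == "__main__":
    for b in (1, 32, 1024):
        print(f"batch {b:>4}: {call_time_us(b)} us per call, "
              f"{items_per_second(b):,.0f} items/s")
```

Under these assumptions a call with 1 item and a call with 1024 items take the same time, so the single-request call delivers roughly a thousandth of the throughput for the same cost.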