Enable Adaptive Micro-Batching
Group requests automatically for higher throughput.
The Throughput Problem
Calling a model once per request wastes hardware: every call pays the same fixed overhead and leaves most of the compute capacity idle. Models typically process a batch of inputs far faster than the same inputs one at a time. ⚡
What Micro-Batching Does
Adaptive batching collects incoming requests for a brief window, runs them through the model as a single batch, then splits the output so that each caller automatically receives only its own result.
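
To make the collect, run, and split cycle concrete, here is a minimal sketch in plain Python with asyncio. It is not the framework's implementation: `MicroBatcher`, `fake_model`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for illustration, and the window here is fixed rather than adaptive. A real adaptive batcher would also tune the window and batch size based on observed traffic.

```python
import asyncio


class MicroBatcher:
    """Toy micro-batcher (hypothetical helper, not a real library API)."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn              # takes a list, returns a list of equal length
        self.max_batch_size = max_batch_size  # flush as soon as this many requests wait
        self.max_wait_s = max_wait_s          # otherwise flush after this window
        self._pending = []                    # (input, future) pairs awaiting a flush
        self._timer = None                    # handle for the scheduled window flush

    async def submit(self, item):
        """Queue one request and wait for its individual result."""
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._pending.append((item, future))
        if len(self._pending) >= self.max_batch_size:
            self._flush()                     # batch is full: run immediately
        elif self._timer is None:
            # First request of a new window: schedule a flush when it closes.
            self._timer = loop.call_later(self.max_wait_s, self._flush)
        return await future

    def _flush(self):
        """Run all pending requests as one batch and hand back each result."""
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        inputs = [item for item, _ in batch]
        try:
            outputs = self.batch_fn(inputs)   # one model call for the whole batch
        except Exception as exc:
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)
            return
        # Split results: output i belongs to the caller that sent input i.
        for (_, future), output in zip(batch, outputs):
            if not future.done():
                future.set_result(output)


batch_sizes = []  # records how many inputs each model call received


def fake_model(inputs):
    """Stand-in for a real model: doubles every input in the batch."""
    batch_sizes.append(len(inputs))
    return [x * 2 for x in inputs]


async def main():
    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.01)
    # Ten concurrent "requests", each submitting a single input.
    return await asyncio.gather(*(batcher.submit(i) for i in range(10)))


results = asyncio.run(main())
print(results)      # each caller gets its own doubled value, in request order
print(batch_sizes)  # the model ran 3 times instead of 10
```

In this run the ten requests reach the model as batches of 4, 4, and 2: the first two flush because the batch fills up, and the last flushes when the wait window expires. The two knobs pull in opposite directions: a larger batch size or longer window raises throughput, while a shorter window keeps latency low for each caller.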