Configure Dynamic Batching in Triton
Tune max batch size and queue delay.
Models Live in a Repository
Triton loads models from a model repository: a folder in which each model has its own subfolder. That subfolder holds a config file and one or more numbered version folders containing the weights.
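To make the layout concrete, here is a minimal sketch that builds an empty repository skeleton on disk and prints it. The names `model_repository` and `my_model` are hypothetical, and `model.onnx` assumes the ONNX Runtime backend's default file name; other backends expect different file names.

```python
import tempfile
from pathlib import Path


def make_model_repository(root: Path, model_name: str, version: int = 1) -> Path:
    """Create <root>/<model_name>/<version>/ with placeholder files."""
    model_dir = root / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # Empty placeholder for the weights (assumed ONNX backend file name).
    (version_dir / "model.onnx").touch()
    # The config file sits in the model folder, beside the version folder.
    (model_dir / "config.pbtxt").touch()
    return version_dir


def list_layout(root: Path) -> list[str]:
    """Return every path under root, relative to root, in sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


repo = Path(tempfile.mkdtemp()) / "model_repository"
make_model_repository(repo, "my_model")
layout = list_layout(repo)
for entry in layout:
    print(entry)
```

The files here are empty placeholders, so Triton would not load this repository as is; the sketch only shows where each piece goes.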
The config.pbtxt File
Each model has a config.pbtxt in its model folder, next to the version folder. This text file tells Triton what the model's inputs and outputs are and how to schedule requests.
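As a rough illustration of what such a file can contain, the sketch below renders a config.pbtxt with a dynamic batching section. The field names (`max_batch_size`, `dynamic_batching`, `preferred_batch_size`, `max_queue_delay_microseconds`) are Triton's; the model name, tensor names, shapes, backend, and the numeric values are assumptions chosen for the example, not recommended settings.

```python
def render_config(
    name: str,
    max_batch_size: int,
    preferred_batch_sizes: list[int],
    max_queue_delay_us: int,
) -> str:
    """Render a config.pbtxt for a hypothetical image classifier."""
    preferred = ", ".join(str(n) for n in preferred_batch_sizes)
    # dims exclude the batch dimension; Triton adds it when max_batch_size > 0.
    return f"""name: "{name}"
backend: "onnxruntime"
max_batch_size: {max_batch_size}
input [
  {{
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }}
]
output [
  {{
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }}
]
dynamic_batching {{
  preferred_batch_size: [ {preferred} ]
  max_queue_delay_microseconds: {max_queue_delay_us}
}}
"""


# Hypothetical values: batches of up to 8, waiting at most 100 microseconds.
config_text = render_config("my_model", 8, [4, 8], 100)
print(config_text)
```

The two knobs named in this lesson map to `max_batch_size`, the largest batch Triton will form, and `max_queue_delay_microseconds`, how long a request may wait in the queue for others to join its batch.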