MLOps Academy · Lesson

Configure Dynamic Batching in Triton

Tune max batch size and queue delay.

Models Live in a Repository

Triton loads models from a model repository: a directory in which each model has its own subfolder containing a config file and one or more numbered version folders that hold the weights.
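As a minimal sketch of that layout, the Python script below builds an empty repository skeleton in a temporary directory and lists what it created. The model name `resnet50`, the version number `1`, and the ONNX weight file name `model.onnx` are illustrative assumptions. The files are empty placeholders, not a loadable model.

```python
from pathlib import Path
import tempfile


def create_model_repository(root: Path, model_name: str, version: int = 1) -> Path:
    """Create the folder skeleton for one model (names here are hypothetical).

    Layout produced:
        <root>/<model_name>/config.pbtxt
        <root>/<model_name>/<version>/model.onnx
    """
    model_dir = root / model_name
    version_dir = model_dir / str(version)  # numbered version folder
    version_dir.mkdir(parents=True, exist_ok=True)
    # Empty placeholders: a real repository holds exported weights and a real config.
    (version_dir / "model.onnx").touch()
    (model_dir / "config.pbtxt").touch()
    return model_dir


def list_repository(root: Path) -> list[str]:
    """Return every path under root, relative to root, in sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        create_model_repository(root, "resnet50")
        for entry in list_repository(root):
            print(entry)
```

Run as a script, it prints `resnet50`, `resnet50/1`, `resnet50/1/model.onnx`, and `resnet50/config.pbtxt`: the config sits beside the version folder, and the weights live inside it.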

The config.pbtxt File

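Each model's config.pbtxt sits in the model's folder, next to its numbered version folders. This protobuf text-format file tells Triton the model's inputs and outputs, its maximum batch size, and how to schedule incoming requests, including whether to batch them dynamically.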
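A minimal sketch, assuming a hypothetical ONNX model served by the `onnxruntime` backend: the Python helper below renders a config.pbtxt that enables dynamic batching. The field names (`max_batch_size`, `input`, `output`, `data_type`, `dims`, `dynamic_batching`, `preferred_batch_size`, `max_queue_delay_microseconds`) follow Triton's model configuration; the model name, tensor names, shapes, and the values 32, [8, 16], and 100 microseconds are placeholders to tune for your own model. The validation checks are this sketch's own, not a complete copy of Triton's rules.

```python
from dataclasses import dataclass, field


@dataclass
class TensorSpec:
    name: str
    data_type: str   # Triton type string, e.g. "TYPE_FP32"
    dims: list[int]  # per-request shape; the batch dimension is implicit when max_batch_size > 0


@dataclass
class ModelConfig:
    name: str
    backend: str
    max_batch_size: int
    inputs: list[TensorSpec]
    outputs: list[TensorSpec]
    preferred_batch_sizes: list[int] = field(default_factory=list)
    max_queue_delay_us: int = 0


def _tensor_block(kind: str, tensors: list[TensorSpec]) -> str:
    """Render an `input [...]` or `output [...]` list in protobuf text format."""
    entries = []
    for t in tensors:
        dims = ", ".join(str(d) for d in t.dims)
        entries.append(
            f'  {{\n    name: "{t.name}"\n    data_type: {t.data_type}\n    dims: [ {dims} ]\n  }}'
        )
    return f"{kind} [\n" + ",\n".join(entries) + "\n]"


def render_config(cfg: ModelConfig) -> str:
    """Render a config.pbtxt with a dynamic_batching block."""
    # Dynamic batching only applies to models that support batching.
    if cfg.max_batch_size <= 0:
        raise ValueError("dynamic batching needs max_batch_size > 0")
    if any(b > cfg.max_batch_size for b in cfg.preferred_batch_sizes):
        raise ValueError("preferred batch sizes must not exceed max_batch_size")
    lines = [
        f'name: "{cfg.name}"',
        f'backend: "{cfg.backend}"',
        f"max_batch_size: {cfg.max_batch_size}",
        _tensor_block("input", cfg.inputs),
        _tensor_block("output", cfg.outputs),
        "dynamic_batching {",
    ]
    if cfg.preferred_batch_sizes:
        sizes = ", ".join(str(b) for b in cfg.preferred_batch_sizes)
        lines.append(f"  preferred_batch_size: [ {sizes} ]")
    # Upper bound on how long a request may wait in the queue for others to join its batch.
    lines.append(f"  max_queue_delay_microseconds: {cfg.max_queue_delay_us}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cfg = ModelConfig(
        name="resnet50",
        backend="onnxruntime",
        max_batch_size=32,
        inputs=[TensorSpec("input", "TYPE_FP32", [3, 224, 224])],
        outputs=[TensorSpec("output", "TYPE_FP32", [1000])],
        preferred_batch_sizes=[8, 16],
        max_queue_delay_us=100,
    )
    print(render_config(cfg), end="")
```

The two knobs pull in opposite directions: `max_batch_size` caps how many requests Triton may combine into one execution, and `max_queue_delay_microseconds` bounds how long a request can wait for others to arrive. Raising either tends to improve GPU utilization and throughput at the cost of added per-request latency.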
