Pods, Deployments, and Services for Models
Map ML serving onto core Kubernetes objects.
The Pod Is the Smallest Unit
Kubernetes never runs a bare container. The smallest unit it schedules is a Pod: a wrapper around one or more containers (here, your model server) that share a network namespace and, optionally, storage volumes. 📦
One Model Server per Pod
For ML serving you usually put one model API container in each Pod. That keeps scaling, restarts, and resource limits simple to reason about per model.
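To make the one-container-per-Pod idea concrete, here is a minimal sketch that builds a Pod manifest as a Python dictionary and prints it as JSON. The Pod name, the image URL, the port, and the CPU and memory values are all hypothetical placeholders chosen for illustration, not values from this course.

```python
import json


def model_pod_manifest(name: str, image: str, port: int = 8080) -> dict:
    """Build a Pod manifest holding exactly one model-server container.

    The name, image, port, and resource figures are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # A single entry in this list means one model server per Pod.
            "containers": [
                {
                    "name": "model-server",
                    "image": image,
                    "ports": [{"containerPort": port}],
                    # Requests and limits apply to this one container,
                    # so they describe this one model's footprint.
                    "resources": {
                        "requests": {"cpu": "500m", "memory": "1Gi"},
                        "limits": {"cpu": "1", "memory": "2Gi"},
                    },
                }
            ]
        },
    }


# Hypothetical model name and registry path.
manifest = model_pod_manifest(
    "sentiment-model", "registry.example.com/sentiment-model:1.0"
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file such as pod.json and applied with kubectl apply -f pod.json. With only one container in the list, a restart or a change to the resource limits affects a single model.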