Scale to Zero and Back Up
Cut costs with request-driven autoscaling.
Idle Models Cost Money
A model pod with no traffic still reserves CPU and memory, and it still adds to the cloud bill. KServe can scale an idle service down to zero pods, which stops that waste. 💸
Powered by Knative
KServe's serverless mode runs on Knative, whose autoscaler watches request volume and adjusts the replica count. When traffic stops, it can remove every pod for that service, then start pods again when the next request arrives.
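As a rough illustration of how this is usually switched on, the sketch below builds an InferenceService manifest with `minReplicas` set to 0, which is the setting that allows the predictor to scale to zero. The helper name `build_inference_service`, the service name `demo-model`, the bucket path, the `sklearn` model format, and the `maxReplicas` value of 3 are all hypothetical placeholders chosen for this example. The deployment-mode annotation is included only to make the serverless assumption explicit; your cluster may already default to it.

```python
import json


def build_inference_service(name, storage_uri, min_replicas=0, max_replicas=3):
    """Return an InferenceService manifest as a plain dict.

    Hypothetical helper for illustration; it only assembles data and
    does not talk to a cluster.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            # Assumption: the service should run in KServe's serverless mode.
            "annotations": {"serving.kserve.io/deploymentMode": "Serverless"},
        },
        "spec": {
            "predictor": {
                # 0 lets the autoscaler remove every pod when traffic stops.
                "minReplicas": min_replicas,
                # Upper bound on replicas under load (placeholder value).
                "maxReplicas": max_replicas,
                "model": {
                    "modelFormat": {"name": "sklearn"},  # placeholder format
                    "storageUri": storage_uri,  # placeholder model location
                },
            }
        },
    }


if __name__ == "__main__":
    manifest = build_inference_service(
        "demo-model", "gs://example-bucket/models/demo"
    )
    # kubectl accepts JSON as well as YAML, so this output can be saved to a file.
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running `kubectl apply -f` on it is one way to try this out. Setting `min_replicas=1` instead keeps one pod warm, which trades some idle cost for avoiding a cold start on the first request.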