MLOps Academy · Lesson

Scale to Zero and Back Up

Cut costs with request-driven autoscaling.

Idle Models Cost Money

A model pod with no traffic still reserves CPU and memory, and you still pay the cloud bill for both. KServe can scale an idle service all the way down to zero replicas, which stops that waste. 💸

Powered by Knative

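KServe's serverless mode runs on Knative, which watches request traffic and adjusts the number of replicas to match. When traffic stops, Knative can remove every pod for that service. When the next request arrives, Knative holds it while a new pod starts, so the first caller after an idle period pays a cold-start delay.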
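
To make the idea concrete, here is a minimal sketch of an InferenceService manifest that is allowed to scale to zero, built as a Python dictionary and printed as JSON (which `kubectl apply -f` accepts alongside YAML). It assumes KServe is installed in serverless mode. The service name `demo-model`, the `sklearn` model format, the storage URI, and the replica ceiling are placeholders chosen for illustration, not values from this lesson.

```python
import json


def build_inference_service(name, storage_uri, max_replicas=3):
    """Return an InferenceService manifest that is allowed to scale to zero.

    All argument values are illustrative; swap in your own model details.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # minReplicas of 0 lets the autoscaler remove every pod when idle.
                "minReplicas": 0,
                # Upper bound on replicas once traffic returns (assumed value).
                "maxReplicas": max_replicas,
                "model": {
                    "modelFormat": {"name": "sklearn"},  # placeholder format
                    "storageUri": storage_uri,           # placeholder location
                },
            }
        },
    }


# Hypothetical name and bucket, used only to show the shape of the manifest.
manifest = build_inference_service("demo-model", "gs://example-bucket/models/demo")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would create the service. Setting `minReplicas` to 1 instead keeps one pod warm, which trades some idle cost for no cold-start delay.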
