MLOps Academy · Lesson

Load the Model Once at Startup

Use lifespan events so the model is loaded once, at startup, instead of on every request.

The Slow-Endpoint Trap

If you call joblib.load inside the /predict handler, the model is read from disk and deserialized on every request. For large models, that adds avoidable latency to every prediction. 🐢

The fix: load the model a single time when the service starts, keep it in memory, and reuse it for every prediction.
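A minimal, framework-free sketch of this pattern follows, using only the standard library so it runs as-is. The names load_model, state, load_calls, and demo are hypothetical and exist only for illustration; load_model is a stand-in for a real joblib.load call, and the "model" is a dummy function rather than a trained estimator.

```python
import asyncio
from contextlib import asynccontextmanager

load_calls = 0  # counts how often the (expensive) load runs
state = {}      # holds objects shared across requests


def load_model():
    """Hypothetical stand-in for joblib.load on a model file."""
    global load_calls
    load_calls += 1
    return lambda features: sum(features)  # dummy "model"


@asynccontextmanager
async def lifespan(app):
    # Startup: runs once, before any request is handled.
    state["model"] = load_model()
    yield
    # Shutdown: drop the reference when the service stops.
    state.clear()


async def predict(features):
    # Handler body: reuse the in-memory model; nothing is loaded here.
    return state["model"](features)


async def demo():
    # Simulate one service run that handles three requests.
    async with lifespan(app=None):
        return [await predict([i, i + 1]) for i in range(3)]


results = asyncio.run(demo())
print(results, "loads:", load_calls)
```

Three simulated requests are served, but load_model runs only once. If the service is a FastAPI app (an assumption; the lesson does not name the framework), the same lifespan function can be passed as FastAPI(lifespan=lifespan), with predict registered as the /predict route.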