MLOps Academy · Lesson

Load the Model Once at Startup

Use lifespan events so the model loads once at startup instead of on every request.

The Slow-Endpoint Trap

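If you call joblib.load inside the /predict handler, the model is read from disk and deserialized on every request. For large models, that adds latency to every prediction and repeats work that only needs to happen once. 🐢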
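To make the cost concrete, here is a minimal sketch that counts how often the model file is loaded when loading happens inside the prediction function. The dictionary of coefficients is a hypothetical stand-in for a real trained model, and load_model and predict_reloading are illustrative names, not part of the lesson's code.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model: a dict of linear coefficients.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump({"weights": [0.5, 0.25], "bias": 1.0}, model_path)

load_count = 0


def load_model():
    """Wrap joblib.load so we can count how often the file is read."""
    global load_count
    load_count += 1
    return joblib.load(model_path)


def predict_reloading(features):
    # Anti-pattern: the model is deserialized again on every call.
    model = load_model()
    return model["bias"] + sum(w * x for w, x in zip(model["weights"], features))


# Simulate five requests hitting the endpoint.
results = [predict_reloading([2.0, 4.0]) for _ in range(5)]
print(results, load_count)  # load_count is 5: one load per call
```

Five calls mean five loads. With a tiny file this is barely noticeable, but the number of loads grows with traffic, and each load gets slower as the model gets bigger.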

Load It Just Once

The fix: load the model a single time when the service starts, keep it in memory, and reuse it for every prediction.
