Load the Model Once at Startup
Use lifespan events so loading happens just once.
The Slow-Endpoint Trap
If you call joblib.load inside the /predict handler, the model is read from disk and deserialized again on every request. For large models, that load time gets added to the latency of every single prediction. 🐢
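To make the cost concrete, here is a minimal sketch of the trap using only the standard library. It makes a few assumptions: pickle stands in for joblib, a plain function stands in for the /predict handler, and the names model_path, load_model, and predict_slow are hypothetical.

```python
import pickle
import tempfile
import time
from pathlib import Path

# Hypothetical stand-in for a trained model: a dict holding a large list.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump({"weights": list(range(200_000))}, f)

load_calls = 0  # counts how often the file is read and deserialized


def load_model(path):
    """Read the model from disk (pickle here, standing in for joblib.load)."""
    global load_calls
    load_calls += 1
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_slow(index):
    """Anti-pattern: the model is loaded inside the request handler."""
    model = load_model(model_path)
    return model["weights"][index]


# Simulate 20 incoming requests and time them.
start = time.perf_counter()
results = [predict_slow(i) for i in range(20)]
elapsed = time.perf_counter() - start

print(f"{load_calls} loads for {len(results)} requests, {elapsed:.3f}s total")
```

Every simulated request pays for a full load, so the load count grows with traffic instead of staying at one.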
Load It Just Once
The fix: load the model a single time when the service starts, keep it in memory, and reuse it for every prediction.
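The load-once pattern can be sketched as an async context manager, which is the shape a lifespan handler takes. This is an illustration under assumptions rather than the course's exact code: pickle stands in for joblib, the service is simulated with asyncio instead of a real web framework, and MODEL_PATH, state, load_model, and predict are hypothetical names.

```python
import asyncio
import pickle
import tempfile
from contextlib import asynccontextmanager
from pathlib import Path

# Hypothetical model file so the sketch runs on its own.
MODEL_PATH = Path(tempfile.mkdtemp()) / "model.pkl"
with open(MODEL_PATH, "wb") as f:
    pickle.dump({"weight": 2.0, "bias": 1.0}, f)

state = {}      # holds the loaded model for the life of the process
load_calls = 0  # counts how often the file is read and deserialized


def load_model(path):
    """Read the model from disk (pickle here, standing in for joblib.load)."""
    global load_calls
    load_calls += 1
    with open(path, "rb") as f:
        return pickle.load(f)


@asynccontextmanager
async def lifespan(app):
    # Code before `yield` runs once, at startup.
    state["model"] = load_model(MODEL_PATH)
    yield
    # Code after `yield` runs once, at shutdown.
    state.clear()


def predict(x):
    """Handler body: reuse the model already in memory, no disk access."""
    model = state["model"]
    return model["weight"] * x + model["bias"]


async def simulate_service():
    # Stand-in for the framework starting up, serving requests, shutting down.
    async with lifespan(app=None):
        return [predict(x) for x in (0.0, 1.0, 2.0)]


predictions = asyncio.run(simulate_service())
print(predictions, "loads:", load_calls)
```

If the service is built with FastAPI, which the lifespan wording suggests but the lesson does not state, a function of this shape would be passed as FastAPI(lifespan=lifespan), and the /predict handler would read the model from the shared state in the same way.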