Loading and Serving ML Models
Loading a joblib or Keras model at startup, thread-safe inference, and batch prediction endpoints.
The Loading Problem
Loading a model from disk is slow. If you reload it on every request, each request pays the full load time on top of the inference itself. The fix: load the model once at startup and reuse it for all requests.
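To make the cost concrete, here is a minimal sketch that compares the two approaches using only the standard library. The function slow_load is a hypothetical stand-in for joblib.load("model.joblib"): it sleeps for 50 ms to imitate disk I/O and returns a toy "model" that sums its inputs. The handler names and the timing helper are also illustrative assumptions, not part of any library.

```python
import time


def slow_load():
    """Hypothetical stand-in for joblib.load("model.joblib")."""
    time.sleep(0.05)  # pretend that reading the model from disk takes 50 ms
    return lambda features: sum(features)  # toy "model": sums the features


def handle_with_reload(features):
    # Anti-pattern: every request loads the model again.
    model = slow_load()
    return model(features)


# Load once, the way a startup hook would, and share the object afterwards.
MODEL = slow_load()


def handle_with_shared_model(features):
    # Every request reuses the model that is already in memory.
    return MODEL(features)


def time_requests(handler, n=5):
    """Return the seconds spent serving n identical fake requests."""
    start = time.perf_counter()
    for _ in range(n):
        handler([1.0, 2.0])
    return time.perf_counter() - start


reload_seconds = time_requests(handle_with_reload)
shared_seconds = time_requests(handle_with_shared_model)
print(f"reload per request: {reload_seconds:.3f}s")
print(f"load once:          {shared_seconds:.3f}s")
```

Both handlers return the same prediction; only the first pays the load time on every call. With a real model file the gap depends on the model's size and on the storage it is read from.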
The startup Event
The @app.on_event("startup") hook runs once when the server process boots, before any request arrives, which makes it the natural place to load the model.
from fastapi import FastAPI
import joblib

app = FastAPI()

# Runs once per server process, before the first request is handled.
@app.on_event("startup")
def load_model():
    app.state.model = joblib.load("model.joblib")