Learn AI with Python · Lesson

Loading and Serving ML Models

Loading a joblib or Keras model at startup, thread-safe inference, and batch prediction endpoints.

The Loading Problem

Loading a model from disk is slow. If you reload it on every request, each request pays the full load time before inference even starts. The fix: load the model once at startup and reuse it for all requests.

The startup Event

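The @app.on_event("startup") hook runs once when the server boots, which makes it the right place to load the model before any request arrives. Newer FastAPI releases recommend a lifespan handler for the same job, but the startup hook still works.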

from fastapi import FastAPI
import joblib

app = FastAPI()

@app.on_event("startup")
def load_model():
    # Runs once at boot; app.state keeps the model reachable from every request handler.
    app.state.model = joblib.load("model.joblib")
