Machine Learning Academy · Lesson

Saving and Loading a Pipeline with joblib

Learners will serialise a fitted Pipeline to disk with joblib.dump and reload it in a fresh Python session to make predictions without re-training.

Why Persist a Trained Pipeline?

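Training a machine learning pipeline can take minutes or hours. Once it is fitted, you want to save it to disk so you can reload it later for predictions without re-training. Persistence is also essential for deployment: you train on a development machine and serve predictions on a production server. Because a Pipeline bundles its preprocessing steps with the final estimator, saving the pipeline object captures the fitted preprocessing parameters (such as a scaler's means and variances) together with the model's learned parameters, so the reloaded object applies exactly the same transformations before predicting.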
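The save-and-reload cycle described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the iris dataset, the step names "scaler" and "clf", and the file name iris_pipeline.joblib are all assumptions chosen for the example, and the reload happens in the same process here rather than in a fresh session.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used only for illustration.
X, y = load_iris(return_X_y=True)

# Preprocessing and model live in one object, so both are saved together.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Hypothetical file location inside a temporary directory.
model_path = Path(tempfile.mkdtemp()) / "iris_pipeline.joblib"
joblib.dump(pipeline, model_path)

# Later (normally in a new Python session): reload and predict, no re-training.
reloaded = joblib.load(model_path)
same_predictions = np.array_equal(pipeline.predict(X), reloaded.predict(X))
print("Predictions identical after reload:", same_predictions)
```

In a real deployment the joblib.load call would run in a separate process on the serving machine, with the same library versions installed as on the training machine.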

Two Serialisation Options: pickle and joblib

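Python's built-in pickle module can serialise most Python objects, including fitted scikit-learn pipelines. joblib is a third-party library that is installed as a dependency of scikit-learn, and it is generally preferred for ML objects because it handles objects containing large NumPy arrays more efficiently. When loading an uncompressed file, joblib.load can memory-map the arrays through its mmap_mode parameter instead of copying them into memory. joblib.dump can also compress the output file, but only when you pass the compress parameter; compression is off by default.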

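import pickle

import joblib
import sklearn

# Both approaches work; joblib is recommended for sklearn objects.
# Record the library versions: a saved pipeline should be reloaded with
# the same scikit-learn version that was used to save it.
print('highest pickle protocol:', pickle.HIGHEST_PROTOCOL)
print('joblib version:', joblib.__version__)
print('sklearn version:', sklearn.__version__)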
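To make the comparison concrete, the sketch below saves the same fitted pipeline three ways: with pickle, with joblib uncompressed, and with joblib using compress=3. The dataset, the random forest classifier, the file names, and the compression level are assumptions chosen for illustration; the resulting file sizes depend on the model and library versions, so none are quoted here.

```python
import pickle
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
]).fit(X, y)

# Hypothetical output files in a temporary directory.
out_dir = Path(tempfile.mkdtemp())
pickle_path = out_dir / "pipeline.pkl"
joblib_path = out_dir / "pipeline.joblib"
compressed_path = out_dir / "pipeline_compressed.joblib"

# Option 1: the standard-library pickle module.
with open(pickle_path, "wb") as f:
    pickle.dump(pipeline, f, protocol=pickle.HIGHEST_PROTOCOL)

# Option 2: joblib, uncompressed (the default) and compressed (opt-in).
joblib.dump(pipeline, joblib_path)
joblib.dump(pipeline, compressed_path, compress=3)

# Reload from both formats and check the predictions are unchanged.
with open(pickle_path, "rb") as f:
    from_pickle = pickle.load(f)
from_joblib = joblib.load(compressed_path)

expected = pipeline.predict(X)
pickle_ok = np.array_equal(expected, from_pickle.predict(X))
joblib_ok = np.array_equal(expected, from_joblib.predict(X))
print("pickle round trip ok:", pickle_ok)
print("joblib round trip ok:", joblib_ok)

# Print the on-disk size of each file for comparison.
for path in (pickle_path, joblib_path, compressed_path):
    print(f"{path.name}: {path.stat().st_size} bytes")
```

Both formats rely on Python's pickle protocol under the hood, so either kind of file should only be loaded when it comes from a source you trust.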
