Machine Learning Academy · Lesson

Versioning Models: Why Filenames and Metadata Matter

Learners will design a naming convention that embeds training date, dataset version, and metric score, and write a JSON metadata sidecar for governance.

The Problem Without Versioning

Without a disciplined versioning strategy, teams quickly accumulate model.pkl, model_v2.pkl, model_final.pkl, model_FINAL_v2.pkl files with no record of which was trained on what data, which metric it achieved, or which is actually running in production. This chaos leads to deploying stale models, losing the best checkpoint, or being unable to reproduce a past result for debugging.

What to Encode in the Filename

A good model filename should encode enough context to be self-describing: dataset, algorithm, date, and optionally the primary metric and the dataset version or git SHA. This makes the model registry folder a readable audit trail at a glance, without needing to open each file.

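from datetime import date

def model_filename(dataset, algorithm, metric_name, metric_value,
                   data_version='1', ext='joblib'):
    """Return '<dataset>__<algorithm>__<YYYYMMDD>__dv<version>__<metric><score>.<ext>'."""
    today = date.today().strftime('%Y%m%d')  # training date, e.g. 20240315
    # Score as a whole percentage; round() rather than int(), because
    # int(0.29 * 100) truncates to 28 due to floating-point error.
    metric_str = f'{metric_name}{round(metric_value * 100)}'
    # Two runs on the same day with the same score get the same name;
    # append a time or git SHA if that can happen in your workflow.
    return f'{dataset}__{algorithm}__{today}__dv{data_version}__{metric_str}.{ext}'

# Examples (the date field depends on the day you run this)
print(model_filename('titanic', 'rf', 'acc', 0.834, data_version='2'))
print(model_filename('fraud', 'xgb', 'auc', 0.971))
print(model_filename('cancer', 'logreg', 'f1', 0.955))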
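Because the fields are joined with a double underscore, the convention can also be read back by code, so the registry folder can be listed and sorted rather than only eyeballed. The sketch below assumes every model file was named by `model_filename` above and that no field itself contains `__`; `parse_model_filename` and `list_registry` are hypothetical helper names, not part of any library. The metric field is kept as one string because metric names such as `f1` end in a digit, which makes `f195` hard to split reliably.

```python
from pathlib import Path

# Field order produced by model_filename(); assumed, not enforced by any tool.
FIELDS = ("dataset", "algorithm", "date", "data_version", "metric")

def parse_model_filename(name):
    """Split '<dataset>__<algo>__<date>__dv<ver>__<metric>.<ext>' into a dict.

    Raises ValueError for names that do not follow the convention.
    """
    stem, _, ext = name.rpartition(".")
    parts = stem.split("__")
    if len(parts) != len(FIELDS) or not parts[3].startswith("dv"):
        raise ValueError(f"unexpected model filename: {name!r}")
    record = dict(zip(FIELDS, parts))
    record["data_version"] = record["data_version"][len("dv"):]  # drop 'dv' prefix
    record["ext"] = ext
    return record

def list_registry(folder):
    """Return parsed records for model files in folder, newest date first."""
    records = []
    for path in Path(folder).iterdir():
        if not path.is_file() or path.suffix == ".json":
            continue  # skip subfolders and JSON metadata sidecars
        try:
            records.append(parse_model_filename(path.name))
        except ValueError:
            pass  # ignore legacy names such as model_FINAL_v2.pkl
    # YYYYMMDD strings sort correctly as plain text
    return sorted(records, key=lambda r: r["date"], reverse=True)
```

Skipping non-conforming files rather than failing lets the helper run on a folder that still contains older, unversioned artifacts.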

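The lesson objective also calls for a JSON metadata sidecar: a small file stored next to each model that records what a filename cannot hold, such as hyperparameters and the exact training timestamp. The sketch below shows one possible layout, not a standard schema; the function name `write_metadata_sidecar`, the `<model file>.json` naming rule, and every field name are assumptions chosen for illustration.

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path

def write_metadata_sidecar(model_path, dataset, algorithm, data_version,
                           metrics, params=None, git_sha=None):
    """Write '<model file>.json' next to the model and return its Path.

    The field names are an illustrative schema, not a standard.
    The model's folder must already exist.
    """
    model_path = Path(model_path)
    metadata = {
        "model_file": model_path.name,
        "dataset": dataset,
        "data_version": data_version,
        "algorithm": algorithm,
        "trained_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "metrics": metrics,            # e.g. {"acc": 0.834}
        "params": params or {},        # hyperparameters used for training
        "git_sha": git_sha,            # None if the code is not tracked
        "python_version": platform.python_version(),
    }
    # Keep the full model name so the pair sorts together in a listing
    sidecar = model_path.with_name(model_path.name + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar
```

Values in `metrics` and `params` must be JSON-serializable; some NumPy scalar types (for example `np.float32` and `np.int64`) are not, so convert them with `float()` or `int()` before calling the function.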