Machine Learning Academy · Lesson

Data Drift: Feature Distribution Shifts Over Time

Learners will simulate drift by gradually shifting an input feature, compute the Population Stability Index (PSI) and KL divergence, and set threshold-based alerts.

What Is Data Drift?

Data drift (also called covariate shift) occurs when the statistical distribution of input features changes after deployment compared to the distribution seen during training. A fraud detection model trained on 2022 transaction patterns may encounter very different transaction amounts and merchant categories by 2024. The model's learned decision boundaries no longer match the new data distribution, causing silent performance degradation that only becomes visible through monitoring.

Simulating Drift: Gradual Feature Shift

To study drift, we can simulate it by gradually shifting a feature's mean over time. In production, this might represent seasonal changes in user behaviour, economic shifts affecting purchasing power, or evolving fraud patterns. Plotting the feature distribution for each week shows the shift visually. A drift metric such as PSI or KL divergence, compared against a threshold, then determines when the shift is large enough to trigger a retraining alert.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
# Training distribution: income ~ Normal(50000, 10000)
train_income = np.random.normal(50000, 10000, 5000)

# Production weeks 1-12: mean rises by 1,250 per week, from 51.25k (week 1) to 65k (week 12)
prod_weeks = []
for week in range(1, 13):
    shifted_mean = 50000 + week * 1250  # +1250 per week
    week_data = np.random.normal(shifted_mean, 10000, 500)
    prod_weeks.append({'week': week, 'income': week_data})

print('Training mean:', train_income.mean().round(0))
for pw in [prod_weeks[0], prod_weeks[5], prod_weeks[-1]]:
    print(f'Week {pw["week"]} mean: {pw["income"].mean().round(0)}')
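
The simulated weeks can then be scored against the training distribution. The following is a minimal sketch rather than a definitive implementation. It regenerates the same kind of data so that it runs on its own. The helper names (bin_proportions, psi, kl_divergence, drift_status), the decile binning, the eps floor, and the direction of the KL divergence are all assumptions made for illustration. The 0.10 and 0.25 PSI cut-offs are a commonly quoted rule of thumb, not a universal standard.

```python
import numpy as np

def bin_proportions(values, bin_edges, eps=1e-4):
    """Share of values per bin, floored at eps so no bin is exactly zero."""
    # Values outside the training range are clipped into the outermost bins.
    clipped = np.clip(values, bin_edges[0], bin_edges[-1])
    counts, _ = np.histogram(clipped, bins=bin_edges)
    props = np.clip(counts / counts.sum(), eps, None)
    return props / props.sum()  # renormalise after flooring

def psi(expected, actual):
    """Population Stability Index between two binned distributions."""
    return float(np.sum((actual - expected) * np.log(actual / expected)))

def kl_divergence(p, q):
    """KL(p || q) in nats. Not symmetric: the argument order matters."""
    return float(np.sum(p * np.log(p / q)))

# Rule-of-thumb PSI thresholds (assumed here; tune them for your own data).
PSI_WARN, PSI_ALERT = 0.10, 0.25

def drift_status(psi_value):
    if psi_value >= PSI_ALERT:
        return "ALERT"
    if psi_value >= PSI_WARN:
        return "WARN"
    return "OK"

rng = np.random.default_rng(42)
train_income = rng.normal(50_000, 10_000, 5_000)

# Assumption: ten bins whose edges are the deciles of the training data.
bin_edges = np.quantile(train_income, np.linspace(0, 1, 11))
expected = bin_proportions(train_income, bin_edges)

results = []
for week in range(1, 13):
    week_income = rng.normal(50_000 + week * 1_250, 10_000, 500)
    actual = bin_proportions(week_income, bin_edges)
    week_psi = psi(expected, actual)
    week_kl = kl_divergence(actual, expected)  # production relative to training
    status = drift_status(week_psi)
    results.append((week, week_psi, week_kl, status))
    print(f"Week {week:2d}  PSI={week_psi:.3f}  KL={week_kl:.3f}  {status}")
```

Only the PSI drives the alert in this sketch. The KL value is printed without a cut-off, since a sensible threshold for it would need to be calibrated on your own data. With only 500 samples per week, sampling noise alone contributes a small non-zero PSI, so early weeks will not score exactly zero even when the shift is negligible.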
