Concept Drift: When the Relationship Between X and Y Changes
Learners will distinguish data drift from concept drift using a time-series example, and understand why data drift does not always imply model performance degradation.
What Is Concept Drift?
Concept drift occurs when the relationship between input features X and the target label Y changes over time, even if the feature distributions themselves remain stable. In fraud detection, fraudsters learn which transactions get flagged and adapt, so by 2024 many fraudulent transactions look like the legitimate transactions of 2022. The 'concept' being learned (what fraud looks like) has drifted, yet the feature distributions may appear unchanged. This makes concept drift harder to detect than data drift: a shift in P(X) can be spotted from unlabeled inputs alone, but a shift in P(Y|X) only becomes visible once ground-truth labels are available, and in practice those labels often arrive late.
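To make the detection gap concrete, here is a minimal sketch on a synthetic one-feature stand-in for transaction data. The data, the 0.5 and -0.5 label boundaries, and the helper names `ks_statistic` and `deployed_model` are hypothetical. A label-free two-sample Kolmogorov-Smirnov statistic on X (computed by hand with NumPy) stays small, while the deployed rule's error rate, which can only be computed once labels exist, jumps.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side='right') / a.size
    cdf_b = np.searchsorted(b, grid, side='right') / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
n = 5000

# Reference period: one synthetic feature, positive label above 0.5 (hypothetical)
x_ref = rng.normal(0.0, 1.0, n)
y_ref = (x_ref > 0.5).astype(int)

# Later period: same P(X), but the label boundary has moved (concept drift)
x_new = rng.normal(0.0, 1.0, n)
y_new = (x_new > -0.5).astype(int)

def deployed_model(x):
    """Rule assumed to have been fitted on the reference period."""
    return (x > 0.5).astype(int)

# Label-free check on P(X): needs only the inputs
ks = ks_statistic(x_ref, x_new)

# Label-based check on P(Y|X): needs ground truth for the new period
err_ref = float(np.mean(deployed_model(x_ref) != y_ref))
err_new = float(np.mean(deployed_model(x_new) != y_new))

print(f'KS statistic on X: {ks:.3f}')
print(f'Error rate before: {err_ref:.3f}, after: {err_new:.3f}')
```

A monitor that watches only X would report nothing unusual here; the drift shows up only in the second check, and only after the new labels are collected.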
Data Drift vs Concept Drift: Key Difference
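The critical distinction: data drift is a change in P(X), the input feature distribution. Concept drift is a change in P(Y|X), the conditional relationship between features and labels. Data drift does not always degrade model performance: if P(Y|X) is unchanged and the shifted inputs still land in regions where the model's learned decision boundary is correct, predictions stay accurate. It hurts only when inputs move into regions the model learned poorly or never saw. Concept drift degrades performance wherever the new relationship changes the correct label for inputs the model actually receives, because the model's learned mapping from X to Y is now wrong there, regardless of whether X looks the same.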
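The "learned decision boundary" condition can be illustrated with a hypothetical ground truth where Y = 1 only for 0.5 < x < 3, and a model assumed to have learned just x > 0.5 because training inputs from N(0, 1) almost never exceed 3. The sketch below compares error rates under mild data drift, severe data drift, and concept drift; every distribution, threshold, and helper name is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20_000

def true_label(x, low=0.5, high=3.0):
    """Hypothetical ground truth P(Y|X): y = 1 only inside (low, high)."""
    return ((x > low) & (x < high)).astype(int)

def model(x):
    """Rule assumed to have been learned on N(0, 1) training data,
    where x >= 3 almost never occurs, so the upper edge was never learned."""
    return (x > 0.5).astype(int)

def error_rate(x, y):
    return float(np.mean(model(x) != y))

x_ref = rng.normal(0, 1, N)      # training-like inputs
x_mild = rng.normal(1, 1, N)     # data drift: P(X) shifts but stays where the rule holds
x_severe = rng.normal(4, 1, N)   # data drift: P(X) moves past x = 3, where the rule is wrong
x_concept = rng.normal(0, 1, N)  # no data drift...

results = {
    'reference': error_rate(x_ref, true_label(x_ref)),
    'data drift (mild)': error_rate(x_mild, true_label(x_mild)),
    'data drift (severe)': error_rate(x_severe, true_label(x_severe)),
    # ...but P(Y|X) changes: the lower edge of the positive region moves to -0.5
    'concept drift': error_rate(x_concept, true_label(x_concept, low=-0.5)),
}
for name, err in results.items():
    print(f'{name:>20}: error rate {err:.3f}')
```

Mild data drift leaves the error low because the shifted inputs still fall where the learned rule matches P(Y|X). Severe data drift pushes most inputs past x = 3, where training data was too sparse for the model to learn the upper edge, so error rises even though P(Y|X) never changed. Concept drift raises the error with P(X) untouched.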
import numpy as np
# Scenario: credit scoring model
# Feature: income. Label: 1 = defaults, 0 = repays
# Training period: applicants with income below 35k default
np.random.seed(42)
train_income = np.random.normal(40000, 8000, 1000)
train_default = (train_income < 35000).astype(int)  # low income -> default
# Concept drift: inflation shifts the threshold; now incomes below 50k default
prod_income = np.random.normal(40000, 8000, 500)  # SAME distribution (no data drift)
prod_default = (prod_income < 50000).astype(int)  # new relationship
# P(X) check: income mean and spread are similar across periods
print('Train income mean/std:', train_income.mean().round(), train_income.std().round())
print('Prod income mean/std:', prod_income.mean().round(), prod_income.std().round())
# P(Y) check: with P(X) unchanged, the default rate jumps only because P(Y|X) changed
print('Train default rate:', train_default.mean().round(3))
print('Prod default rate:', prod_default.mean().round(3))
print('Feature distribution same (no data drift).',
      'But Y|X relationship has changed (concept drift).')
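The script above shows that the labels change but never scores a model. A minimal sketch of the consequence, assuming a hypothetical one-feature threshold model (`fit_threshold`, fitted by grid search on the training period): it recreates the same synthetic data, learns the training-period rule, and then scores it on production data drawn from the same income distribution.

```python
import numpy as np

# Recreate the lesson's synthetic credit-scoring data (same seed and calls)
np.random.seed(42)
train_income = np.random.normal(40000, 8000, 1000)
train_default = (train_income < 35000).astype(int)
prod_income = np.random.normal(40000, 8000, 500)
prod_default = (prod_income < 50000).astype(int)

def fit_threshold(income, default):
    """Hypothetical one-feature model: predict default when income < t,
    choosing t from a grid to minimise training error."""
    grid = np.arange(20000, 70001, 500)
    errors = [np.mean((income < t).astype(int) != default) for t in grid]
    return grid[int(np.argmin(errors))]

t = fit_threshold(train_income, train_default)
train_acc = np.mean((train_income < t).astype(int) == train_default)
prod_acc = np.mean((prod_income < t).astype(int) == prod_default)
print('Learned threshold:', t)
print('Train accuracy:', round(float(train_acc), 3))
print('Prod accuracy:', round(float(prod_acc), 3))
```

Training accuracy is perfect because the grid contains the true 35k threshold. Production accuracy falls sharply because every applicant earning between 35k and 50k now defaults while the model still predicts they will repay.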