Machine Learning Academy · Lesson

Building a Drift Alert Pipeline with Evidently AI

Learners will use the Evidently library to generate a data drift HTML report, integrate the check into a scheduled script, and trigger an email alert on drift detection.

What Is Evidently AI?

Evidently AI is an open-source Python library for evaluating and monitoring machine learning models in production. It provides pre-built reports and test suites covering data drift, data quality, model performance, and prediction drift. With a few lines of code, Evidently generates interactive HTML reports or JSON summaries that can be embedded in monitoring dashboards or sent as email attachments, so monitoring can be added to an existing ML pipeline without building the checks from scratch.

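# Install Evidently. The imports below use the legacy API, which release 0.7
# moved under evidently.legacy, so pin an earlier version:
# pip install 'evidently<0.7'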

import evidently
print('Evidently version:', evidently.__version__)

# Evidently organises monitoring into:
# 1. Reports -- exploratory HTML dashboards
# 2. Test Suites -- pass/fail checks for CI/CD integration
# 3. Metrics -- individual measurements (e.g. DatasetDriftMetric); presets
#    such as DataDriftPreset bundle several metrics into one report

from evidently.report import Report
from evidently.test_suite import TestSuite
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
print('Imports successful.')
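Because an Evidently HTML report is just a file, attaching it to an alert email needs only the Python standard library. The sketch below assumes the report has already been rendered to an HTML string or file; the helper name `build_drift_alert` and the example.com addresses are hypothetical placeholders, and the message is only built, not sent.

```python
from email.message import EmailMessage


def build_drift_alert(html_report: str, drifted: list[str],
                      sender: str, recipient: str) -> EmailMessage:
    """Hypothetical helper: wrap an HTML drift report in an alert email."""
    msg = EmailMessage()
    msg['Subject'] = f'[Drift alert] {len(drifted)} feature(s) drifted'
    msg['From'] = sender
    msg['To'] = recipient
    # Plain-text body listing the drifted features
    msg.set_content('Drift detected in: ' + ', '.join(drifted))
    # Attach the report bytes as text/html so mail clients can open it
    msg.add_attachment(html_report.encode('utf-8'), maintype='text',
                       subtype='html', filename='drift_report.html')
    return msg


# Stand-in HTML; in practice this would be the report Evidently saved
alert = build_drift_alert('<html><body>report</body></html>',
                          ['feature_2', 'feature_5'],
                          'monitor@example.com', 'ml-team@example.com')
print(alert['Subject'])
```

To deliver it, `smtplib.SMTP(host).send_message(alert)` works against a reachable SMTP server; the host and any login step depend on your environment.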

Preparing Reference and Current Datasets

Evidently compares a reference dataset (typically the training data or a recent stable production period) against a current dataset (the most recent production batch). Both must be pandas DataFrames with the same column schema. The reference establishes the baseline, and the current batch is checked for drift relative to it. Include the target column if ground-truth labels are available: Evidently uses it for target-drift checks and, together with a prediction column, for model-performance metrics.
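Since both frames must share a column schema, it can help to verify this before running a report, so a renamed, missing, or retyped column is caught up front. A minimal pandas sketch; the `check_schema` helper is hypothetical and not part of Evidently.

```python
import pandas as pd


def check_schema(reference: pd.DataFrame, current: pd.DataFrame) -> list[str]:
    """Hypothetical pre-flight check: list schema differences (empty if compatible)."""
    problems = []
    missing = sorted(set(reference.columns) - set(current.columns))
    extra = sorted(set(current.columns) - set(reference.columns))
    if missing:
        problems.append(f'missing in current: {missing}')
    if extra:
        problems.append(f'unexpected in current: {extra}')
    # Compare dtypes only for columns present in both frames
    for col in reference.columns.intersection(current.columns):
        if reference[col].dtype != current[col].dtype:
            problems.append(f'{col}: {reference[col].dtype} vs {current[col].dtype}')
    return problems


ref = pd.DataFrame({'feature_0': [0.1, 0.2], 'target': [0, 1]})
cur = pd.DataFrame({'feature_0': [1, 2], 'feature_9': [1.0, 2.0]})
print(check_schema(ref, cur))
```

An empty list means the frames can be passed to Evidently as-is; otherwise the script can stop before producing a misleading report.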

import pandas as pd
import numpy as np
from sklearn.datasets import make_classification

np.random.seed(42)

# Simulate training data (reference)
X_ref, y_ref = make_classification(n_samples=5000, n_features=8, random_state=42)
reference = pd.DataFrame(X_ref, columns=[f'feature_{i}' for i in range(8)])
reference['target'] = y_ref

# Simulate a production batch: reuse 1,000 reference rows, then inject drift
X_cur = X_ref[:1000].copy()
X_cur[:, 2] += 2.0  # mean shift in feature_2
X_cur[:, 5] *= 1.8  # scale change in feature_5
current = pd.DataFrame(X_cur, columns=[f'feature_{i}' for i in range(8)])
current['target'] = y_ref[:1000]  # labels may or may not be available in production

print('Reference shape:', reference.shape)
print('Current shape:', current.shape)
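To see what a per-feature drift check measures, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) for each column of a batch built the same way as above: reuse 1,000 reference rows, shift feature 2 by 2.0 and scale feature 5 by 1.8. It uses NumPy only, with Gaussian stand-in data instead of `make_classification`, and the 0.1 cut-off is an illustrative assumption. Evidently selects its own statistical test and threshold per column (depending on factors such as column type and dataset size), so treat this as an explanation of the idea rather than a reproduction of Evidently's output.

```python
import numpy as np


def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a - ECDF_b|."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # ECDFs only change at sample points
    cdf_a = np.searchsorted(a, grid, side='right') / a.size
    cdf_b = np.searchsorted(b, grid, side='right') / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(42)
X_ref = rng.normal(size=(5000, 8))  # Gaussian stand-in for the training data
X_cur = X_ref[:1000].copy()
X_cur[:, 2] += 2.0                  # same mean shift as the lesson
X_cur[:, 5] *= 1.8                  # same scale change as the lesson

THRESHOLD = 0.1                     # illustrative cut-off, not an Evidently default
stats = {f'feature_{i}': ks_statistic(X_ref[:, i], X_cur[:, i]) for i in range(8)}
drifted = [name for name, s in stats.items() if s > THRESHOLD]
for name, s in stats.items():
    print(f'{name}: KS={s:.3f}')
print('Drifted:', drifted)
```

The same function can be applied to the lesson's frames, for example `ks_statistic(reference['feature_2'].to_numpy(), current['feature_2'].to_numpy())`.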
