Monitoring Prediction Distributions and Confidence Scores
Learners will log prediction probabilities to a time-series store, plot rolling mean confidence, and flag when average confidence drops below a deployment threshold.
Why Monitor Predictions, Not Just Inputs?
Input feature monitoring detects data drift, but it requires tracking every feature separately. Prediction monitoring provides a single integrated signal: if any combination of input changes causes the model to produce different outputs, the change shows up in the prediction distribution, even if no individual feature crosses its drift threshold. The two approaches are complementary: prediction monitoring can catch shifts that per-feature checks miss, because it observes the model's response to all inputs together.
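As a rough illustration of what a shift in the prediction distribution can look like numerically, the sketch below compares predicted-class frequencies between a reference window and a current window. The choice of total variation distance, the function names, and the sample data are all assumptions made for this example; the lesson does not prescribe a specific metric.

```python
from collections import Counter


def class_frequencies(predicted_classes, num_classes):
    """Return the fraction of predictions that fall into each class index."""
    if not predicted_classes:
        raise ValueError('need at least one prediction')
    counts = Counter(predicted_classes)
    total = len(predicted_classes)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical predicted classes from two time windows.
reference = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
current = [0, 1, 1, 1, 1, 1, 1, 1, 2, 2]

ref_freq = class_frequencies(reference, num_classes=3)
cur_freq = class_frequencies(current, num_classes=3)
shift = total_variation_distance(ref_freq, cur_freq)
```

A larger `shift` means the model is assigning classes in noticeably different proportions than before. What counts as "large" depends on the application and would need to be calibrated on historical windows.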
Logging Predictions in Production
The foundation of prediction monitoring is a prediction log: every inference request's features, predicted label, predicted probabilities, and timestamp stored to a database or file. This log enables retrospective analysis when drift or degradation is detected. Design the schema to include request ID (for joining with ground-truth labels when they arrive), model version, and latency alongside the prediction details.
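One possible shape for such a record is sketched below. The class name `PredictionRecord`, the helper `make_record`, and field names such as `request_id` and `latency_ms` are hypothetical choices for illustration, not a schema defined by the lesson.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class PredictionRecord:
    request_id: str        # join key for ground-truth labels that arrive later
    timestamp: str         # ISO 8601, UTC
    model_version: str
    features: list
    predicted_class: int
    probabilities: list
    latency_ms: float


def make_record(features, probabilities, model_version, latency_ms, request_id=None):
    """Build a log record; generates a random request ID if none is supplied."""
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    return PredictionRecord(
        request_id=request_id or str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        model_version=model_version,
        features=list(features),
        predicted_class=predicted_class,
        probabilities=list(probabilities),
        latency_ms=latency_ms,
    )


record = make_record([0.1, 2.0], [0.2, 0.8], 'v1.2.3', latency_ms=3.5, request_id='req-001')
line = json.dumps(asdict(record))  # one JSON object, ready to append to a log file
```

Keeping the request ID in every record is what makes it possible to compute accuracy later, once delayed labels can be matched back to the predictions that produced them.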
import datetime
import json

import torch
import torch.nn.functional as F

PREDICTION_LOG = '/tmp/prediction_log.jsonl'


def predict_and_log(features, model, model_version='v1.2.3'):
    # Run inference on a single example without tracking gradients.
    with torch.no_grad():
        logits = model(torch.tensor(features, dtype=torch.float32).unsqueeze(0))
        # Drop only the batch dimension so probs stays a 1-D array of class probabilities.
        probs = F.softmax(logits, dim=1).squeeze(0).numpy()
    pred_class = int(probs.argmax())
    confidence = float(probs.max())
    record = {
        # Timezone-aware UTC timestamp (utcnow() is deprecated in recent Python versions).
        'timestamp': datetime.datetime.now(datetime.timezone.utc).isoformat(),
        'model_version': model_version,
        'predicted_class': pred_class,
        'confidence': round(confidence, 4),
        'probabilities': probs.tolist(),
    }
    # Append one JSON object per line (JSON Lines format).
    with open(PREDICTION_LOG, 'a') as f:
        f.write(json.dumps(record) + '\n')
    return pred_class, confidence
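Once predictions are being logged, the rolling mean confidence and a threshold check can be computed directly from the log. The following is a minimal standard-library sketch, assuming records are stored in arrival order with `timestamp` and `confidence` fields as in the log format above. The window size, the threshold value of 0.70, and the function names are hypothetical and should be tuned for a real deployment.

```python
import json
from collections import deque

CONFIDENCE_THRESHOLD = 0.70  # hypothetical deployment threshold


def read_log(path):
    """Load one record per non-empty line from a JSON Lines prediction log."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def rolling_mean_confidence(records, window):
    """Return (timestamp, mean confidence) for each full window of records."""
    recent = deque(maxlen=window)
    points = []
    for rec in records:
        recent.append(rec['confidence'])
        if len(recent) == window:  # skip the warm-up period with too few samples
            points.append((rec['timestamp'], sum(recent) / window))
    return points


def low_confidence_alerts(points, threshold=CONFIDENCE_THRESHOLD):
    """Keep only the points whose rolling mean falls below the threshold."""
    return [(ts, mean) for ts, mean in points if mean < threshold]


# Small in-memory example; in practice the records would come from read_log(...).
sample = [
    {'timestamp': f'2024-01-01T00:00:0{i}', 'confidence': c}
    for i, c in enumerate([0.9, 0.9, 0.9, 0.6, 0.5, 0.4])
]
points = rolling_mean_confidence(sample, window=3)
alerts = low_confidence_alerts(points)
```

The `points` list can be passed to any plotting library to draw the rolling mean over time. Averaging over a window rather than alerting on single predictions keeps one unusually uncertain request from triggering an alert.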