The Confusion Matrix Explained
Learners will build a confusion matrix from predictions and ground truth, then derive true positives, false negatives, and related counts.
Beyond Accuracy: Understanding Errors
Accuracy tells you the fraction of predictions that are correct, but it weighs every error equally. In practice, different types of errors have very different consequences. A spam filter that sends a legitimate email to the spam folder is annoying; a medical diagnosis model that misses a cancer case is life-threatening.
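A small sketch can make this concrete. The labels and the two models below (`model_a`, `model_b`) are made up for illustration, as are the helper names `accuracy` and `error_breakdown`. Both models reach the same accuracy while making opposite kinds of mistake.

```python
# Hypothetical labels: 1 = disease present, 0 = disease absent.
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # flags two healthy patients
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two sick patients

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def error_breakdown(y_true, y_pred):
    """Count false alarms (0 predicted as 1) and misses (1 predicted as 0)."""
    false_alarms = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    misses = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return false_alarms, misses

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    false_alarms, misses = error_breakdown(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"false_alarms={false_alarms} misses={misses}")
```

Both models score 0.80, yet in a screening setting the two missed cases of `model_b` would likely matter far more than the two false alarms of `model_a`.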
The confusion matrix breaks classifier performance down into a table (2×2 for binary classification) that counts how often each combination of actual and predicted label occurs. It reveals not just how often the model is wrong but how it is wrong.
The Four Cells of the Confusion Matrix
For binary classification with classes Positive (1) and Negative (0), the confusion matrix has four cells:
- True Positive (TP) — actual positive, predicted positive. Correct detection.
- True Negative (TN) — actual negative, predicted negative. Correct rejection.
- False Positive (FP) — actual negative, predicted positive. Type I error; false alarm.
- False Negative (FN) — actual positive, predicted negative. Type II error; missed detection.
All four counts are needed to fully characterise a classifier's behaviour.
import numpy as np
# Medical test example:
# Positive = disease present, Negative = disease absent
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])
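# Each boolean mask marks the samples that fall into one cell;
# .sum() counts the True entries.
TP = ((y_pred == 1) & (y_true == 1)).sum()  # sick, flagged as sick
TN = ((y_pred == 0) & (y_true == 0)).sum()  # healthy, flagged as healthy
FP = ((y_pred == 1) & (y_true == 0)).sum()  # healthy, flagged as sick
FN = ((y_pred == 0) & (y_true == 1)).sum()  # sick, flagged as healthy
print(f'TP={TP} FP={FP}')  # TP=3 FP=1
print(f'FN={FN} TN={TN}')  # FN=2 TN=4
# Every sample lands in exactly one cell, so the counts sum to len(y_true).
print(f'Total: {TP+TN+FP+FN}')  # Total: 10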
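The four counts can also be assembled into the 2×2 table itself. The following is a minimal sketch that reuses the labels from the example above. The helper name `confusion_matrix_2x2` is hypothetical, and the layout (rows = actual class, columns = predicted class, class 0 first) is an assumption chosen to match the convention scikit-learn uses; other texts put the positive class first.

```python
import numpy as np

# Same medical-test labels as in the example above.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])

def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows = actual class, columns = predicted class."""
    matrix = np.zeros((2, 2), dtype=int)
    # Each (actual, predicted) pair increments exactly one cell.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

cm = confusion_matrix_2x2(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # row-major order: TN, FP, FN, TP
print(cm)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
# Accuracy is recoverable from the matrix: the diagonal over the total.
print(f"accuracy={(tp + tn) / cm.sum():.2f}")
```

With this layout the correct predictions sit on the diagonal and the two error types sit off it, so accuracy is the diagonal sum divided by the total number of samples.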