Machine Learning Academy · Lesson

Choosing the Right Metric for Your Business Problem

Learners will analyse three real scenarios (fraud detection, medical diagnosis, house pricing) and justify the best evaluation metric for each.

Metrics Are Business Decisions

The choice of evaluation metric is not a purely technical decision: it reflects your business priorities and cost structure. A fraud detection model optimised for accuracy can score highly on an imbalanced dataset while catching little or no fraud. A medical screening model optimised for precision might miss too many sick patients. Every metric implicitly assumes which types of error are acceptable. Defining the metric before building the model is therefore one of the most important decisions in any ML project, because it drives all subsequent choices: model selection, threshold tuning, and deployment criteria.

# The metric choice depends on error costs

# Example 1: Email spam filter
# FP (ham marked as spam) -> user misses important email: HIGH cost
# FN (spam reaches inbox) -> user is annoyed: LOW cost
# -> Optimise PRECISION (avoid false alarms)

# Example 2: Cancer screening
# FP (healthy patient flagged) -> unnecessary follow-up: LOW cost
# FN (cancer patient missed) -> delayed treatment: HIGH cost
# -> Optimise RECALL (miss as few true cases as possible)

# Example 3: House price prediction
# Over- and under-estimates cost about the same -> use RMSE or R^2
print('Define your error cost structure BEFORE choosing a metric')
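The reasoning above can be made concrete by pricing each error type and totalling the cost of a model's mistakes. The following is a minimal sketch, not a standard recipe: the helper `expected_cost`, the toy labels, and the cost figures are all hypothetical and chosen only to mirror the spam-filter and screening cases.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def expected_cost(y_true, y_pred, fp_cost, fn_cost):
    """Total cost of the errors in y_pred, given a price per FP and per FN."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(fp) * fp_cost + int(fn) * fn_cost


# Toy labels: 6 negatives followed by 4 positives
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# A cautious model flags little: 0 false positives, 2 false negatives
y_cautious = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 0])
# An aggressive model flags a lot: 2 false positives, 0 false negatives
y_aggressive = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Hypothetical cost structures (arbitrary units, illustration only)
cost_structures = {
    'spam filter (FP expensive)': {'fp_cost': 10, 'fn_cost': 1},
    'screening (FN expensive)': {'fp_cost': 1, 'fn_cost': 50},
}

for name, costs in cost_structures.items():
    cautious = expected_cost(y_true, y_cautious, **costs)
    aggressive = expected_cost(y_true, y_aggressive, **costs)
    print(f'{name}: cautious={cautious}, aggressive={aggressive}')
```

Under these assumed costs the cautious (high-precision) model is cheaper for the spam filter, while the aggressive (high-recall) model is cheaper for screening. The predictions are identical in both cases; only the cost structure changes which model looks better.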

Scenario 1: Fraud Detection

Fraud detection is the classic imbalanced binary classification problem. Legitimate transactions can outnumber fraudulent ones by 1000:1 or more. At that ratio, a model predicting 'legitimate' for every transaction achieves about 99.9% accuracy but is completely useless, because it catches no fraud. The right primary metric depends on business constraints: if fraud losses vastly exceed investigation costs, optimise recall (catch as much fraud as possible, even at the cost of many false positives). If customer experience is critical, balance recall and precision, for example with F1. The Precision-Recall curve and AUC-PR are preferred over ROC under such extreme imbalance, because the false positive rate used by ROC is diluted by the very large number of legitimate transactions.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Simulated fraud: 1000 transactions, 10 are fraud (1%)
y_true = np.array([0] * 990 + [1] * 10)

# Baseline: always predict legitimate (class 0)
y_dummy = np.zeros(1000, dtype=int)

print('Dummy model (always legitimate):')
print(f'  Accuracy:  {accuracy_score(y_true, y_dummy):.3f}')  # 0.990 -- misleading!
print(f'  Recall:    {recall_score(y_true, y_dummy):.3f}')    # 0.000 -- catches no fraud!
# zero_division=0 avoids a warning: the model makes no positive predictions
print(f'  Precision: {precision_score(y_true, y_dummy, zero_division=0):.3f}')  # 0.000
print(f'  F1:        {f1_score(y_true, y_dummy, zero_division=0):.3f}')         # 0.000
print()
print('For fraud detection: use Recall, F1, or AUC-PR')
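To see why AUC-PR is the more demanding summary under imbalance, and how the decision threshold trades recall against precision, the sketch below scores the same 1% fraud setup with hand-built scores. The scores are an assumption made purely for illustration (they do not come from a trained model), and scikit-learn's `average_precision_score` is used here as the AUC-PR estimate.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score, precision_score, recall_score, roc_auc_score
)

# Same class balance as the dummy example: 990 legitimate, 10 fraud
y_true = np.array([0] * 990 + [1] * 10)

# Hand-built, purely illustrative scores (not from a real model):
# legitimate transactions spread over 0.0-0.8, fraud over 0.6-1.0,
# so the two classes overlap between 0.6 and 0.8.
y_score = np.concatenate([
    np.linspace(0.0, 0.8, 990),
    np.linspace(0.6, 1.0, 10),
])

roc_auc = roc_auc_score(y_true, y_score)
auc_pr = average_precision_score(y_true, y_score)
print(f'ROC AUC:                    {roc_auc:.3f}')
print(f'AUC-PR (average precision): {auc_pr:.3f}')
print(f'Fraud rate (AUC-PR of a random ranking is roughly this): {y_true.mean():.3f}')

# Threshold choice: a low threshold favours recall, a high one favours precision
results = {}
for threshold in (0.6, 0.81):
    y_pred = (y_score >= threshold).astype(int)
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred)
    results[threshold] = (precision, recall)
    print(f'threshold={threshold}: precision={precision:.3f}, recall={recall:.3f}')
```

With these scores the ROC AUC comes out well above the average precision, even though both describe the same ranking. The lower threshold catches every fraud case but buries them among false alarms; the higher threshold raises no false alarms but misses half the fraud. Which end of that trade-off is acceptable is the business decision discussed above.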
