Machine Learning Academy · Lesson

Bias-Variance Trade-off: Underfitting vs Overfitting

Learners will plot training vs validation error curves, identify under- and overfitting regions, and understand the bias-variance decomposition conceptually.

The Central Tension in ML

Every machine learning model faces a fundamental tension between two competing demands: being flexible enough to capture complex patterns in the training data, and being constrained enough that those patterns generalise to new examples. Too much flexibility lets the model fit noise in the training set (overfitting); too little leaves it unable to represent the real pattern (underfitting). Finding the sweet spot is the core challenge of model selection and hyperparameter tuning.
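
One way to see both failure modes at once is to sweep model flexibility and compare training error with validation error. The sketch below is illustrative rather than part of the lesson's own code: the sine-shaped data, the polynomial degrees, the 50/50 split, and the names train_errors and val_errors are all assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumed toy data: y = sin(x) + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 80).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(0, 0.2, 80)

# Hold out half of the points for validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Polynomial degree acts as the flexibility knob
degrees = [1, 3, 5, 9, 15]
train_errors, val_errors = [], []

for degree in degrees:
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),  # keeps high-degree features on a comparable scale
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    val_errors.append(mean_squared_error(y_val, model.predict(X_val)))
    print(f'degree {degree:>2}: train MSE {train_errors[-1]:.3f}, '
          f'val MSE {val_errors[-1]:.3f}')
```

Plotting train_errors and val_errors against degrees gives the two curves. Training error tends to keep falling as the degree grows, while validation error is typically lowest somewhere in between: the low-degree end is the underfitting region, and any rise at the high-degree end signals overfitting.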

The bias-variance trade-off gives this tension a mathematical name and framework. Understanding it is essential for diagnosing what is wrong with a model and choosing a fix that targets the actual problem.
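
For squared-error loss, the framework is usually written as the decomposition below. The notation is an assumption and not taken from this lesson: the data follow y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², f̂ is the model fitted on a randomly drawn training set, and the expectation runs over training sets and noise.

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term measures how far the average fitted model is from the truth, the second how much the fitted model changes from one training set to another, and the third is noise that no model can remove.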

Bias: Systematic Error from Wrong Assumptions

Bias is the error that comes from wrong assumptions in the learning algorithm. A high-bias model is too simple to capture the true relationship between features and the target — it makes systematically wrong predictions regardless of how much training data you give it.

The classic example: trying to fit a straight line to data that has a clear non-linear (curved) pattern. No matter how much data you have, the line will miss the curve. The model is underfitting — it has high bias because its linearity assumption is wrong for this problem.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Non-linear data: y = sin(x) + noise
np.random.seed(42)
X = np.sort(np.random.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X.ravel()) + np.random.randn(200) * 0.2

# High-bias model: straight line on non-linear data
linear_model = LinearRegression()
linear_model.fit(X, y)
y_pred_linear = linear_model.predict(X)

print(f'Linear model train MSE: {mean_squared_error(y, y_pred_linear):.3f}')
# The error is high even on the training data, far above the
# noise variance (0.2 ** 2 = 0.04) -- a sign of high bias
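
To check the claim that extra data does not help a high-bias model, the straight-line fit above can be repeated at several training-set sizes. This is a minimal sketch under stated assumptions: the sample sizes, the random generator, and the names noise_std and mse_by_size are illustrative choices, not part of the lesson's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
noise_std = 0.2  # same noise level as the example above
mse_by_size = {}

# Assumed sample sizes, each ten times larger than the last
for n_samples in (100, 1000, 10000):
    X = rng.uniform(0, 10, n_samples).reshape(-1, 1)
    y = np.sin(X.ravel()) + rng.normal(0, noise_std, n_samples)

    # Same high-bias model: a straight line on sine-shaped data
    model = LinearRegression().fit(X, y)
    mse_by_size[n_samples] = mean_squared_error(y, model.predict(X))
    print(f'n = {n_samples:>5}: train MSE {mse_by_size[n_samples]:.3f}')

print(f'Noise variance (irreducible error): {noise_std ** 2:.3f}')
```

If the training MSE stays well above the noise variance at every size, the remaining error comes from the linearity assumption rather than from a shortage of data.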
