Machine Learning Academy · Lesson

Bootstrap Aggregation (Bagging) Explained

Learners will implement bootstrap sampling by hand, train classifiers on each sample, and understand why averaging diverse models reduces variance.

What Is Bootstrap Aggregation?

Bootstrap Aggregation, commonly called Bagging, is an ensemble technique that trains multiple models on different bootstrap samples of the training data and combines their predictions, by averaging for regression or by voting for classification. In statistics, bootstrapping means resampling your data with replacement. Because each model sees a slightly different sample, the models' errors are only partly correlated, so averaging or voting across them reduces the variance of the final prediction while leaving bias roughly unchanged. The gain is largest for high-variance base models such as deep decision trees.
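The variance claim can be checked with a small simulation. The sketch below is illustrative only: the synthetic sine data, the noise level, the query point x0, and the 1-nearest-neighbour base learner are all assumptions chosen for this example, not part of the lesson. It uses regression with averaging rather than classification with voting, because prediction variance is easier to measure directly. It repeatedly draws a fresh training set, predicts at one point with a single model and with a bagged ensemble, and compares how much each prediction varies across training sets.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic regression data (an assumption for this demo): y = sin(x) + noise."""
    x = rng.uniform(0, 2 * np.pi, size=n)
    y = np.sin(x) + rng.normal(0, 0.5, size=n)
    return x, y

def predict_1nn(x_train, y_train, x0):
    """1-nearest-neighbour regression: return the target of the closest training point."""
    return y_train[np.argmin(np.abs(x_train - x0))]

def predict_bagged_1nn(x_train, y_train, x0, n_models=25):
    """Average the 1-NN prediction over n_models bootstrap samples of the training set."""
    n = len(x_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        preds.append(predict_1nn(x_train[idx], y_train[idx], x0))
    return np.mean(preds)

x0 = 2.0  # query point, chosen arbitrarily
single, bagged = [], []
for _ in range(300):  # 300 independent training sets
    x, y = make_data(50)
    single.append(predict_1nn(x, y, x0))
    bagged.append(predict_bagged_1nn(x, y, x0))

var_single = float(np.var(single))
var_bagged = float(np.var(bagged))
print(f"variance of single 1-NN prediction: {var_single:.3f}")
print(f"variance of bagged 1-NN prediction: {var_bagged:.3f}")
```

In this setup the bagged prediction should vary noticeably less across training sets than the single model's prediction. A 1-NN learner is used here because it is a simple high-variance model; a low-variance model would show a much smaller gain.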

Sampling With Replacement Explained

Sampling with replacement means each row of a bootstrap sample is drawn independently from the full dataset, so the same row can appear multiple times in a single sample while other rows may be excluded entirely. For a dataset of N examples, each bootstrap sample also contains N rows. The probability that a given example is never drawn is (1 - 1/N)^N, which approaches 1/e ≈ 0.368 as N grows. On average, therefore, about 63.2% of the distinct original examples appear in any given bootstrap sample, and the remaining ~36.8% form the out-of-bag set, which can be used for validation.
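The 63.2% figure can be checked numerically. The sketch below compares a simulated average with the exact expectation 1 - (1 - 1/N)^N and its large-N limit 1 - 1/e. The helper name unique_fraction, the dataset size, and the number of trials are illustrative choices, not part of the lesson.

```python
import numpy as np

def unique_fraction(n_rows, n_trials=200, seed=0):
    """Average fraction of distinct original rows in a bootstrap sample of size n_rows."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(n_trials):
        idx = rng.integers(0, n_rows, size=n_rows)  # row indices drawn with replacement
        fractions.append(np.unique(idx).size / n_rows)
    return float(np.mean(fractions))

n = 1000
simulated = unique_fraction(n)
exact = 1 - (1 - 1 / n) ** n  # chance that a given row is drawn at least once
limit = 1 - np.exp(-1)        # large-N limit of the expression above
print(f"simulated: {simulated:.3f}  exact: {exact:.3f}  limit: {limit:.3f}")
```

For small N the exact value sits slightly above the limit (for N = 10 it is about 0.651), so the 63.2% rule of thumb is best read as a large-sample approximation.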

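import numpy as np

rng = np.random.default_rng(42)  # seeded generator so the example is reproducible
data = np.arange(10)  # [0, 1, 2, ..., 9]
# Draw len(data) values with replacement: duplicates are expected and some values are usually missing
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print('Original:', data)
print('Bootstrap sample:', bootstrap_sample)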
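The out-of-bag rows for a sample like the one above can be recovered by taking the set difference between the original data and the drawn sample. This is a minimal sketch under the same toy setup. The variable names in_bag and oob are illustrative, and the exact rows left out depend on the seed.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)

# Draw row indices with replacement, then look up the sampled rows
idx = rng.integers(0, len(data), size=len(data))
in_bag = data[idx]

# Out-of-bag rows: values of data that never appear in the bootstrap sample
oob = np.setdiff1d(data, in_bag)

print('In-bag sample:', in_bag)
print('Out-of-bag rows:', oob)
print('Distinct in-bag fraction:', np.unique(in_bag).size / len(data))
```

Because the model trained on in_bag never sees the oob rows, they can act as a small held-out set for that model.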
