Out-of-Bag Error: Free Validation Inside the Forest
Learners will enable oob_score, understand that each tree sees only about 63% of the distinct training examples, and use the OOB samples as a built-in validation set.
What Is Out-of-Bag Error?
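When a Random Forest trains each tree on a bootstrap sample (n draws with replacement from the n training examples), about 37% of the training examples are left out of that sample: the chance that a given example is never drawn is (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. These out-of-bag (OOB) examples are invisible to the tree during training, so the tree's prediction on them is an honest, held-out prediction. Aggregating the OOB predictions across all trees that excluded a given example gives an estimate of the model's generalisation error that is close to unbiased (if anything slightly pessimistic, because each example is scored by only about a third of the trees). It comes at almost no extra cost, with no need for a separate validation split.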
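As a minimal sketch of how this looks in scikit-learn, the snippet below fits a RandomForestClassifier with oob_score=True and compares the OOB accuracy with the accuracy on a held-out test set. The synthetic dataset, the split size, and the hyperparameters are illustrative assumptions, not values from this lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration (assumed, not from the lesson).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# oob_score=True asks the forest to score each training example
# using only the trees whose bootstrap sample did not contain it.
forest = RandomForestClassifier(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(X_train, y_train)

oob_accuracy = forest.oob_score_   # accuracy for classifiers
oob_error = 1.0 - oob_accuracy     # the "OOB error"
test_accuracy = forest.score(X_test, y_test)

print(f"OOB accuracy:  {oob_accuracy:.3f}")
print(f"OOB error:     {oob_error:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Note that oob_score=True only works with bootstrap=True (the default for random forests); without bootstrapping there are no out-of-bag examples. For regressors, oob_score_ reports R^2 rather than accuracy.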
How OOB Prediction Is Computed
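For each training example x_i, scikit-learn identifies all trees that did not see x_i in their bootstrap sample. Only those trees contribute to x_i's prediction. For classification, scikit-learn averages the class probabilities predicted by those trees and picks the most probable class (a soft vote rather than a strict majority vote); for regression, it averages their predicted values. This is repeated for every training example, giving an OOB prediction for the full training set. Comparing these predictions to the true labels yields the OOB score, exposed as oob_score_ (accuracy for classifiers, R^2 for regressors); for classification, the OOB error is 1 - oob_score_. With very few trees, some examples may never be out-of-bag and so receive no OOB prediction, which is one reason to use a reasonably large number of trees.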
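The procedure above can be sketched by hand with plain decision trees. This is a simplified illustration of the idea, not scikit-learn's internal code: the dataset, the number of trees, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic binary classification data (illustrative assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)
n_samples = X.shape[0]
n_classes = len(np.unique(y))
n_trees = 100

proba_sum = np.zeros((n_samples, n_classes))  # summed OOB class probabilities
oob_counts = np.zeros(n_samples, dtype=int)   # trees that left each example out
oob_fractions = []                            # share of OOB examples per tree

for t in range(n_trees):
    # Bootstrap sample: n draws with replacement.
    boot_idx = rng.integers(0, n_samples, size=n_samples)

    # Everything that was never drawn is out-of-bag for this tree.
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[boot_idx] = False
    oob_idx = np.flatnonzero(oob_mask)
    oob_fractions.append(oob_mask.mean())

    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    tree.fit(X[boot_idx], y[boot_idx])

    # Only this tree's OOB examples receive its prediction.
    # tree.classes_ maps probability columns back to class labels 0..n_classes-1.
    proba_sum[np.ix_(oob_idx, tree.classes_)] += tree.predict_proba(X[oob_idx])
    oob_counts[oob_idx] += 1

# Examples that were in-bag for every tree have no OOB prediction.
covered = oob_counts > 0

# Soft vote: the class with the highest summed (equivalently, averaged) probability.
oob_pred = np.argmax(proba_sum[covered], axis=1)
oob_error = np.mean(oob_pred != y[covered])

print(f"Mean OOB fraction per tree: {np.mean(oob_fractions):.3f}")
print(f"Examples with an OOB prediction: {covered.sum()} of {n_samples}")
print(f"OOB error: {oob_error:.3f}")
```

The mean OOB fraction should land close to 0.37, matching the 1/e argument, and with 100 trees it is extremely unlikely that any example ends up without an OOB prediction.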