Nested Cross-Validation: Selecting and Evaluating Simultaneously
Learners will structure an outer CV loop for evaluation and an inner loop for hyperparameter selection to get an unbiased estimate of the tuned model's true performance.
The Problem of Evaluation Bias
When you use the same data for both hyperparameter selection and performance evaluation, you introduce an optimistic bias: the winning configuration is chosen partly because it happened to fit the noise in that data. A separate test set helps only if it is used once. If you repeat the selection process (trying different grids, different models) and check each attempt against the same test set, you are implicitly selecting on test-set information. With a single held-out test set, random chance may also make a lucky hyperparameter combination look better than it truly is. Nested cross-validation gives a nearly unbiased performance estimate while still selecting hyperparameters.
The Two-Loop Structure of Nested CV
Nested CV wraps one cross-validation loop inside another: (1) an outer loop for performance evaluation, which creates multiple train/test splits and uses each test split only to score the model selected for that split; (2) an inner loop for hyperparameter selection, which runs a second CV (for example, a grid search) within each outer training fold and picks the best hyperparameters using only that outer training data. The outer test fold never participates in training or selection. Because each outer fold may select different hyperparameters, the average of the outer scores estimates the generalisation performance of the whole tuning procedure rather than of one fixed hyperparameter setting, and it is far less optimistic than the inner-loop scores.
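The two-loop structure can be sketched with scikit-learn by passing a `GridSearchCV` object (the inner loop) to `cross_val_score` (the outer loop). The dataset, the SVM pipeline, the parameter grid, and the fold counts below are assumptions chosen for illustration, not choices made by this lesson.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: selects hyperparameters using only the outer training fold.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: scores the tuned model on folds it never saw during selection.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Scaling lives inside the pipeline so it is refit on each training split.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}  # illustrative grid

# GridSearchCV behaves like an estimator: fit() runs the inner search and
# refits the best configuration on the data it was given.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv)

# cross_val_score clones and refits the whole search once per outer fold.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)

print("outer fold accuracies:", outer_scores.round(3))
print("nested CV estimate:   ", round(float(outer_scores.mean()), 3))
```

Treating the search itself as the estimator is what keeps each outer test fold out of both training and selection. To obtain a final model for deployment, one common approach is to run the same search once more on all available data and report the nested estimate as its expected performance.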