Machine Learning Academy · Lesson

Key Hyperparameters: Learning Rate, n_estimators, and max_depth

Learners will trace the effect of shrinkage (learning rate), ensemble size, and tree depth on the bias-variance trade-off through systematic experiments.

The Holy Trinity of Gradient Boosting

Three hyperparameters in gradient boosting interact most strongly and have the largest impact on model performance: learning_rate (how much each tree contributes to the ensemble), n_estimators (how many trees are built), and max_depth (how complex each individual tree can be). Understanding how the three interact is the key to tuning any gradient boosting model effectively, whether it is scikit-learn's GradientBoostingClassifier, XGBoost, or LightGBM.
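As a rough illustration of the max_depth side of this trade-off, the sketch below holds learning_rate and n_estimators fixed and varies only the tree depth, reporting training and cross-validated accuracy side by side. The dataset, the depth values (1, 3, 6), and the name depth_results are illustrative assumptions, not settings prescribed by the lesson.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)

# Illustrative depths only: stumps, the scikit-learn default, and deeper trees.
depth_results = {}
for depth in (1, 3, 6):
    model = GradientBoostingClassifier(
        learning_rate=0.1, n_estimators=100, max_depth=depth, random_state=42
    )
    # return_train_score=True lets us compare fit on seen vs. held-out folds.
    cv = cross_validate(model, X, y, cv=5, return_train_score=True)
    train_acc = cv["train_score"].mean()
    cv_acc = cv["test_score"].mean()
    depth_results[depth] = (train_acc, cv_acc)
    print(f"max_depth={depth}: train={train_acc:.4f}, CV={cv_acc:.4f}")
```

A widening gap between the training and CV columns as depth grows would suggest rising variance; similar scores across depths would suggest that depth matters little on this particular dataset.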

Learning Rate (Shrinkage) in Depth

The learning rate η (eta) scales each tree's contribution: F_m(x) = F_{m-1}(x) + η × h_m(x), where F_{m-1} is the ensemble after m-1 trees and h_m is the m-th tree, fitted to the current residuals. A smaller η means each tree corrects only a fraction of the residual, so more trees are needed to converge, but the resulting prediction surface is smoother and more strongly regularised. Typical values range from 0.01 to 0.3. The inverse relationship between learning_rate and n_estimators is crucial: as a rule of thumb, halving the learning rate requires roughly doubling n_estimators to reach the same training loss. Always use early stopping to find the right n_estimators for your chosen learning rate.
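The update rule can be made concrete with a small hand-rolled boosting loop. This is a minimal sketch for squared-error regression only, where the residual y - F is the negative gradient; it is not how scikit-learn, XGBoost, or LightGBM implement boosting internally. The function name boost, the synthetic sine data, and the chosen η values are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, eta, n_trees, max_depth=2):
    """Return the training MSE after each boosting round (squared-error loss)."""
    F = np.full(len(y), y.mean())  # F_0: constant initial prediction
    losses = []
    for _ in range(n_trees):
        residual = y - F  # what the current ensemble still gets wrong
        h = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        h.fit(X, residual)
        F = F + eta * h.predict(X)  # F_m = F_{m-1} + eta * h_m
        losses.append(float(np.mean((y - F) ** 2)))
    return losses


# Synthetic 1-D regression problem (hypothetical data for the sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

fast = boost(X, y, eta=0.2, n_trees=50)    # larger steps, fewer trees
slow = boost(X, y, eta=0.1, n_trees=100)   # half the step, twice the trees

print(f"eta=0.2, 50 trees:  final train MSE = {fast[-1]:.4f}")
print(f"eta=0.1, 100 trees: final train MSE = {slow[-1]:.4f}")
```

Comparing the two printed losses gives a feel for how closely the halve-and-double rule of thumb holds on a given problem; it is an approximation, not an exact equivalence.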

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Inverse relationship between lr and n_estimators:
# each (learning_rate, n_estimators) pair matches a smaller step with more trees.
configs = [(0.3, 50), (0.1, 150), (0.05, 300), (0.01, 1000)]
for lr, n in configs:
    model = GradientBoostingClassifier(
        learning_rate=lr, n_estimators=n, max_depth=3, random_state=42
    )
    # Mean accuracy over 5 cross-validation folds
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f'lr={lr:.2f}, n={n:4d}: CV={score:.4f}')
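The early-stopping advice can be sketched with scikit-learn's built-in mechanism: set n_estimators to a generous upper bound and let n_iter_no_change halt training once the score on an internal validation split stops improving. The specific values below (learning_rate=0.05, a cap of 1000 trees, patience of 10 rounds, a 20% validation fraction) are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

max_trees = 1000  # upper bound; early stopping usually ends well before this
model = GradientBoostingClassifier(
    learning_rate=0.05,
    n_estimators=max_trees,
    max_depth=3,
    validation_fraction=0.2,  # share of the training data held out internally
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    tol=1e-4,
    random_state=42,
)
model.fit(X_train, y_train)

trees_used = model.n_estimators_  # number of trees actually fitted
test_accuracy = model.score(X_test, y_test)
print(f"trees used: {trees_used} of {max_trees}, test accuracy: {test_accuracy:.4f}")
```

Rerunning the sketch with a different learning_rate and watching how trees_used changes is one way to observe the inverse relationship without hand-picking n_estimators for each setting.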
