Decision Trees: Theory and Implementation
Gini impurity, information gain, tree depth, overfitting — sklearn DecisionTreeClassifier.
What Is a Decision Tree
A decision tree splits the data into branches based on feature values, asking yes/no questions until it reaches a prediction at a leaf node.
Each internal node tests one feature, each branch corresponds to one outcome of that test, and each leaf assigns a class. Trees are easy to interpret because any prediction can be traced along the path of decisions from the root to a leaf.
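To make the node, branch, and leaf structure concrete, here is a minimal sketch of a hand-built tree stored as nested dictionaries. The feature names, thresholds, and class labels are hypothetical and chosen only for illustration; a trained model would learn them from data.

```python
# Hypothetical toy tree: each internal node tests one feature against a
# threshold, each leaf holds a class label. All values here are made up.
toy_tree = {
    "feature": "height_cm",
    "threshold": 150.0,
    "yes": {"label": "A"},                  # height_cm <= 150.0
    "no": {
        "feature": "weight_kg",
        "threshold": 70.0,
        "yes": {"label": "B"},              # weight_kg <= 70.0
        "no": {"label": "C"},
    },
}

def predict(node, sample):
    """Follow yes/no answers from the root until a leaf is reached."""
    path = []
    while "label" not in node:
        answer = sample[node["feature"]] <= node["threshold"]
        path.append((node["feature"], node["threshold"], "yes" if answer else "no"))
        node = node["yes"] if answer else node["no"]
    return node["label"], path

label, path = predict(toy_tree, {"height_cm": 172.0, "weight_kg": 64.0})
print(label)  # B
for feature, threshold, answer in path:
    print(f"{feature} <= {threshold}? {answer}")
```

The recorded path is what makes the prediction explainable: it lists every question asked and the answer that led to the leaf.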
Gini Impurity
Gini impurity measures how mixed the classes are in a node. A pure node (all one class) has Gini 0.
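The formula is Gini = 1 - sum(p_i^2), where p_i is the fraction of samples in the node that belong to class i. At each node the tree picks the split that reduces impurity the most, measured as the size-weighted average Gini of the resulting child nodes.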
import numpy as np

def gini(labels):
    # Count how often each class appears in the node
    classes, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()  # class fractions p_i
    return 1 - np.sum(probs ** 2)

print(gini([0, 0, 1, 1]))  # 0.5 (maximum for two classes)
print(gini([0, 0, 0, 0]))  # 0.0 (pure)
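The rule that the tree picks the split with the largest impurity reduction can be sketched as a search over candidate thresholds on a single numeric feature. This is an illustrative sketch, not scikit-learn's implementation: the helper names `weighted_gini` and `best_threshold` are hypothetical, the dataset is made up, and the code uses plain Python instead of NumPy so it runs without dependencies.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes present in labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Try a threshold midway between each pair of adjacent distinct values."""
    pairs = sorted(zip(values, labels))
    # Start from the unsplit node; a threshold of None means no split helped.
    best = (None, gini(labels))
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # identical values cannot be separated by a threshold
        t = (v1 + v2) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up one-feature dataset for illustration.
values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)  # 2.5 0.0
```

Here the parent node has Gini 0.5, and splitting at 2.5 produces two pure children, so the weighted impurity drops to 0. A full tree builder would repeat this search over every feature and recurse into each child.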