Learn AI with Python · Lesson

Decision Trees: Theory and Implementation

Gini impurity, information gain, tree depth, overfitting — sklearn DecisionTreeClassifier.

What Is a Decision Tree

A decision tree splits the data into branches based on feature values, asking yes/no questions until it reaches a prediction at a leaf node.

Each internal node tests one feature, each branch corresponds to one outcome of that test, and each leaf assigns a class. Trees are easy to interpret because any prediction can be traced along the path of decisions from the root to a leaf.
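To make the node, branch, and leaf structure concrete, here is a minimal sketch of a hand-built tree stored as nested dictionaries. The feature names (`humidity`, `wind_speed`), thresholds, and class labels are hypothetical and chosen only for illustration; a real tree would learn them from data.

```python
# A hand-built tree: internal nodes test one feature against a threshold,
# leaves carry a class label. All names and thresholds here are made up.
tree = {
    "feature": "humidity", "threshold": 70,
    "yes": {"leaf": "play"},                      # humidity <= 70
    "no": {                                       # humidity > 70
        "feature": "wind_speed", "threshold": 20,
        "yes": {"leaf": "play"},                  # wind_speed <= 20
        "no": {"leaf": "stay home"},              # wind_speed > 20
    },
}

def predict(node, sample, path=None):
    """Walk from the root to a leaf; return (label, list of decisions taken)."""
    path = [] if path is None else path
    if "leaf" in node:
        return node["leaf"], path
    # Answer the node's yes/no question for this sample.
    answer = "yes" if sample[node["feature"]] <= node["threshold"] else "no"
    path.append(f"{node['feature']} <= {node['threshold']}? {answer}")
    return predict(node[answer], sample, path)

label, path = predict(tree, {"humidity": 85, "wind_speed": 10})
print(label)  # play
print(path)   # ['humidity <= 70? no', 'wind_speed <= 20? yes']
```

The returned path is the interpretability benefit in practice: it lists exactly which questions led to the prediction.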

Gini Impurity

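Gini impurity measures how mixed the classes are in a node. A pure node (all one class) has Gini 0, and an even mix of K classes gives the maximum, 1 - 1/K (0.5 for two classes).

The formula is Gini = 1 - sum_i(p_i^2), where p_i is the fraction of samples in the node that belong to class i. At each node, the tree picks the split that most reduces the size-weighted average impurity of the resulting child nodes.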

import numpy as np

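def gini(labels):
    """Return the Gini impurity of a collection of class labels."""
    # Count how many times each class occurs.
    _, counts = np.unique(labels, return_counts=True)
    # Convert counts to class fractions p_i.
    probs = counts / counts.sum()
    return 1 - np.sum(probs ** 2)

print(gini([0, 0, 1, 1]))   # 0.5 (maximum for two classes)
print(gini([0, 0, 0, 0]))   # 0.0 (pure)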
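The idea that the tree picks the split that reduces impurity the most can be sketched for a single numeric feature. This is a simplified illustration, not how sklearn implements it: the helper names `weighted_gini` and `best_threshold` and the toy data are assumptions made for this example, and candidate thresholds are taken as midpoints between neighboring feature values.

```python
import numpy as np

def gini(labels):
    # Same definition as above: 1 - sum of squared class fractions.
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return 1 - np.sum(probs ** 2)

def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(x, y):
    """Try midpoints between sorted unique values of one feature."""
    x, y = np.asarray(x), np.asarray(y)
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    best_t, best_score = None, gini(y)  # a split must beat the parent node
    for t in candidates:
        mask = x <= t
        score = weighted_gini(y[mask], y[~mask])
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data (made up): one feature, two classes.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
t, score = best_threshold(x, y)
print(t, score)  # 3.5 0.0
```

Here the parent node has Gini 0.5, and the threshold 3.5 separates the classes perfectly, so both children are pure and the weighted impurity drops to 0. A full tree builder would repeat this search over every feature and then recurse into each child.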
