Machine Learning Academy · Lesson

Visualising and Interpreting Decision Trees

Learners will export and render a tree with sklearn's plot_tree, read the decision rules, and extract feature importances for stakeholder reports.

Why Tree Visualisation Matters

Decision trees are often called white-box models because their decision logic is fully transparent. Visualising a trained tree lets you: verify the model is making decisions based on sensible features, explain predictions to non-technical stakeholders, identify potential data quality issues (e.g., a feature that should not be important appearing at the root), and debug unexpected behaviour. Visualisation turns the tree's mathematical structure into a human-readable flowchart that domain experts can validate against their knowledge.

from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

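iris = load_iris()                      # load the dataset once and reuse it
X, y = iris.data, iris.target
feature_names = iris.feature_names      # already a list of str
class_names   = list(iris.target_names) # convert the NumPy array to a plain list of str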

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

plt.figure(figsize=(14, 6))
plot_tree(tree,
          feature_names=feature_names,
          class_names=class_names,
          filled=True,      # Color by majority class
          rounded=True,     # Rounded boxes
          fontsize=10)
plt.title('Iris Decision Tree (depth=3)')
plt.show()
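The same rules can also be read as plain text, which is sometimes easier to paste into a report than a figure. The sketch below assumes the same iris setup as above and uses sklearn's export_text; the variable name rules is illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(iris.data, iris.target)

# Render the fitted tree as indented if/else style rules.
# Leaves are printed as "class: <label>", using the encoded labels 0, 1, 2.
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
```

Each level of indentation corresponds to one level of depth in the plotted tree, so the text can be checked against the figure branch by branch.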

Reading a Node in plot_tree Output

Each node in the plot_tree output shows up to five lines: (1) the split condition (e.g., petal length (cm) <= 2.45), (2) the Gini impurity of the node, (3) the number of training samples that reached the node, (4) the class distribution as a list of sample counts per class, and (5) the majority class, which appears because class_names was passed. Leaf nodes show everything except the split condition, and their majority class is the prediction. With filled=True, the hue of a node identifies its majority class and the intensity indicates purity: the darker the node, the larger the share of its samples that belong to that class.
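The Gini value printed in a node can be recomputed from its class counts as 1 minus the sum of squared class proportions. The sketch below is a minimal check, assuming the same iris tree as earlier: it recomputes the root impurity by hand and compares it with the value sklearn stores in tree_.impurity. The helper gini is a hypothetical name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for per-class sample counts."""
    counts = np.asarray(counts, dtype=float)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)


X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# Every training sample reaches the root, so its class counts are
# simply the counts of each label in y.
root_counts = np.bincount(y)
root_gini = gini(root_counts)

print("class counts at root:", root_counts)
print("Gini computed by hand:", round(root_gini, 3))
print("Gini stored by sklearn:", round(tree.tree_.impurity[0], 3))  # node 0 is the root
```

A node containing a single class gives gini([50, 0, 0]) = 0, which is why pure leaves are displayed with gini = 0.0.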

# Interpreting node output from plot_tree:
#
# petal length (cm) <= 2.45     <- split condition
# gini = 0.667                  <- Gini impurity of the samples at this node
# samples = 150                 <- training samples reaching node
# value = [50, 50, 50]          <- samples per class [setosa, versicolor, virginica]
# class = setosa                <- majority class (prediction if leaf); on a tie, the first class is shown

print('Gini 0.667 = equal 3-class split (maximum 3-class impurity)')
print('samples=150 at root = all training samples')
print('value=[50,50,50] = perfectly balanced classes')
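Beyond reading individual nodes, a stakeholder report usually needs a summary of which features the tree relies on overall. The sketch below, assuming the same iris tree, ranks features by the fitted model's feature_importances_ attribute; the variable name ranked and the print layout are illustrative choices, not part of the lesson's original code.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(iris.data, iris.target)

# feature_importances_ holds one impurity-based score per feature,
# normalised so that the scores sum to 1.
ranked = sorted(
    zip(iris.feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name:<20} {score:.3f}")
```

A score of 0 means the feature was never used in a split. These scores are impurity-based and can favour features with many distinct values, so they are best presented alongside the plotted tree rather than instead of it.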
