Visualising and Interpreting Decision Trees
Learners will export and render a tree with sklearn's plot_tree, read the decision rules, and extract feature importances for stakeholder reports.
Why Tree Visualisation Matters
Decision trees are often called white-box models because their decision logic is fully transparent. Visualising a trained tree lets you: verify the model is making decisions based on sensible features, explain predictions to non-technical stakeholders, identify potential data quality issues (e.g., a feature that should not be important appearing at the root), and debug unexpected behaviour. Visualisation turns the tree's mathematical structure into a human-readable flowchart that domain experts can validate against their knowledge.
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
iris = load_iris()  # load the dataset once and reuse it
X, y = iris.data, iris.target
feature_names = iris.feature_names
class_names = list(iris.target_names)  # plain list of class labels for plot_tree
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)
plt.figure(figsize=(14, 6))
plot_tree(tree,
          feature_names=feature_names,
          class_names=class_names,
          filled=True,   # Colour each node by its majority class
          rounded=True,  # Rounded boxes
          fontsize=10)
plt.title('Iris Decision Tree (depth=3)')
plt.show()
Reading a Node in plot_tree Output
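Each internal node in the plot_tree output shows up to five lines: (1) the split condition (e.g., petal length (cm) <= 2.45), (2) the Gini impurity of the samples at the node, (3) the number of training samples that reached the node, (4) the class distribution as a list of sample counts per class (depending on the scikit-learn version, these may be displayed as class proportions rather than raw counts), and (5) the majority class, which appears only when class_names is passed. Leaf nodes show the same lines except the split condition; their majority class is the prediction. With filled=True, the hue identifies the majority class and the intensity indicates purity: the darker the node, the larger the share of its samples that belong to that class.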
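The Gini figure in a node can be checked by hand from its class counts, using the standard definition of Gini impurity (one minus the sum of squared class proportions). The sketch below is illustrative only: `gini_from_counts` is a hypothetical helper, not part of scikit-learn, and the second and third count lists are made-up examples rather than values read from the fitted tree.

```python
def gini_from_counts(counts):
    """Gini impurity for a node, given its per-class sample counts."""
    total = sum(counts)
    # 1 - sum of squared class proportions
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Root of a balanced 3-class problem: every class has the same share.
root_gini = gini_from_counts([50, 50, 50])

# Hypothetical pure node: all samples belong to one class.
pure_gini = gini_from_counts([50, 0, 0])

# Hypothetical mostly-pure node: a small minority of another class.
mixed_gini = gini_from_counts([0, 49, 5])

print(f"balanced root: {root_gini:.3f}")   # 0.667
print(f"pure node:     {pure_gini:.3f}")   # 0.000
print(f"mixed node:    {mixed_gini:.3f}")  # 0.168
```

A value near 0 means the node is almost entirely one class; for three classes the maximum is 2/3, reached when the classes are perfectly balanced.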
# Interpreting node output from plot_tree:
#
# petal length (cm) <= 2.45 <- split condition
# gini = 0.667 <- impurity before split
# samples = 150 <- training samples reaching node
# value = [50, 50, 50] <- samples per class [setosa, versicolor, virginica]
# class = setosa <- majority class (prediction if leaf)
print('Gini 0.667 = equal 3-class split (maximum 3-class impurity)')
print('samples=150 at root = all training samples')
print('value=[50,50,50] = perfectly balanced classes')
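For reports where a figure is impractical, the same decision rules can be read as plain text, and the fitted tree's feature importances can be listed alongside them. The following is a minimal sketch, assuming the same Iris setup as above; it uses scikit-learn's `export_text` and the `feature_importances_` attribute, and the variable names (`clf`, `rules`, `importances`) are chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# Text rendering of the tree: one indented line per split or leaf.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Impurity-based importances, sorted from most to least important.
importances = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in importances:
    print(f"{name:<20} {score:.3f}")
```

These importances are normalised to sum to 1 and reflect how much each feature reduced impurity in this particular tree, so they are best presented to stakeholders as a summary of the model rather than as a statement about the underlying data.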