Building a Tree: Splits, Nodes, and Leaves
Learners will trace how a decision tree recursively partitions data at each node, from root to leaf, and make predictions by following branches.
What Is a Decision Tree?
A decision tree is a flowchart-like structure where each internal node asks a yes/no question about one feature, each branch represents an answer, and each leaf node contains a prediction. To classify a new sample, you start at the root, follow branches according to feature values, and arrive at a leaf whose label is the prediction. Decision trees are interpretable by design — you can trace exactly why any prediction was made by reading the sequence of questions answered, which makes them popular in regulated industries like finance and healthcare.
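The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the nested-dictionary layout, the key names, the predict helper, and the sample applicant are all assumptions made for the example. The questions and thresholds mirror this lesson's loan example.

```python
# Hypothetical encoding of the lesson's loan tree: internal nodes are dicts,
# leaves are plain strings holding the prediction.
loan_tree = {
    "feature": "income", "op": ">", "threshold": 50000,
    "yes": {
        "feature": "credit_score", "op": ">", "threshold": 700,
        "yes": "APPROVE",
        "no": {
            "feature": "debt_ratio", "op": "<", "threshold": 0.4,
            "yes": "APPROVE",
            "no": "REJECT",
        },
    },
    "no": "REJECT",
}


def predict(node, sample):
    """Follow branches from the root until a leaf is reached.

    Returns the leaf label and the list of questions answered on the way.
    """
    path = []
    while isinstance(node, dict):  # still at an internal node
        value = sample[node["feature"]]
        if node["op"] == ">":
            answer = value > node["threshold"]
        else:
            answer = value < node["threshold"]
        path.append(
            f"{node['feature']} {node['op']} {node['threshold']}? "
            f"{'yes' if answer else 'no'}"
        )
        node = node["yes"] if answer else node["no"]
    return node, path


# Made-up applicant used only to exercise the traversal.
applicant = {"income": 62000, "credit_score": 650, "debt_ratio": 0.35}
label, path = predict(loan_tree, applicant)
for step in path:
    print(step)
print("Prediction:", label)
```

The returned path is the sequence of questions that led to the prediction, which is what makes the result traceable.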
# Conceptual tree for predicting loan default:
#
# Is income > 50000?
# |--- Yes: Is credit_score > 700?
# |    |--- Yes: APPROVE (leaf)
# |    |--- No: Is debt_ratio < 0.4?
# |         |--- Yes: APPROVE (leaf)
# |         |--- No: REJECT (leaf)
# |--- No: REJECT (leaf)
print('A decision tree makes predictions by asking questions')
print('Each path from root to leaf = one decision rule')
Nodes, Branches, and Leaves
A decision tree has three types of components: the root node (the first question asked, which is the most informative split of the entire dataset), internal nodes (intermediate questions that further partition subsets of the data), and leaf nodes (terminal nodes where predictions are stored). The root and every internal node split their data into two or more subsets based on a feature value; with the yes/no questions used in this lesson, each split produces exactly two. The depth of a tree is the number of splits along the longest path from the root to any leaf. Deeper trees can represent more complex patterns but are more prone to overfitting.
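To make the anatomy concrete, here is a small sketch that represents the lesson's loan tree as linked nodes and measures it. The Node class and the depth, count_leaves, and count_decision_nodes helpers are hypothetical names introduced for illustration, and the sketch assumes a binary tree in which every question has exactly a yes and a no branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary tree; a leaf has a prediction and no children."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    prediction: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def depth(node: Node) -> int:
    """Number of splits on the longest path from this node down to a leaf."""
    if node.is_leaf():
        return 0
    return 1 + max(depth(node.yes), depth(node.no))


def count_leaves(node: Node) -> int:
    """Number of terminal nodes below (and including) this node."""
    if node.is_leaf():
        return 1
    return count_leaves(node.yes) + count_leaves(node.no)


def count_decision_nodes(node: Node) -> int:
    """Number of nodes that ask a question (the root plus internal nodes)."""
    if node.is_leaf():
        return 0
    return 1 + count_decision_nodes(node.yes) + count_decision_nodes(node.no)


# The loan tree from earlier in this lesson.
root = Node(
    question="income > 50000",
    yes=Node(
        question="credit_score > 700",
        yes=Node(prediction="APPROVE"),
        no=Node(
            question="debt_ratio < 0.4",
            yes=Node(prediction="APPROVE"),
            no=Node(prediction="REJECT"),
        ),
    ),
    no=Node(prediction="REJECT"),
)

d = depth(root)
print("Depth:", d)                                     # 3
print("Leaves:", count_leaves(root))                   # 4
print("Decision nodes:", count_decision_nodes(root))   # 3
print("Leaf upper bound 2^d:", 2 ** d)                 # 8
```

The loan tree has depth 3 but only 4 leaves, well under the bound of 8, because two of its branches stop early. The bound is reached only when every path is split all the way down.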
# Tree anatomy example
print('Root node: first split on most informative feature')
print('Internal nodes: further splits on subsets')
print('Leaf nodes: final predictions')
print()
print('Depth=1 tree (stump): one question, two leaves')
print('Depth=2 tree: up to three questions, four leaves')
print('Depth=d tree: up to 2^d leaves')
print()
print('More depth = more flexible but higher overfitting risk')