Distance Metrics: Euclidean, Manhattan, and Minkowski
Learners will compare distance metrics, understand when Manhattan distance outperforms Euclidean, and pass custom metrics to KNeighborsClassifier.
Why Distance Metrics Matter in KNN
KNN defines nearest neighbors using a distance metric: a mathematical function that quantifies how far apart two points are in feature space. The choice of metric directly affects which neighbors are selected, and therefore what the model predicts. Different metrics make different assumptions about the geometry of the data. Euclidean distance measures the straight line between two points, so diagonal movement is allowed. Manhattan distance sums the movement along each axis separately, as if travelling a city grid. Cosine similarity ignores magnitude and compares only the direction of the vectors. No single metric is universally best; the right choice depends on the structure of the problem.
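To see how the choice of metric can change which neighbor is selected, here is a minimal NumPy sketch. It assumes the Minkowski form (sum of |a_i - b_i|^p)^(1/p). This form reduces to Manhattan at p = 1 and Euclidean at p = 2, and it approaches Chebyshev as p grows. The helper names `minkowski`, `cosine_distance` and `nearest` are hypothetical, as are the two candidate points. They were chosen only for illustration.

```python
import numpy as np

def minkowski(a, b, p):
    # Minkowski distance: (sum |a_i - b_i|^p)^(1/p)
    # p=1 gives Manhattan, p=2 gives Euclidean, large p approaches Chebyshev
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return np.sum(diff ** p) ** (1.0 / p)

def cosine_distance(a, b):
    # 1 - cosine similarity: depends only on direction, not vector length
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical query and two candidate neighbors
query = np.array([0.0, 0.0])
candidates = {
    'diag': np.array([3.0, 3.0]),  # offset along a diagonal
    'axis': np.array([0.0, 5.0]),  # offset along a single axis
}

def nearest(query, candidates, p):
    # Name of the candidate closest to the query under Minkowski-p
    return min(candidates, key=lambda name: minkowski(query, candidates[name], p))

print('p=1 (Manhattan):', nearest(query, candidates, 1))  # axis (5 < 6)
print('p=2 (Euclidean):', nearest(query, candidates, 2))  # diag (4.24 < 5)

# Large p approaches Chebyshev: max(|3|, |4|) = 4
print('p=50:', round(minkowski([0, 0], [3, 4], 50), 3))  # 4.0

# Same direction gives cosine distance 0 even though the points are 4 apart
print('cosine, same direction:', cosine_distance([1, 0], [5, 0]))  # 0.0
print('cosine, orthogonal:', cosine_distance([1, 0], [0, 1]))      # 1.0
```

In scikit-learn, the same choice is made through the `metric` parameter of `KNeighborsClassifier`. For example, pass `metric='manhattan'`, or pass `metric='minkowski'` together with an explicit `p`.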
import numpy as np
A = np.array([0, 0])
B = np.array([3, 4])
# Euclidean: straight-line distance
euclidean = np.sqrt(np.sum((A - B)**2))
print('Euclidean:', euclidean) # 5.0
# Manhattan: sum of absolute differences
manhattan = np.sum(np.abs(A - B))
print('Manhattan:', manhattan) # 7
# Chebyshev: maximum single-axis difference
chebyshev = np.max(np.abs(A - B))
print('Chebyshev:', chebyshev) # 4
Euclidean Distance: L2 Norm
Euclidean distance (also called L2 distance, because it is the L2 norm of the difference vector) measures the straight-line distance between two points. In 2D it follows the Pythagorean theorem: sqrt(dx^2 + dy^2). In n dimensions it is the square root of the sum of squared differences across all features. It is the most intuitive metric. It is also effectively the default in scikit-learn's KNeighborsClassifier, whose default setting metric='minkowski' with p=2 is equivalent to Euclidean distance. Euclidean distance works well when three conditions hold: features are continuous, they are on a similar scale, and diagonal proximity makes physical sense. Examples include geographic coordinates and sensor readings.
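The requirement that features be on a similar scale can be checked with a small sketch. The data below is hypothetical: heights in metres and incomes in dollars, invented only to illustrate the effect. The helper `nearest_index` and the z-score standardization step are illustrative choices, not part of the original lesson.

```python
import numpy as np

def euclidean(a, b):
    # Straight-line (L2) distance: sqrt of the sum of squared differences
    return np.sqrt(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

# Hypothetical training points: [height in metres, annual income in dollars]
X_train = np.array([
    [1.60, 50_000.0],
    [1.90, 51_500.0],
    [1.70, 40_000.0],
    [1.80, 60_000.0],
])
query = np.array([1.62, 51_000.0])

def nearest_index(X, q):
    # Index of the row in X with the smallest Euclidean distance to q
    return int(np.argmin([euclidean(row, q) for row in X]))

# Raw features: income gaps (hundreds to thousands) swamp height gaps (< 1)
raw_idx = nearest_index(X_train, query)

# Standardize each feature using the training mean and standard deviation
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_scaled = (X_train - mean) / std
query_scaled = (query - mean) / std
scaled_idx = nearest_index(X_scaled, query_scaled)

print('nearest on raw features:   ', raw_idx)     # 1
print('nearest on scaled features:', scaled_idx)  # 0
```

On raw features, the neighbor is picked almost entirely by income. After standardization, height contributes comparably, and a different neighbor wins. In scikit-learn this preprocessing is commonly done with `StandardScaler` before fitting `KNeighborsClassifier`.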
import numpy as np
def euclidean(a, b):
    return np.sqrt(np.sum((np.array(a) - np.array(b))**2))
# 2D example
print('2D:', euclidean([0, 0], [3, 4])) # 5.0
# 3D example
print('3D:', euclidean([1, 2, 3], [4, 6, 3]).round(2)) # 5.0
# Using scipy for efficiency
from scipy.spatial.distance import euclidean as sp_euclidean
print('scipy:', sp_euclidean([0, 0], [3, 4]))