How KNN Works: Distance, Neighbors, and Votes
Learners will visualise a 2D dataset, compute Euclidean distances, identify the k nearest neighbours, and produce a classification by majority vote.
The Core Intuition Behind KNN
K-Nearest Neighbors (KNN) is one of the most intuitive machine learning algorithms. To predict the label of a new point, it looks at the k closest labelled points and takes a majority vote. There is no explicit training phase. The algorithm simply stores the training data and performs all computation at prediction time, which is why KNN is called a lazy learner. It works well when similar inputs have similar outputs, a reasonable assumption in many real-world problems such as recommending products or diagnosing diseases.
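The "lazy learner" description is easier to see when KNN is split into the usual fit/predict pair. The class below is a hypothetical sketch: the name `LazyKNN` and its methods are illustrative and do not come from any library. `fit` only stores the data, and every distance is computed inside `predict_one`.

```python
import math
from collections import Counter


class LazyKNN:
    """Hypothetical minimal KNN classifier illustrating lazy learning."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data; no model parameters are learned.
        self.X = [tuple(x) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x_new):
        # All the real work happens here, at prediction time.
        distances = [math.dist(x_new, x_i) for x_i in self.X]
        nearest = sorted(range(len(distances)), key=distances.__getitem__)[: self.k]
        votes = Counter(self.y[i] for i in nearest)
        return votes.most_common(1)[0][0]


model = LazyKNN(k=3).fit(
    [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)],
    ["red", "red", "red", "blue", "blue", "blue"],
)
print(model.predict_one((2, 2)))  # red
print(model.predict_one((6, 5)))  # blue
```

Because `fit` does no work, the cost moves entirely to prediction: every query is compared against every stored point, so prediction time grows with the size of the training set.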
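```python
# Minimal runnable version (pure Python; math.dist needs Python 3.8+)
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Compute the Euclidean distance from x_new to every training point
    distances = [math.dist(x_new, x_i) for x_i in X_train]
    # 2. Find the indices of the k smallest distances
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # 3. Majority vote among the k neighbours; on a tie, the label that
    #    appears first among the sorted (closest-first) neighbours wins
    votes = [y_train[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]
```
Euclidean Distance: The Default Metric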
The most common distance measure in KNN is Euclidean distance: the straight-line distance between two points in feature space. For two points A = (a1, a2) and B = (b1, b2), the Euclidean distance is sqrt((a1 - b1)^2 + (a2 - b2)^2). With n features, the same formula sums the squared differences over every feature before taking the square root: sqrt((a1 - b1)^2 + ... + (an - bn)^2). Because every dimension contributes equally, features should be on comparable scales. Otherwise, features with large numeric ranges dominate the distance, and the "nearest" neighbours are chosen almost entirely by those features.
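The scaling caveat is easy to check numerically. The sketch below uses a hypothetical three-row dataset of (age in years, income in dollars) and z-score standardisation (subtract each feature's mean, divide by its standard deviation). That is one common choice of scaling, assumed here rather than prescribed by this lesson. The helper `nearest_index` is illustrative.

```python
import numpy as np

# Hypothetical toy data: column 0 is age in years, column 1 is income in dollars
X = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [26.0, 58_000.0],
])
query = np.array([25.0, 51_000.0])

def nearest_index(X, query):
    # Index of the row with the smallest Euclidean distance to the query
    return int(np.argmin(np.linalg.norm(X - query, axis=1)))

# Raw features: a 1,000-dollar income gap outweighs a 35-year age gap
print(nearest_index(X, query))  # 1

# Standardise each feature using statistics computed from the training rows
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std
query_scaled = (query - mean) / std
print(nearest_index(X_scaled, query_scaled))  # 0
```

The query is reused with the same `mean` and `std` computed from the training rows, so that training points and queries live in the same scaled space.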
```python
import numpy as np

def euclidean_distance(a, b):
    # Square the per-feature differences, sum them, then take the square root
    return np.sqrt(np.sum((a - b) ** 2))

point_a = np.array([1.0, 2.0])
point_b = np.array([4.0, 6.0])
dist = euclidean_distance(point_a, point_b)
print('Euclidean distance:', dist)  # 5.0

# Verify with numpy's built-in vector norm
print('Using numpy:', np.linalg.norm(point_a - point_b))  # 5.0
```
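The three steps of this lesson (compute distances, pick the k nearest, take a majority vote) can be traced end to end on a small 2D dataset. The points, labels, and query below are hypothetical, chosen so that the neighbours disagree and the vote has something to decide. Distances are computed for all training points at once with NumPy broadcasting.

```python
import numpy as np
from collections import Counter

# Hypothetical toy 2D dataset: two clusters labelled "A" and "B"
X_train = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class A
    [5.0, 5.0], [6.0, 5.5], [5.5, 6.5],   # class B
])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
x_new = np.array([3.5, 3.5])
k = 3

# Step 1: Euclidean distance from x_new to every training point (broadcast over rows)
distances = np.linalg.norm(X_train - x_new, axis=1)

# Step 2: indices of the k smallest distances (a stable sort keeps tied rows in order)
nearest = np.argsort(distances, kind="stable")[:k]

# Step 3: majority vote among the k neighbours' labels
votes = Counter(y_train[nearest].tolist())
prediction = votes.most_common(1)[0][0]

for i in nearest:
    print(f"neighbour {X_train[i]} label={y_train[i]} distance={distances[i]:.3f}")
print("votes:", dict(votes), "-> prediction:", prediction)  # prediction: A
```

The single closest point, (5.0, 5.0), belongs to class B, but the next two belong to class A, so with k = 3 the vote returns A. With k = 1 the same query would be labelled B, which shows how directly the choice of k shapes the prediction.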