Machine Learning Academy · Lesson

How KNN Works: Distance, Neighbors, and Votes

Learners will visualise a 2D dataset, compute Euclidean distances, identify the k nearest neighbours, and produce a classification by majority vote.

The Core Intuition Behind KNN

K-Nearest Neighbors (KNN) is one of the most intuitive machine learning algorithms: to predict the label of a new point, find the k closest labelled points in the training set and take a majority vote among their labels. There is no explicit training phase; the algorithm simply memorises the training data and defers all computation to prediction time, which is why KNN is called a lazy learner. It works well when similar inputs have similar outputs, a reasonable assumption in many real-world problems such as recommending products or diagnosing diseases.

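# Runnable sketch of the three steps (math.dist needs Python 3.8+)
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points with the same number of features
    return math.dist(a, b)

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Compute distance from x_new to every training point
    distances = [euclidean(x_new, x_i) for x_i in X_train]
    # 2. Find indices of the k smallest distances
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # 3. Majority vote among k neighbors; Counter breaks ties in favour of the label seen first (the closer one)
    votes = [y_train[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]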
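To make the lazy-learner idea concrete, the sketch below wraps the same three steps in a small class whose fit method only stores the data, then runs it on a tiny made-up 2D dataset so you can see the distances, the k nearest neighbours, and the vote. The class SimpleKNN, its neighbours and predict_one methods, and the toy points are hypothetical illustrations, not a library API; math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


class SimpleKNN:
    """Hypothetical minimal KNN classifier illustrating lazy learning."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorising the data: no parameters are learned here.
        self.X = [tuple(x) for x in X]
        self.y = list(y)
        return self

    def neighbours(self, x_new):
        # All the work happens at prediction time: measure the distance to every
        # stored point, sort ascending, and keep the k closest as (distance, index).
        dists = sorted((math.dist(x_new, x_i), i) for i, x_i in enumerate(self.X))
        return dists[: self.k]

    def predict_one(self, x_new):
        # Majority vote over the labels of the k nearest points.
        votes = [self.y[i] for _, i in self.neighbours(x_new)]
        return Counter(votes).most_common(1)[0][0]


# Toy 2D dataset (made up for illustration): two loose clusters.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
y_train = ["red", "red", "red", "blue", "blue", "blue"]

model = SimpleKNN(k=3).fit(X_train, y_train)
query = (4.0, 4.0)
for d, i in model.neighbours(query):
    print(f"neighbour {X_train[i]} label={y_train[i]} distance={d:.3f}")
print("prediction:", model.predict_one(query))
```

For the query (4.0, 4.0), the three nearest points are (6.5, 5.0) and (6.0, 6.0), both blue, and (1.5, 2.0), which is red, so the vote is 2 to 1 and the prediction is blue. Note that fit does no computation at all; every call to predict_one rescans the full training set.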

Euclidean Distance: The Default Metric

The most common distance measure in KNN is Euclidean distance, the straight-line distance between two points in feature space. For two points A=(a1, a2) and B=(b1, b2), the Euclidean distance is sqrt((a1-b1)^2 + (a2-b2)^2). With n features, the same formula sums the squared differences over every feature: sqrt((a1-b1)^2 + (a2-b2)^2 + ... + (an-bn)^2). Because Euclidean distance weights every feature equally, features should be on comparable scales (for example, by standardising them); otherwise features with large numeric ranges dominate the distance, and the nearest neighbours are chosen almost entirely by those features.

import numpy as np

def euclidean_distance(a, b):
    # Square the per-feature differences, sum them, and take the square root
    return np.sqrt(np.sum((a - b) ** 2))

point_a = np.array([1.0, 2.0])
point_b = np.array([4.0, 6.0])

dist = euclidean_distance(point_a, point_b)
print('Euclidean distance:', dist)  # 5.0

# Verify with numpy
print('Using numpy:', np.linalg.norm(point_a - point_b))
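The scaling warning above can be checked numerically. The sketch below uses a made-up dataset in which one feature (say, income in dollars) has a far larger range than the other (age in years), computes the Euclidean distance from a query to every training row at once with NumPy broadcasting, and compares the nearest neighbour before and after z-score standardisation. The data, the feature meanings, the helper name distances_to_all, and the choice of z-score standardisation (one common option; min-max scaling is another) are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: column 0 = age (years), column 1 = income (dollars).
X_train = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [27.0, 58_000.0],
])
query = np.array([26.0, 51_500.0])


def distances_to_all(X, x):
    # Broadcasting subtracts x from every row; squared differences are
    # summed across all features (axis=1), then square-rooted.
    return np.sqrt(np.sum((X - x) ** 2, axis=1))


raw = distances_to_all(X_train, query)
print("raw distances:", np.round(raw, 2))
print("nearest (raw):", np.argmin(raw))

# z-score standardisation using the training set's mean and standard deviation;
# the query is transformed with the same statistics.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
scaled = distances_to_all((X_train - mean) / std, (query - mean) / std)
print("scaled distances:", np.round(scaled, 2))
print("nearest (scaled):", np.argmin(scaled))
```

With these numbers the raw nearest neighbour is row 1 (the 60-year-old), because a 500-dollar income gap outweighs a 34-year age gap. After standardisation both features contribute on a comparable scale, and the nearest neighbour becomes row 0 (the 25-year-old with a similar income).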
