Machine Learning Academy · Lesson

KNN for Regression and Its Scalability Limits

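Learners will apply KNeighborsRegressor to a continuous target, then profile prediction time on large datasets to see why KNN inference is expensive: with brute-force search, each query costs O(n) distance computations, where n is the number of training points.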

KNN for Regression: Averaging Neighbors

KNN is not limited to classification; it can also predict continuous values. In KNN regression, the prediction for a new point is the mean of the target values of its k nearest neighbors. For example, to predict the price of a house, KNN finds the k most similar houses in the training set and averages their prices. The result is a non-parametric, local regression model: it can capture complex patterns without assuming any functional form between the features and the target.

import numpy as np

# Training data: house sizes (sqm) -> prices (thousands)
X_train = np.array([[50], [70], [90], [110], [130]])
y_train = np.array([150, 200, 260, 310, 380])

# Query: predict price for 80 sqm house
x_new = np.array([[80]])

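# k=3: find the 3 nearest neighbors.
# Note: the 50 sqm and 110 sqm houses are both 30 sqm from the query, so the
# third neighbor is a tie; a stable sort resolves it by training order (50 sqm).
dists = np.abs(X_train - x_new).flatten()
nearest_idx = np.argsort(dists, kind='stable')[:3]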
neighbor_prices = y_train[nearest_idx]

prediction = neighbor_prices.mean()
print('Neighbor prices:', neighbor_prices)
print('KNN regression prediction:', prediction)

KNeighborsRegressor in scikit-learn

Scikit-learn's KNeighborsRegressor implements KNN for continuous targets with the same API as the classifier. It accepts the same key parameters: n_neighbors, metric, weights, and algorithm. Distance-weighted regression (weights='distance') is often beneficial: each neighbor is weighted by the inverse of its distance, so closer neighbors contribute more to the predicted value than farther ones. This is especially useful at the edges of the training data distribution, where some of the k neighbors may lie far from the query and would otherwise bias a plain average.
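
To make the effect of weights='distance' concrete, here is a minimal sketch that reuses the toy house-price data from the previous section. The query of 75 sqm is a hypothetical value chosen so that no two neighbors are tied and no neighbor sits at distance zero. The manual calculation assumes inverse-distance weights, which is how scikit-learn documents the 'distance' option.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data from the earlier example: house sizes (sqm) -> prices (thousands)
X_train = np.array([[50.0], [70.0], [90.0], [110.0], [130.0]])
y_train = np.array([150.0, 200.0, 260.0, 310.0, 380.0])

# Hypothetical query chosen to avoid ties and zero distances
x_new = np.array([[75.0]])
k = 3

# Manual inverse-distance weighted average over the k nearest neighbors
dists = np.abs(X_train - x_new).ravel()
idx = np.argsort(dists, kind='stable')[:k]
weights = 1.0 / dists[idx]
manual_pred = np.sum(weights * y_train[idx]) / np.sum(weights)

# The same prediction through scikit-learn, next to the unweighted version
uniform = KNeighborsRegressor(n_neighbors=k, weights='uniform').fit(X_train, y_train)
weighted = KNeighborsRegressor(n_neighbors=k, weights='distance').fit(X_train, y_train)
uniform_pred = uniform.predict(x_new)[0]
weighted_pred = weighted.predict(x_new)[0]

print('Uniform prediction: ', round(uniform_pred, 2))
print('Weighted prediction:', round(weighted_pred, 2))
print('Manual weighted:    ', round(manual_pred, 2))
```

The weighted prediction is pulled toward the 70 sqm house, the closest neighbor, while the uniform prediction treats all three neighbors equally.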

from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ('sc', StandardScaler()),
    ('knn', KNeighborsRegressor(n_neighbors=10, weights='distance'))
])
pipe.fit(X_tr, y_tr)
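# The squared=False argument was deprecated in scikit-learn 1.4 and later removed,
# so take the square root of the MSE explicitly.
rmse = np.sqrt(mean_squared_error(y_te, pipe.predict(X_te)))
print('RMSE:', rmse.round(3))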
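
The lesson objective also mentions profiling prediction time as the training set grows. A rough way to do that is sketched below. It assumes synthetic Gaussian data rather than the housing dataset, and it sets algorithm='brute' so that every query is compared against every training point; the default algorithm='auto' may pick a tree-based index, which scales differently in low dimensions. The sizes, the feature count, and the names X_big, y_big, and timings are illustrative choices, not part of the lesson.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n_features = 8  # illustrative feature count
X_query = rng.normal(size=(200, n_features))  # fixed batch of queries

timings = {}
for n_train in (2_000, 8_000, 32_000):
    # Synthetic training set; the target depends on the first feature plus noise
    X_big = rng.normal(size=(n_train, n_features))
    y_big = X_big[:, 0] + 0.1 * rng.normal(size=n_train)

    # Brute-force search: fitting only stores the data, the work happens in predict
    knn = KNeighborsRegressor(n_neighbors=10, algorithm='brute').fit(X_big, y_big)

    start = time.perf_counter()
    knn.predict(X_query)
    timings[n_train] = time.perf_counter() - start

for n_train, seconds in timings.items():
    print(f'n_train={n_train:>6}: {seconds * 1e3:.1f} ms for {len(X_query)} queries')
```

Exact timings depend on hardware and library versions, and small runs are noisy, but prediction time should grow roughly in proportion to the training set size. Fitting stays cheap because brute-force KNN defers all distance computation to prediction time.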
