Machine Learning Academy · Lesson

Feature Scaling: StandardScaler and MinMaxScaler

Learners will fit StandardScaler and MinMaxScaler on training data, apply the fitted scalers to transform test data, and understand why fitting a scaler on test data causes leakage.

Why Feature Scaling Matters

Many machine learning algorithms are sensitive to the scale of input features. If one feature spans 0–1000 and another spans 0–1, distance-based models such as KNN and SVMs will be dominated by the larger-scale feature. Gradient-descent optimisers also typically converge faster when features share a similar scale. Tree-based models such as Random Forests and XGBoost split on one feature at a time using thresholds, so they are essentially unaffected by feature scaling and do not require it. Most other algorithms, including distance-based, gradient-trained, and regularised linear models, benefit from it.

import numpy as np

# Without scaling: 'income' (column 1) dominates 'age' (column 0)
X = np.array([
    [25, 50000],
    [35, 80000],
    [45, 120000]
], dtype=float)

# Euclidean distance between row 0 and row 1, computed term by term
diff = X[0] - X[1]              # [age difference, income difference]
dist = np.sqrt(np.sum(diff ** 2))
print('Distance dominated by income:', dist)
# age contributes 100, income contributes 900,000,000 to the squared distance
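For contrast, the same per-feature contributions can be recomputed after standardising each column. This is a minimal sketch on the same toy matrix, using StandardScaler (covered in the next section) purely to show the effect on the distance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same toy data as above: column 0 is age, column 1 is income
X = np.array([[25, 50000], [35, 80000], [45, 120000]], dtype=float)

# Squared per-feature contributions to the row 0 / row 1 distance, before scaling
raw_sq = (X[0] - X[1]) ** 2

# Standardise each column, then recompute the same contributions
X_scaled = StandardScaler().fit_transform(X)
scaled_sq = (X_scaled[0] - X_scaled[1]) ** 2

print('Raw contributions (age, income):', raw_sq)
print('Scaled contributions (age, income):', scaled_sq)
print('Scaled distance:', np.sqrt(scaled_sq.sum()))
```

After scaling, both features contribute a squared difference on the order of 1, so neither one dominates the distance.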

StandardScaler: Z-Score Normalisation

StandardScaler transforms each feature to zero mean and unit variance using the formula z = (x - mean) / std, where the mean and standard deviation are computed separately for each feature from the data passed to fit(). This is called standardisation or z-score normalisation. The resulting values are centred at 0 but have no fixed range. StandardScaler is a common default for algorithms such as logistic regression, SVMs, and neural networks, and is especially natural when a feature's distribution is approximately Gaussian.
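The formula can also be applied by hand with NumPy, which makes it explicit that the statistics are computed per column. This is a minimal sketch on the same toy matrix; it uses the population standard deviation (NumPy's default ddof=0), which is the estimator StandardScaler uses, so the result should match the library output.

```python
import numpy as np

X = np.array([[25, 50000], [35, 80000], [45, 120000]], dtype=float)

# Per-feature (column-wise) statistics
mean = X.mean(axis=0)
std = X.std(axis=0)  # population standard deviation (ddof=0)

# z = (x - mean) / std, broadcast across every row
Z = (X - mean) / std

print('Column means:', mean)
print('Standardised data:\n', Z)
```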

from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.array([[25, 50000], [35, 80000], [45, 120000]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print('Scaled data:\n', X_scaled)
print('Mean per feature:', X_scaled.mean(axis=0))  # ~[0, 0]
print('Std per feature:', X_scaled.std(axis=0))    # ~[1, 1]
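The lesson's goal is to fit the scaler on training data only and then transform test data with those same statistics. A minimal sketch of that pattern, assuming a hypothetical two-row test set alongside the training matrix used above: fit_transform learns mean_ and scale_ from the training rows, and transform reuses them without refitting. The last lines show the leaky alternative for contrast, where the test rows are scaled with their own statistics instead of the training ones.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy split: three training rows, two test rows (age, income)
X_train = np.array([[25, 50000], [35, 80000], [45, 120000]], dtype=float)
X_test = np.array([[30, 60000], [55, 150000]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean_ and scale_ from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics, no refit

# The learned statistics are the training-set mean and population std
print('Learned mean:', scaler.mean_)
print('Learned scale:', scaler.scale_)
print('Scaled test data:\n', X_test_scaled)

# Anti-pattern: fitting on the test rows makes their scaling depend on test-set statistics
leaky = StandardScaler().fit_transform(X_test)
print('Leaky test scaling:\n', leaky)
```

In a real workflow the split would typically come from sklearn.model_selection.train_test_split; the small arrays here only keep the numbers easy to check.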
