Machine Learning Academy · Lesson

Random Oversampling and SMOTE

Learners will apply RandomOverSampler and SMOTE from imbalanced-learn to upsample the minority class, then evaluate whether precision and recall on the minority class improve.

Why Oversampling Helps Imbalanced Models

Many classifiers learn a biased decision boundary when training data is dominated by one class: errors on the majority class dominate the training loss, so the model can reach high accuracy while effectively ignoring the minority class. Oversampling increases the share of minority examples in the training data, so that mistakes on them carry more weight during fitting. Two popular approaches are random oversampling (duplicating existing minority samples) and SMOTE (synthesising new ones by interpolating between a minority sample and its nearest minority-class neighbours).
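The difference between duplicating and synthesising can be sketched in a few lines of NumPy. This is a simplified illustration of the two ideas, not imbalanced-learn's implementation; the helper names duplicate_sample and synthesise_sample, the toy array X_min, and the choice of k=2 are assumptions made for the example.

```python
import numpy as np

def duplicate_sample(X_min, rng):
    """Random oversampling step: return an exact copy of a random minority row."""
    i = rng.integers(len(X_min))
    return X_min[i].copy()

def synthesise_sample(X_min, rng, k=2):
    """SMOTE-style step: interpolate between a minority row and one of its
    k nearest minority-class neighbours (hypothetical helper, k is illustrative)."""
    i = rng.integers(len(X_min))
    dists = np.linalg.norm(X_min - X_min[i], axis=1)
    # Position 0 of the sorted order is the point itself (distance 0), so skip it.
    neighbours = np.argsort(dists)[1:k + 1]
    j = rng.choice(neighbours)
    lam = rng.random()  # interpolation weight in [0, 1)
    return X_min[i] + lam * (X_min[j] - X_min[i])

rng = np.random.default_rng(0)
# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

dup = duplicate_sample(X_min, rng)
new = synthesise_sample(X_min, rng)
print('duplicate:', dup)   # always one of the existing rows
print('synthetic:', new)   # a point on the segment between two existing rows
```

The duplicate is always an existing row, whereas the synthetic point lies on the line segment between two minority samples, so it is generally a value the model has not seen before.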

Random Oversampling: Duplicate Minority Samples

RandomOverSampler from the imbalanced-learn library samples minority-class examples with replacement until the desired class ratio is reached; with its default settings, that means until the minority class is as large as the majority class. It is simple and often effective, but the duplicates add no new information, so the model may overfit to the repeated minority examples unless it is regularised.

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
import numpy as np

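# Synthetic binary dataset: roughly 95% class 0 (majority), 5% class 1 (minority)
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           n_features=10, random_state=42)

# Class counts before resampling
print('Before resampling:', np.bincount(y))

# Duplicate random minority rows until both classes have the same count
ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, y)

# Class counts after resampling
print('After resampling: ', np.bincount(y_res))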
