Machine Learning Academy · Lesson

Dropout Regularisation to Prevent Overfitting

Learners will add nn.Dropout with varying probabilities, compare training vs validation loss curves, and call model.eval() to disable dropout at inference.

What Is Dropout and Why Use It?

Dropout, introduced by Srivastava et al. in 2014, is a regularisation technique that sets each neuron activation to zero independently with probability p during every training step. Because the network must work with a different random subset of neurons each time, no single neuron can become too specialised or depend on particular other neurons being present, so the network learns redundant representations. This reduces overfitting, particularly in large fully connected networks.

import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)   # 50% chance each neuron is zeroed

x = torch.ones(1, 8)

# Training mode: randomly zeros activations
dropout.train()
out = dropout(x)
print('Training output:', out)
# Some values are 0, survivors are scaled by 1/(1-p)

# Eval mode: dropout is disabled (identity function)
dropout.eval()
out_eval = dropout(x)
print('Eval output:', out_eval)   # all ones
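
The same switch applies when dropout sits inside a full model. The following is a minimal sketch, assuming a hypothetical two-layer classifier (the layer sizes 20, 64 and 3 and the random input are illustrative choices, not part of the lesson). It shows that calling model.train() or model.eval() on the parent module also switches the nn.Dropout layer inside it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Hypothetical two-layer classifier; the sizes 20, 64 and 3 are arbitrary.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # applied to the hidden activations
    nn.Linear(64, 3),
)

x = torch.randn(4, 20)   # illustrative batch of 4 random inputs

# Training mode: each forward pass samples a fresh dropout mask,
# so two passes over the same input give different outputs.
model.train()
train_a = model(x)
train_b = model(x)
train_outputs_differ = not torch.equal(train_a, train_b)

# Eval mode: model.eval() switches every submodule, including Dropout,
# so the forward pass becomes deterministic.
model.eval()
with torch.no_grad():
    eval_a = model(x)
    eval_b = model(x)
eval_outputs_match = torch.equal(eval_a, eval_b)

print('Train passes differ:', train_outputs_differ)
print('Eval passes match:', eval_outputs_match)
```

Forgetting model.eval() before validation or inference leaves dropout active, which makes predictions noisy and validation loss look worse than it really is.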

The Inverted Dropout Trick

PyTorch implements inverted dropout: during training, surviving activations are scaled up by 1/(1-p) to compensate for the zeroed neurons. Each activation x is kept with probability 1-p, so its expected value is (1-p) * x/(1-p) = x, whatever the dropout rate. The advantage is that at inference you disable dropout and use the network as-is, with no need to rescale outputs. This is why nn.Dropout in eval mode is simply an identity function.

import torch
import torch.nn as nn

# With p=0.5, surviving neurons are scaled by 2.0
dropout = nn.Dropout(p=0.5)
dropout.train()

# Start with all-ones tensor
x = torch.ones(1, 10)
out = dropout(x)
print('Scaled values:', out)
# Values are either 0 or 2.0 (= 1 / (1 - 0.5))

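# Expected value per element = (1-p) * (1/(1-p)) = 1.0 (same as input).
# The mean of only 10 samples is noisy, so average over a larger tensor.
big = dropout(torch.ones(100_000))
print('Mean over 100,000 elements:', big.mean().item())   # close to 1.0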
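
To make the scaling explicit, here is a hand-written sketch of inverted dropout. The function name inverted_dropout and its signature are hypothetical, and this is an illustration of the idea rather than PyTorch's internal implementation. It also tries several dropout probabilities to show that the mean stays near 1.0 while the fraction of zeroed values tracks p.

```python
import torch

def inverted_dropout(x: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    """Illustrative re-implementation; not PyTorch's internal code."""
    if not training or p == 0.0:
        return x                      # identity at inference
    if p >= 1.0:
        return torch.zeros_like(x)    # everything dropped; avoids dividing by zero
    keep_prob = 1.0 - p
    # Bernoulli mask: 1 with probability keep_prob, otherwise 0
    mask = (torch.rand_like(x) < keep_prob).to(x.dtype)
    # Scale survivors by 1/(1-p) so the expected value matches the input
    return x * mask / keep_prob

torch.manual_seed(0)
x = torch.ones(100_000)   # large tensor so the sample mean is stable

means = {}
for p in (0.1, 0.3, 0.5, 0.8):
    out = inverted_dropout(x, p)
    means[p] = out.mean().item()
    zeroed = (out == 0).float().mean().item()
    print(f'p={p}: fraction zeroed={zeroed:.3f}, mean={means[p]:.3f}')
```

Higher values of p zero more activations and scale the survivors more aggressively, so the output becomes noisier even though its expected value is unchanged.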
