Dropout Regularisation to Prevent Overfitting
Learners will add nn.Dropout with varying probabilities, compare training vs validation loss curves, and call model.eval() to disable dropout at inference.
What Is Dropout and Why Use It?
Dropout, introduced by Srivastava et al. in 2014, is a regularisation technique that randomly sets a fraction of neuron activations to zero during each training step. Because the network must operate with a different random subset of neurons on every step, no single neuron can become too specialised, and the network is pushed to learn redundant representations. This reduces overfitting, particularly in large fully connected networks.
import torch
import torch.nn as nn
dropout = nn.Dropout(p=0.5) # 50% chance each neuron is zeroed
x = torch.ones(1, 8)
# Training mode: randomly zeros activations
dropout.train()
out = dropout(x)
print('Training output:', out)
# Some values are 0, survivors are scaled by 1/(1-p)
# Eval mode: dropout is disabled (identity function)
dropout.eval()
out_eval = dropout(x)
print('Eval output:', out_eval)  # all ones
The Inverted Dropout Trick
PyTorch implements inverted dropout: during training, surviving activations are scaled up by 1/(1-p) to compensate for the zeroed neurons. This keeps the expected value of each activation the same regardless of the dropout rate. The advantage is that at inference you disable dropout and use the network as-is, with no need to rescale outputs. This is why nn.Dropout in eval mode is simply an identity function.
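The scaling described above can be sketched by hand. The helper below, inverted_dropout, is a hypothetical function written only for illustration; it is not PyTorch's internal implementation, just a minimal version of the same idea, assuming 0 <= p < 1.

```python
import torch

def inverted_dropout(x: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    """Hypothetical hand-written inverted dropout (illustration only)."""
    if not training or p == 0.0:
        return x  # eval mode: identity, no rescaling needed
    # Keep each element with probability 1 - p
    keep_mask = (torch.rand_like(x) >= p).to(x.dtype)
    # Scale survivors by 1 / (1 - p) so the expected value is unchanged
    return x * keep_mask / (1.0 - p)

torch.manual_seed(0)
x = torch.ones(100_000)  # large tensor so the sample mean is a stable estimate
for p in (0.2, 0.5, 0.8):
    out = inverted_dropout(x, p)
    # Survivors equal 1 / (1 - p); the mean should land close to 1.0
    print(f"p={p}: survivor value={out.max().item():.2f}, mean={out.mean().item():.3f}")
```

Whatever the dropout rate, the mean stays near 1.0, because larger values of p zero more elements but scale the survivors up by a correspondingly larger factor.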
import torch
import torch.nn as nn
# With p=0.5, surviving neurons are scaled by 2.0
dropout = nn.Dropout(p=0.5)
dropout.train()
# Start with all-ones tensor
x = torch.ones(1, 10)
out = dropout(x)
print('Scaled values:', out)
# Values are either 0 or 2.0 (= 1 / (1 - 0.5))
# Expected value per element = (1-p) * (1/(1-p)) = 1.0 (same as input);
# the mean of only 10 samples fluctuates around 1.0 rather than matching it exactly
print('Expected value preserved:', out.mean().item())
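To connect the layer-level demos to the lesson objective, here is a minimal sketch of nn.Dropout inside a model, with several dropout probabilities and a switch between model.train() and model.eval(). The make_mlp helper, the layer sizes, and the probabilities are assumptions chosen for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn

def make_mlp(p: float) -> nn.Sequential:
    # Hypothetical toy network; layer sizes are arbitrary
    return nn.Sequential(
        nn.Linear(16, 32),
        nn.ReLU(),
        nn.Dropout(p=p),  # dropout applied to the hidden activations
        nn.Linear(32, 1),
    )

torch.manual_seed(0)
x = torch.randn(4, 16)  # a small batch of random inputs

for p in (0.1, 0.3, 0.5):
    model = make_mlp(p)

    model.train()  # dropout active: repeated passes use different random masks
    train_a, train_b = model(x), model(x)

    model.eval()  # dropout disabled: repeated passes are deterministic
    with torch.no_grad():
        eval_a, eval_b = model(x), model(x)

    print(f"p={p}: train passes equal? {torch.equal(train_a, train_b)}, "
          f"eval passes equal? {torch.equal(eval_a, eval_b)}")
```

In train mode the two passes will almost always differ, while in eval mode they match exactly. When comparing training and validation loss curves, compute the validation loss after calling model.eval(), then call model.train() again before the next training step.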