Batch Norm & Layer Norm
Stabilize activations to train deeper networks.
Why Normalize Activations
As signals pass through successive layers, the scale of their activations drifts, and training slows as a result. Normalization keeps activations in a stable, well-behaved range. ⚖️
Batch Norm in One Line
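Batch normalization rescales each feature using the mean and variance computed across the current mini-batch: it subtracts the batch mean, divides by the batch standard deviation, then applies a learnable scale and shift.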
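To make the one-line description concrete, here is a minimal NumPy sketch of the training-time forward pass. The function name batch_norm, the parameter names gamma and beta (learnable scale and shift), and the eps value are illustrative assumptions, not part of the lesson. It also leaves out the running statistics that frameworks typically track for use at inference time.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature (column) of x across the mini-batch (rows).

    x:     array of shape (batch_size, num_features)
    gamma: per-feature scale, shape (num_features,)  (assumed name)
    beta:  per-feature shift, shape (num_features,)  (assumed name)
    eps:   small constant to avoid division by zero  (assumed value)
    """
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, roughly unit variance
    return gamma * x_hat + beta                # learnable rescale and shift

# Toy mini-batch: 32 examples, 4 features, deliberately off-center and wide.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))

y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))

print("input  mean per feature:", x.mean(axis=0).round(2))
print("output mean per feature:", y.mean(axis=0).round(2))
print("output std  per feature:", y.std(axis=0).round(2))
```

With gamma set to ones and beta to zeros, each output feature should end up with a mean near 0 and a standard deviation near 1, regardless of the input's original scale. Changing axis=0 to axis=1 (with keepdims=True) would instead normalize across the features of each example, which is the idea behind layer normalization.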