Deep Learning Academy · Lesson

Weight Decay vs L2 Regularization

The subtle difference that matters.

Keep Weights Small

Large weights are often a sign of overfitting. Weight decay and L2 regularization both push weights toward zero, which keeps the network simpler.

L2 regularization adds a penalty term, the sum of squared weights scaled by a coefficient lam, directly to the loss. Minimizing the loss then also means shrinking the weights.

loss = data_loss + lam * (w ** 2).sum()
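
To see why minimizing this loss shrinks the weights, it may help to look at the penalty's gradient, which is 2 * lam * w for each weight. The sketch below is a minimal illustration in plain Python, with lists standing in for tensors. The helper names (l2_penalty, l2_penalty_grad, sgd_step), the learning rate lr, and all numeric values are assumptions made up for this example, and the data-loss gradient is set to zero so that only the penalty acts.

```python
def l2_penalty(weights, lam):
    """Return lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l2_penalty_grad(weights, lam):
    """Gradient of the penalty with respect to each weight: 2 * lam * w."""
    return [2.0 * lam * w for w in weights]


def sgd_step(weights, data_grad, lam, lr):
    """One plain gradient-descent step on data_loss + L2 penalty."""
    pen_grad = l2_penalty_grad(weights, lam)
    return [w - lr * (g + p) for w, g, p in zip(weights, data_grad, pen_grad)]


# Hypothetical values, chosen only for illustration.
weights = [3.0, -2.0, 0.5]
lam = 0.1
lr = 0.5
data_grad = [0.0, 0.0, 0.0]  # assume a flat data loss to isolate the penalty

penalty = l2_penalty(weights, lam)
new_weights = sgd_step(weights, data_grad, lam, lr)

print("penalty:", penalty)
print("weights after one step:", new_weights)
```

With these values each weight is multiplied by 1 - 2 * lr * lam = 0.9, so every weight moves a fixed fraction of the way toward zero regardless of its sign.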