Deep Learning Academy · Lesson

Weight Decay vs L2 Regularization

The subtle difference that matters.

Keep Weights Small

Large weights are often a sign of overfitting. Both weight decay and L2 regularization push weights toward zero, which keeps the network simpler.
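To make "push weights toward zero" concrete, here is a minimal sketch of a weight decay step in plain Python. The helper name `decay_step` and the values of `lr` (learning rate) and `wd` (decay coefficient) are assumptions chosen for illustration, not taken from the lesson. The gradients are set to zero so that only the shrinking effect is visible.

```python
def decay_step(weights, grads, lr=0.1, wd=0.01):
    """One SGD step with weight decay applied directly to the weights.

    Hypothetical helper: each weight takes the usual gradient step and is
    additionally shrunk by a fraction lr * wd of its own value.
    """
    return [w - lr * g - lr * wd * w for w, g in zip(weights, grads)]


weights = [2.0, -3.0]
zero_grads = [0.0, 0.0]  # no data gradient, so only the decay acts

for _ in range(100):
    weights = decay_step(weights, zero_grads)

# Every step multiplies each weight by (1 - lr * wd), so both weights
# move toward zero while keeping their sign.
print(weights)
```

With a zero gradient the update reduces to multiplying each weight by a constant slightly below one, which is where the name "decay" comes from.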

L2 Adds to the Loss

L2 regularization adds a penalty term, the sum of squared weights scaled by a coefficient (lam below), directly to the loss. Minimizing the loss then also means shrinking the weights.

loss = data_loss + lam * (w ** 2).sum()
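The one-liner above can be expanded into a small runnable sketch that also shows why minimizing this loss shrinks the weights. It uses plain Python lists instead of an array library so it has no dependencies. The helper names `l2_loss` and `l2_penalty_grad`, the placeholder `data_loss` value, and the settings for `lam` and `lr` are all assumptions made for illustration.

```python
def l2_loss(data_loss, weights, lam):
    """Total loss: data loss plus lam times the sum of squared weights."""
    return data_loss + lam * sum(w ** 2 for w in weights)


def l2_penalty_grad(weights, lam):
    """Gradient of lam * sum(w ** 2) with respect to each weight: 2 * lam * w."""
    return [2.0 * lam * w for w in weights]


weights = [2.0, -3.0]
lam = 0.01
data_loss = 0.5  # placeholder value; a real model would compute this from data

loss = l2_loss(data_loss, weights, lam)
analytic_grad = l2_penalty_grad(weights, lam)

# Check the analytic gradient against a forward finite difference.
eps = 1e-6
numeric_grad = []
for i in range(len(weights)):
    bumped = list(weights)
    bumped[i] += eps
    numeric_grad.append((l2_loss(data_loss, bumped, lam) - loss) / eps)

# One plain SGD step on the penalty alone (data gradient assumed zero here).
lr = 0.1
new_weights = [w - lr * g for w, g in zip(weights, analytic_grad)]

print(loss, analytic_grad, numeric_grad, new_weights)
```

Because the penalty gradient is proportional to the weight itself, a plain SGD step moves every weight a small fraction of the way toward zero. Under plain SGD this has the same form as a weight decay step; with adaptive optimizers such as Adam the two generally behave differently, since the penalty gradient is rescaled along with the data gradient.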
