Deep Learning Academy · Lesson

SGD with Momentum

Smoothing updates to roll past noise.

Plain SGD Forgets

Vanilla SGD computes each step from the current mini-batch gradient alone. Nothing carries over between updates, so noisy gradients make the path wobble and progress toward the minimum is slow.
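To make the "no memory" point concrete, here is a minimal sketch of plain SGD on a toy problem. The objective f(w) = 0.5 * w**2, the Gaussian noise model, and the names `noisy_grad` and `sgd` are illustrative assumptions, not part of the lesson.

```python
import random

def noisy_grad(w, rng, noise=1.0):
    # Assumed toy objective: f(w) = 0.5 * w**2, whose true gradient is w.
    # Gaussian noise stands in for mini-batch sampling error.
    return w + rng.gauss(0.0, noise)

def sgd(w0, lr=0.1, steps=100, noise=1.0, seed=0):
    rng = random.Random(seed)
    w = w0
    path = [w]
    for _ in range(steps):
        g = noisy_grad(w, rng, noise)
        w = w - lr * g  # the step depends only on the current gradient
        path.append(w)
    return path

path = sgd(w0=5.0)
print(path[-1])  # final position; it jitters around the minimum at 0
```

Setting `noise=0.0` recovers clean gradient descent, which is a handy baseline for seeing how much of the wobble comes from the noise alone.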

Borrow Some Inertia

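Momentum gives SGD memory. It keeps an exponentially decaying running average of past gradients, often called the velocity, and steps in that direction. Gradient components that agree from step to step build up, while noise that flips sign largely cancels, like a ball gathering speed downhill.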
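The update can be sketched in a few lines. This version uses one common convention, v = beta * v + grad followed by w = w - lr * v; some texts instead scale the gradient by (1 - beta) so that v is a true weighted average. The names `momentum_step` and `run`, the shallow quadratic, and the hyperparameter values are assumptions chosen for illustration.

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    # The velocity accumulates past gradients with exponential decay.
    v = beta * v + grad
    # Step along the velocity rather than the raw gradient.
    w = w - lr * v
    return w, v

def run(w0, grad_fn, lr=0.1, beta=0.9, steps=100):
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, v, grad_fn(w), lr, beta)
    return w

# Assumed toy objective: a shallow bowl f(w) = 0.5 * 0.01 * w**2.
shallow_grad = lambda w: 0.01 * w

plain = run(5.0, shallow_grad, beta=0.0)     # beta = 0 reduces to plain SGD
with_mom = run(5.0, shallow_grad, beta=0.9)  # same learning rate, with momentum
print(plain, with_mom)
```

In this particular setup the momentum run ends noticeably closer to the minimum at 0 after the same number of steps, because consistent gradients keep adding to the velocity. With a larger learning rate or beta the same mechanism can overshoot, so the values here should be read as a demonstration rather than a recommendation.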

All lessons in this course

  1. SGD with Momentum
  2. Adam & AdamW Explained
  3. Weight Decay vs L2 Regularization
  4. Learning Rate Schedules & Warmup