Deep Learning Academy · Lesson

SGD with Momentum

Smoothing updates to roll past noise.

Plain SGD Forgets

Vanilla SGD steps using only the current gradient. Every update starts from scratch, so a noisy slope makes it wobble and crawl toward the minimum.

Momentum gives SGD memory. It keeps a running average of past gradients and rolls in that direction, like a ball gathering speed downhill.