SGD with Momentum
Smoothing updates to roll past noise.
Plain SGD Forgets
Vanilla SGD steps using only the current gradient. Every update starts from scratch, so a noisy slope makes it wobble and crawl toward the minimum.
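To make the memoryless update concrete, here is a minimal Python sketch. The names sgd_step and lr, and the toy gradient values, are illustrative assumptions rather than anything from a particular library.

```python
def sgd_step(w, grad, lr=0.1):
    """One plain SGD update: depends only on the current gradient."""
    return w - lr * grad


# Hypothetical noisy gradients: the true slope is 1.0, the noise alternates +2 / -2.
noisy_grads = [3.0, -1.0, 3.0, -1.0]

w = 0.0
steps = []
for g in noisy_grads:
    new_w = sgd_step(w, g)
    steps.append(new_w - w)  # each step mirrors the latest noisy gradient
    w = new_w
```

The step direction flips whenever the noise does, because nothing carries over from one update to the next.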
Borrow Some Inertia
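Momentum gives SGD memory. It keeps an exponentially decaying running average of past gradients and steps along that average, so consistent directions build up speed while noisy components cancel out, like a ball gathering speed downhill.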
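A minimal sketch of that running average in Python, assuming the exponential-moving-average form of momentum. The names momentum_step, v, lr, and beta, and the toy gradient values, are illustrative assumptions.

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum update using an exponential moving average.

    v is the running average of past gradients; beta controls how much
    of the old average is kept (both names are illustrative).
    """
    v = beta * v + (1.0 - beta) * grad  # blend old memory with the new gradient
    w = w - lr * v                      # step along the smoothed direction
    return w, v


# Hypothetical noisy gradients: true slope 1.0, noise alternating +2 / -2.
noisy_grads = [3.0, -1.0, 3.0, -1.0]

w, v = 0.0, 0.0
velocities = []
for g in noisy_grads:
    w, v = momentum_step(w, v, g)
    velocities.append(v)  # stays positive even when a raw gradient is negative
```

On these toy gradients the smoothed direction never reverses, even though every other raw gradient points the wrong way. Some implementations instead accumulate v = beta * v + grad without the (1 - beta) factor; with a constant learning rate that form is equivalent up to a rescaling of lr.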