Deep Learning Academy · Lesson

Vanishing & Exploding Gradients

What breaks deep nets and early fixes.

Gradients Are a Product

Through many layers, backprop multiplies many local derivatives together. The size of that long product decides whether learning works or breaks.

Multiplying Small Numbers

If each link's derivative is below one, the product shrinks fast. After many layers the gradient becomes tiny, almost zero by the time it reaches early layers.

All lessons in this course

The Chain Rule, Layer by Layer
Forward Caches, Backward Reuses
Backprop a Tiny Net by Hand
Vanishing & Exploding Gradients

← Back to Deep Learning Academy