0Pricing
Deep Learning Academy · Lesson

Vanishing & Exploding Gradients

What breaks deep nets and early fixes.

Gradients Are a Product

Through many layers, backprop multiplies many local derivatives together. The size of that long product decides whether learning works or breaks.

Multiplying Small Numbers

If each link's derivative is below one, the product shrinks fast. After many layers the gradient becomes tiny, almost zero by the time it reaches early layers.

All lessons in this course

  1. The Chain Rule, Layer by Layer
  2. Forward Caches, Backward Reuses
  3. Backprop a Tiny Net by Hand
  4. Vanishing & Exploding Gradients
← Back to Deep Learning Academy