The Chain Rule, Layer by Layer
Compose derivatives through the network.
A Net Is Nested Functions
A neural network is just functions wrapped inside functions: each layer feeds its output into the next. Backprop needs the derivative of that whole stack.
The Chain Rule in One Line
The chain rule says the derivative of nested functions multiplies the derivatives of each piece. That single fact is the engine behind all of backprop. ⛓️
dy/dx = dy/du * du/dxAll lessons in this course
- The Chain Rule, Layer by Layer
- Forward Caches, Backward Reuses
- Backprop a Tiny Net by Hand
- Vanishing & Exploding Gradients