Learning Rate: Too Big, Too Small, Just Right
The single most important knob.
The Size of Your Step
The gradient tells you which way to go, but how far? The learning rate is the multiplier that sets how big each downhill step is.
w = w - lr * gradA Single Powerful Knob
The learning rate is often the single most important setting in training. Get it wrong and even a perfect model will fail to learn.
lr = 0.01All lessons in this course
- Loss as a Landscape to Descend
- Gradients Point Uphill — So Step the Other Way
- Learning Rate: Too Big, Too Small, Just Right
- Minimize a Function by Hand in Python