Learning Rate Schedules & Warmup
Step, cosine, and warmup strategies.
A Rate That Changes
One fixed learning rate is rarely ideal for the whole run. A schedule changes it over time, usually large early and small near the end.
Why Decay Helps
Big early steps cover ground fast. Shrinking the rate later lets the model settle precisely into a minimum instead of bouncing around it.
All lessons in this course
- SGD with Momentum
- Adam & AdamW Explained
- Weight Decay vs L2 Regularization
- Learning Rate Schedules & Warmup