Discriminative Layer-Wise Rates
Train deep and shallow layers differently.
One Rate for All?
So far, every layer has shared a single learning rate. But early and late layers learn very different things, so they may deserve different rates. 🎚️
Early Layers Are General
The early layers of a pretrained vision network detect edges, colors, and textures. These features transfer to almost any image task, so they barely need changing during fine-tuning.
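One way to act on this is to give each layer its own learning rate, smallest near the input and largest at the head. The sketch below is only an illustration under stated assumptions: the layer names, the `layerwise_rates` helper, the head rate of 1e-3, and the decay factor of 0.5 are all hypothetical choices, not values from this course.

```python
def layerwise_rates(layer_names, head_lr=1e-3, decay=0.5):
    """Map each layer name to a learning rate that shrinks toward the input.

    layer_names must be ordered from earliest (closest to the input) to latest.
    The last layer gets head_lr; each earlier layer gets `decay` times the
    rate of the layer that follows it.
    """
    depth = len(layer_names)
    return {
        name: head_lr * decay ** (depth - 1 - i)
        for i, name in enumerate(layer_names)
    }


# Hypothetical layer names, ordered from input to output.
layers = ["conv_early", "conv_mid", "conv_late", "head"]
rates = layerwise_rates(layers, head_lr=1e-3, decay=0.5)

for name, lr in rates.items():
    print(f"{name}: {lr:.6f}")
```

With these assumed settings, the head trains at the full rate while the earliest layer trains at one eighth of it. In a framework such as PyTorch, rates like these would typically be passed to the optimizer as separate parameter groups, one group per layer.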