Fine-Tuning: Unfreezing and Low Learning Rates
Learners will unfreeze earlier layers after initial head training, apply a lower learning rate to avoid destroying pre-trained features, and observe accuracy gains.
Why Fine-Tune After Feature Extraction?
Feature extraction adapts only the classification head, leaving the backbone frozen. Fine-tuning goes further by also updating some or all backbone layers, allowing the model to adapt its representations to your specific data distribution.
Fine-tuning is most beneficial when your domain differs from ImageNet, for example satellite images, medical scans, or industrial defect photos. ImageNet features only partially transfer to such data, so adapting them typically yields higher accuracy than training the head alone. The risk is catastrophic forgetting: with a high learning rate, large gradient updates overwrite the pre-learned features and performance can collapse.
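One rough way to see the forgetting risk is to measure how far the weights drift from their starting values under different learning rates. This is only a proxy: a tiny hypothetical network with random weights stands in for a pre-trained backbone, the data is synthetic, and `weight_drift` is an illustrative helper. It does not measure lost accuracy directly.

```python
import copy

import torch
import torch.nn as nn

def weight_drift(lr: float, steps: int = 20, seed: int = 0) -> float:
    """Relative L2 distance between the starting weights and the weights
    after `steps` Adam updates at learning rate `lr`."""
    torch.manual_seed(seed)  # same initial weights and data for every lr
    # Hypothetical stand-in for a pre-trained backbone plus a head
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))
    start = copy.deepcopy(model.state_dict())  # snapshot of the "pre-trained" weights
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(64, 32)          # synthetic "new domain" inputs
    y = torch.randint(0, 5, (64,))   # synthetic labels
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    # Compare current weights with the snapshot, tensor by tensor
    num = sum((p - start[name]).pow(2).sum() for name, p in model.state_dict().items())
    den = sum(w.pow(2).sum() for w in start.values())
    return (num / den).sqrt().item()

print(f"lr=1e-2: {weight_drift(1e-2):.4f}")
print(f"lr=1e-5: {weight_drift(1e-5):.6f}")
```

With Adam, each step moves a parameter by roughly the learning rate, so the drift at 1e-2 comes out orders of magnitude larger than at 1e-5 in this setup.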
The Two-Phase Fine-Tuning Strategy
The standard fine-tuning recipe has two phases. Phase 1: Freeze the backbone completely and train only the new classification head for several epochs until it converges. A freshly initialized head produces large, noisy gradients; training it first ensures it starts from a reasonable state before those gradients flow back into the backbone.
Phase 2: Unfreeze all or some backbone layers and continue training with a very low learning rate (typically 10–100× smaller than in phase 1). This gently nudges the pre-learned features toward your domain without destroying them. Skipping phase 1 and fine-tuning the whole network from the start with a high LR is a common mistake and a frequent cause of poor results.
```python
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

num_classes = 10  # hypothetical: set to the number of classes in your dataset

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# --- Phase 1: Freeze backbone, train head ---
for param in model.parameters():
    param.requires_grad = False
# The new head is created after freezing, so its parameters are trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
# Train for 5-10 epochs...

# --- Phase 2: Unfreeze and fine-tune with low LR ---
for param in model.parameters():
    param.requires_grad = True
optimizer = optim.Adam(model.parameters(), lr=1e-5)  # 100x lower than phase 1
```