Machine Learning Academy · Lesson

Fine-Tuning: Unfreezing and Low Learning Rates

Learners will unfreeze earlier layers after initial head training, apply a lower learning rate to avoid destroying pre-trained features, and observe accuracy gains.

Why Fine-Tune After Feature Extraction?

Feature extraction adapts only the classification head, leaving the backbone frozen. Fine-tuning goes further by also updating some or all backbone layers, allowing the model to adapt its representations to your specific data distribution.
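The difference between the two modes can be made concrete by counting trainable parameters. The following is a minimal sketch, assuming a tiny stand-in network (`TinyNet`, with hypothetical `backbone` and `head` attributes) rather than a real pre-trained model; `count_trainable` is likewise a helper introduced only for illustration.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained model: a backbone plus a head."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(8, 16), nn.ReLU(),
            nn.Linear(16, 16), nn.ReLU(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


def count_trainable(model):
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = TinyNet()

# Feature extraction: the backbone is frozen, only the head is trainable.
for p in model.backbone.parameters():
    p.requires_grad = False
feature_extraction_params = count_trainable(model)

# Fine-tuning: the backbone is unfrozen, so every parameter is trainable.
for p in model.backbone.parameters():
    p.requires_grad = True
fine_tuning_params = count_trainable(model)

print(feature_extraction_params, fine_tuning_params)
```

In a real backbone the gap is far larger than in this toy model, which is part of why fine-tuning needs more care than feature extraction.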

Fine-tuning is most beneficial when your domain differs from ImageNet, for example satellite images, medical scans, or industrial defect photos. The ImageNet features transfer only partially, so adapting them to the new domain often improves accuracy. The risk is catastrophic forgetting: if you fine-tune with a high learning rate, large gradient updates overwrite the pre-trained features and performance can collapse.
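The link between learning rate and forgetting can be illustrated with a single update step. This is a rough sketch, assuming plain SGD (no momentum or weight decay), random data, and a small linear layer standing in for a pre-trained layer; `drift_after_one_step` is a hypothetical helper. It measures how far the weights move from their starting values, which is only a proxy for forgetting, since real forgetting accumulates over many steps.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pre-trained layer, plus a random batch from the "new" domain.
pretrained = nn.Linear(4, 2)
x = torch.randn(16, 4)
y = torch.randn(16, 2)


def drift_after_one_step(lr):
    """Take one SGD step on a copy of the layer; return the weight change norm."""
    layer = copy.deepcopy(pretrained)
    optimizer = torch.optim.SGD(layer.parameters(), lr=lr)
    loss = nn.functional.mse_loss(layer(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return (layer.weight - pretrained.weight).norm().item()


drift_high = drift_after_one_step(1e-1)  # a phase-1-style learning rate
drift_low = drift_after_one_step(1e-3)   # 100x lower

# With plain SGD the update is lr * gradient, so the ratio is close to 100.
print(drift_high / drift_low)
```

Because both runs start from the same weights and see the same batch, the gradient is identical and the drift scales directly with the learning rate.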

The Two-Phase Fine-Tuning Strategy

The standard fine-tuning recipe has two phases. Phase 1: Freeze the backbone completely and train only the new classification head for several epochs until it converges. A freshly initialized head produces large, noisy gradients, so training it first means the gradients that later reach the backbone come from a reasonable starting point.

Phase 2: Unfreeze all or some backbone layers and continue training with a very low learning rate (typically 10–100× smaller than in phase 1). This gently nudges the pre-learned features toward your domain without destroying them. Skipping phase 1 and fine-tuning from the start with a high learning rate is a common cause of poor results.

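import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

num_classes = 10  # placeholder: set to the number of classes in your dataset

# Load a ResNet-50 pre-trained on ImageNet
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# --- Phase 1: Freeze backbone, train head ---
for param in model.parameters():
    param.requires_grad = False
# The replacement head is created after freezing, so its parameters are trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
# Train for 5-10 epochs...

# --- Phase 2: Unfreeze and fine-tune with low LR ---
for param in model.parameters():
    param.requires_grad = True
# Build a new optimizer so it covers the backbone parameters as well
optimizer = optim.Adam(model.parameters(), lr=1e-5)  # 100x lower than phase 1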
