Machine Learning Academy · Lesson

Pre-trained Models in torchvision: ResNet, EfficientNet, and ViT

Learners will load a ResNet-50 pre-trained on ImageNet, inspect its architecture, and run inference on a new image to check the representations it has already learned.

Why Use Pre-trained Models?

Training a deep neural network on ImageNet from scratch requires more than a million labelled images and days to weeks of GPU compute. Pre-trained models have already learned general visual features (edges, textures, shapes, and high-level object parts) from this enormous dataset.

By reusing these weights, you benefit from the learning done on 1.2 million images without paying the training cost. This is the core idea of transfer learning: features learned on one large task transfer well to related smaller tasks. torchvision.models provides dozens of pre-trained architectures ready to download and use.

import torchvision.models as models

# List the available model builders (list_models requires torchvision >= 0.14)
print(models.list_models())  # Includes 'resnet50', 'efficientnet_b0', 'vit_b_16', etc.

# Load weights pre-trained on ImageNet-1k (downloaded and cached on first use)
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.eval()  # Switch batch norm to inference behaviour
print('ResNet-50 loaded, parameters:', sum(p.numel() for p in resnet.parameters()))

ResNet-50: Architecture Overview

ResNet-50 is a 50-layer residual network. The ResNet family introduced skip connections that add the input of a block directly to its output: output = F(x) + x, where F is the block's stack of convolutions. Because the identity path passes gradients through the addition unchanged, very deep networks can be trained with far less trouble from vanishing gradients.
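
To make output = F(x) + x concrete, the following is a simplified sketch of a residual block. TinyResidualBlock is a hypothetical name for illustration; it uses two 3×3 convolutions, whereas the blocks inside ResNet-50 are three-layer bottlenecks. The last batch norm scale is set to zero here so that F(x) is exactly zero, which makes it easy to see that the gradient still reaches the input through the identity path alone.

```python
import torch
from torch import nn


class TinyResidualBlock(nn.Module):
    """Hypothetical minimal block computing relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): conv -> BN -> ReLU -> conv -> BN, shape-preserving
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.residual(x) + x)  # skip connection


block = TinyResidualBlock(channels=16)

# Illustration only: zero the final BN scale so that F(x) == 0 everywhere.
nn.init.zeros_(block.residual[-1].weight)

x = torch.randn(2, 16, 8, 8, requires_grad=True)
out = block(x)
out.sum().backward()

# With F(x) == 0 the block reduces to relu(x), and the gradient w.r.t. x
# comes entirely from the identity path.
print(out.shape == x.shape)
print(torch.allclose(x.grad, (x.detach() > 0).float()))
```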

ResNet-50 has approximately 25.6 million parameters and consists of: an initial 7×7 convolutional layer with batch normalization and ReLU, a max pooling layer, four residual stages (layer1 to layer4) containing 3, 4, 6, and 3 bottleneck blocks respectively, a global average pooling layer, and a 1000-class fully connected head for ImageNet classification. The final fc layer is what we replace for custom tasks.

import torchvision.models as models

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
print(resnet)  # Prints the full architecture

# Key layers
print('Final FC layer:', resnet.fc)  # Linear(in_features=2048, out_features=1000, bias=True)
print('Feature dimension before FC:', resnet.fc.in_features)  # 2048, the channel count out of layer4

