Machine Learning Academy · Lesson

Training Loop: Loss, Optimizer, and Epochs

Learners will write the PyTorch training loop: zero_grad, forward, compute CrossEntropyLoss, backward, optimizer.step, and track loss and accuracy over epochs.

The Four Steps of Every Training Loop

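The PyTorch training loop has four mandatory steps that repeat for every batch: (1) zero the gradients, (2) run the forward pass, including the loss computation, (3) run the backward pass, and (4) take an optimizer step. Skipping or reordering these steps usually fails silently rather than raising an error. If the gradients are not zeroed, they accumulate across batches; if the optimizer steps before the backward pass, parameters are updated with stale or missing gradients. A related pitfall is storing loss tensors rather than plain numbers (via loss.item()), which keeps their computation graphs in memory. Internalising this pattern is the most important habit for training neural networks with PyTorch.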

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

X = torch.randn(20, 2)
y = torch.randn(20, 1)

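for step in range(5):               # repeat the same four steps each iteration
    optimizer.zero_grad()           # (1) clear gradients left from the previous step
    y_pred = model(X)               # (2) forward pass
    loss = criterion(y_pred, y)     # (2) loss, computed as part of the forward pass
    loss.backward()                 # (3) backward pass: fills .grad on each parameter
    optimizer.step()                # (4) update parameters using .grad
    print(f'Step {step}: loss = {loss.item():.4f}')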
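
To see why step (1) matters, the following minimal sketch calls backward twice on the same batch without zeroing in between. The seed and the variable names (grad_once, grad_twice, doubled, cleared) are illustrative choices, not part of the lesson's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs behave the same

model = nn.Linear(2, 1)
criterion = nn.MSELoss()
X = torch.randn(20, 2)
y = torch.randn(20, 1)

# First backward pass: .grad holds the gradient for this batch.
criterion(model(X), y).backward()
grad_once = model.weight.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added to the old one.
criterion(model(X), y).backward()
grad_twice = model.weight.grad.clone()

# Same data and same parameters, so the accumulated gradient is twice the first.
doubled = torch.allclose(grad_twice, 2 * grad_once)
print('Accumulated gradient is doubled:', doubled)

# Zeroing resets the state. Depending on the PyTorch version, .grad becomes
# either None or a tensor of zeros, so both cases are checked here.
model.zero_grad()
cleared = model.weight.grad is None or bool((model.weight.grad == 0).all())
print('Gradient cleared after zero_grad():', cleared)
```

In a real loop the data changes from batch to batch, so a missing zero_grad() mixes gradients from different batches instead of doubling them exactly, which is harder to spot.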

Loss Functions: Choosing the Right Criterion

The loss function measures how wrong the model's predictions are. PyTorch's nn module provides many: nn.MSELoss for regression (mean squared error), nn.CrossEntropyLoss for multi-class classification (combines log-softmax and NLL), and nn.BCEWithLogitsLoss for binary classification (combines sigmoid and binary cross-entropy). Because the last two apply the activation internally, they expect raw logits from the model. Choosing a loss that does not match the task, or applying softmax or sigmoid before a loss that already includes it, is a common beginner mistake that can stall learning.
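
The "combines" notes in the parentheses above can be checked numerically. This is a minimal sketch with made-up logits and targets; it compares each combined loss against the two-step version built from torch.nn.functional.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Multi-class: CrossEntropyLoss takes raw logits and integer class indices.
logits = torch.tensor([[2.0, 0.5, 1.0], [0.1, 1.2, 0.3]])  # illustrative values
labels = torch.tensor([0, 1])
ce = nn.CrossEntropyLoss()(logits, labels)
ce_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
ce_matches = torch.allclose(ce, ce_manual)
print('CrossEntropyLoss == NLL(log_softmax):', ce_matches)

# Binary: BCEWithLogitsLoss takes raw logits and float targets of the same shape.
bin_logits = torch.tensor([1.5, -0.7, 0.2])                # illustrative values
bin_targets = torch.tensor([1.0, 0.0, 1.0])
bce = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)
bce_manual = F.binary_cross_entropy(torch.sigmoid(bin_logits), bin_targets)
bce_matches = torch.allclose(bce, bce_manual)
print('BCEWithLogitsLoss == BCE(sigmoid):', bce_matches)
```

The combined versions are generally preferred in practice because they are more numerically stable than applying the activation and the loss separately.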

import torch
import torch.nn as nn

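# Regression: mean of the squared differences
mse = nn.MSELoss()
y_pred = torch.tensor([2.5, 3.0])
y_true = torch.tensor([2.0, 3.5])
print('MSE:', mse(y_pred, y_true).item())    # (0.5**2 + 0.5**2) / 2 = 0.25

# Multi-class: pass logits (raw scores), not softmax output;
# the labels here are integer class indices, not one-hot vectors
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 1.0]])     # one sample, three classes
labels = torch.tensor([0])                   # true class is index 0
print('CE:', ce(logits, labels).item())      # about 0.464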
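
Putting the pieces together, a full run over several epochs with loss and accuracy tracking might look like the sketch below. The three-cluster toy dataset, batch size, learning rate, epoch count, and the history list are all illustrative assumptions rather than values from this lesson.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: three well-separated 2-D clusters, one per class.
centers = torch.tensor([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
labels = torch.arange(3).repeat_interleave(100)       # 300 integer class indices
features = centers[labels] + torch.randn(300, 2)      # add unit-variance noise

loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = nn.Linear(2, 3)                 # 2 features in, 3 class logits out
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

history = []                            # one (mean loss, accuracy) pair per epoch
for epoch in range(20):
    total_loss, correct, seen = 0.0, 0, 0
    for xb, yb in loader:
        optimizer.zero_grad()           # (1) zero grads
        logits = model(xb)              # (2) forward
        loss = criterion(logits, yb)    # (2) loss
        loss.backward()                 # (3) backward
        optimizer.step()                # (4) update

        # .item() yields a plain Python number, so the running totals
        # do not keep any computation graph alive.
        total_loss += loss.item() * xb.size(0)
        correct += (logits.argmax(dim=1) == yb).sum().item()
        seen += xb.size(0)

    history.append((total_loss / seen, correct / seen))
    print(f'epoch {epoch + 1:2d}  loss {history[-1][0]:.4f}  acc {history[-1][1]:.3f}')
```

The loss is weighted by batch size before averaging because the last batch of an epoch can be smaller than the rest. Accuracy here is measured on the training batches as they are seen; a separate validation set would be needed to estimate how well the model generalises.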

