Machine Learning Academy · Lesson

Pooling Layers: Spatial Downsampling and Invariance

Learners will add MaxPool2d after convolution layers, compute output dimensions, and understand how pooling provides tolerance to small shifts in position and reduces computation.

Why Pooling Layers Exist

Pooling layers reduce the spatial dimensions of feature maps. This decreases computation and memory in later layers and makes representations more compact. They also introduce a limited degree of spatial invariance: if a feature shifts slightly but stays within the same pooling window, the pooled output is unchanged. Without pooling (or stride-2 convolutions), "same"-padded convolutions keep the spatial dimensions constant through every layer. For large images, that quickly becomes computationally prohibitive.

import torch
import torch.nn as nn

# Without pooling: channels grow, but height and width stay H x W
conv1 = nn.Conv2d(3, 32, 3, padding=1)   # (B, 3, H, W)  -> (B, 32, H, W)
conv2 = nn.Conv2d(32, 64, 3, padding=1)  # (B, 32, H, W) -> (B, 64, H, W)

# With pooling: a 2x2 window with stride 2 halves height and width
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(4, 3, 32, 32)
print(conv2(torch.relu(conv1(x))).shape)  # torch.Size([4, 64, 32, 32]): no spatial reduction
out = pool(torch.relu(conv1(x)))
print(out.shape)   # torch.Size([4, 32, 16, 16]): height and width halved
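The invariance that pooling provides is local. A shift is absorbed only if the feature stays inside the same pooling window. The sketch below places a single strong activation in a 4x4 map and moves it by hand. The helper `one_hot_map` is a hypothetical name introduced for illustration.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

def one_hot_map(row: int, col: int, size: int = 4) -> torch.Tensor:
    """Hypothetical helper: a (1, 1, size, size) map with a single 1.0 at (row, col)."""
    x = torch.zeros(1, 1, size, size)
    x[0, 0, row, col] = 1.0
    return x

base = pool(one_hot_map(0, 0))    # feature in the top-left 2x2 window
within = pool(one_hot_map(1, 1))  # moved diagonally, still inside the top-left window
across = pool(one_hot_map(0, 2))  # moved two columns right, into the top-right window

print(torch.equal(base, within))  # True: the shift stayed inside one window
print(torch.equal(base, across))  # False: the shift crossed a window boundary
```

Stacking several pooling stages lets the network tolerate somewhat larger shifts. At each stage, however, a shift that crosses a window boundary still changes the output.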

Max Pooling: Taking the Maximum

MaxPool2d slides a window over each channel of the input feature map and keeps the maximum value in each window. When the stride equals the kernel size (the PyTorch default, because stride defaults to kernel_size), the windows do not overlap. The maximum corresponds to the strongest activation of the filter at any position in that window. It captures whether the feature was present anywhere in the region, regardless of its exact position. A 2x2 max pool with stride 2 halves the height and width, reducing the number of spatial positions by a factor of 4.

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Simple 4x4 feature map
x = torch.tensor([[[[ 1., 3., 2., 4.],
                     [ 5., 6., 7., 8.],
                     [ 9., 2., 1., 3.],
                     [ 4., 5., 6., 7.]]]])

out = pool(x)
print(out.shape)   # (1, 1, 2, 2)
print(out)
# Max of top-left 2x2: max(1,3,5,6)=6
# Max of top-right 2x2: max(2,4,7,8)=8
# Max of bottom-left 2x2: max(9,2,4,5)=9
# Max of bottom-right 2x2: max(1,3,6,7)=7
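The halving seen here is one case of a general rule. Along each spatial axis, take input size H, kernel size k, stride s and padding p, and assume PyTorch's defaults of dilation=1 and ceil_mode=False. The output size is then floor((H + 2p - k) / s) + 1. For the 4x4 example this gives floor((4 - 2) / 2) + 1 = 2. The sketch below checks the formula against nn.MaxPool2d. The function name `pool_out_size` is a hypothetical helper, not part of PyTorch.

```python
import torch
import torch.nn as nn

def pool_out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Hypothetical helper: output length along one axis (assumes dilation=1, ceil_mode=False)."""
    # Integer division floors for non-negative values, matching PyTorch's default rounding
    return (size + 2 * padding - kernel) // stride + 1

cases = [
    # (input H/W, kernel_size, stride, padding)
    (32, 2, 2, 0),  # common halving case: 32 -> 16
    (7, 2, 2, 0),   # odd size: the last row/column is dropped, 7 -> 3
    (32, 3, 2, 1),  # overlapping 3x3 windows with padding: 32 -> 16
]

for size, k, s, p in cases:
    pool = nn.MaxPool2d(kernel_size=k, stride=s, padding=p)
    actual = pool(torch.randn(1, 1, size, size)).shape[-1]
    predicted = pool_out_size(size, k, s, p)
    print(f"H={size}, k={k}, s={s}, p={p}: predicted {predicted}, actual {actual}")
```

The odd-size case shows why input dimensions that divide evenly by the stride are convenient. With floor rounding, the leftover row and column are silently ignored.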

All lessons in this course

Convolution and Filters: Detecting Edges and Patterns
Pooling Layers: Spatial Downsampling and Invariance
Building and Training a CNN on CIFAR-10
Data Augmentation: Transforms for Robustness
