Deep Learning Academy · Lesson

Gradient Accumulation for Big Batches

Simulate large batches on small GPUs.

The Big-Batch Problem

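Large batches average the gradient over more samples, so each update is less noisy and training often proceeds more smoothly. The cost is memory. The activations saved for the backward pass grow roughly in proportion to the batch size, so a GPU with limited memory may not be able to fit a large batch in a single forward and backward pass.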

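Gradient accumulation works around this by splitting one large batch into smaller micro-batches that each fit in memory. Each micro-batch runs its own forward and backward pass, and its gradients are added to a running sum instead of being applied right away. The optimizer takes a single step only after the last micro-batch. If the loss is a mean over samples, divide each micro-batch's loss (or gradient) by the number of micro-batches. With equal-sized micro-batches, the summed gradient then equals the full-batch gradient, so the update matches what one large batch would have produced.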
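The arithmetic behind this equivalence can be checked directly. The sketch below is a minimal illustration that assumes a simple linear model `y_hat = w*x + b` with mean-squared-error loss. It is written in plain Python rather than a deep learning framework so that every gradient is explicit. The helper names `grad_mse` and `accumulated_grad` and the toy data are hypothetical and were chosen only for this example.

```python
# Minimal sketch (assumption: 1-D linear model y_hat = w*x + b, MSE loss).
# Pure Python so the gradient arithmetic is visible; a framework would use autograd.

def grad_mse(w, b, xs, ys):
    """Return (dL/dw, dL/db) for L = mean((w*x + b - y)^2) over the given samples."""
    n = len(xs)
    gw, gb = 0.0, 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return gw, gb


def accumulated_grad(w, b, xs, ys, micro_batch_size):
    """Hypothetical helper: process equal-sized micro-batches one at a time,
    scale each micro-batch gradient by 1/num_micro, and sum the results."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    num_micro = len(xs) // micro_batch_size
    acc_w, acc_b = 0.0, 0.0
    for start in range(0, len(xs), micro_batch_size):
        end = start + micro_batch_size
        gw, gb = grad_mse(w, b, xs[start:end], ys[start:end])
        acc_w += gw / num_micro  # accumulate instead of updating the weights
        acc_b += gb / num_micro
    return acc_w, acc_b


# Toy data on the line y = 2x + 1 (hypothetical example values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
w, b, lr = 0.0, 0.0, 0.01

full = grad_mse(w, b, xs, ys)                              # one batch of 8
accum = accumulated_grad(w, b, xs, ys, micro_batch_size=2)  # four micro-batches of 2
print(full, accum)  # (-77.0, -16.0) (-77.0, -16.0)

# A single optimizer step after all micro-batches, as a full-batch step would do.
w -= lr * accum[0]
b -= lr * accum[1]
```

Both gradients come out identical, so the one step taken after accumulation is the same step a full batch of 8 would take. In PyTorch the same pattern is commonly written by calling `loss.backward()` on each micro-batch loss divided by the number of accumulation steps (gradients add up in each parameter's `.grad` by default), then calling `optimizer.step()` and `optimizer.zero_grad()` once per large batch. Layers that compute statistics across the batch, such as batch normalization, still see only one micro-batch at a time, so for them the equivalence is only approximate.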