Deep Learning Academy · Lesson

Gradient Accumulation for Big Batches

Simulate large batches on small GPUs.

The Big-Batch Problem

Large batches often give smoother, less noisy gradient estimates, but activation memory grows with batch size. A small GPU cannot hold a very large batch in a single forward and backward pass.

The Core Trick

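Gradient accumulation splits one big batch into small chunks (micro-batches) and runs them one at a time. You sum their gradients and update the weights once, as if the whole batch had run together. For the sum to match a loss averaged over the full batch, divide each chunk's loss by the number of equal-sized chunks before accumulating.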
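
To make the equivalence concrete, here is a minimal sketch in plain Python with no framework. It assumes a toy one-parameter model (`y_hat = w * x`) with a mean-squared-error loss; the helper names `mse_grad` and `accumulated_grad`, the data, and the learning rate are all illustrative and not part of the lesson. It checks that gradients accumulated over equal-sized chunks, each scaled by one over the number of chunks, match the gradient of the full batch.

```python
# Toy model (assumption for illustration): y_hat = w * x, loss = mean squared error.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over the given samples."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum micro-batch gradients, each scaled by 1 / num_chunks."""
    # Equal-sized chunks are assumed so that simple 1/num_chunks scaling is exact.
    assert len(xs) % micro_batch_size == 0
    num_chunks = len(xs) // micro_batch_size
    grad = 0.0
    for start in range(0, len(xs), micro_batch_size):
        chunk_x = xs[start:start + micro_batch_size]
        chunk_y = ys[start:start + micro_batch_size]
        # Only one chunk is processed at a time, which is what saves memory.
        grad += mse_grad(w, chunk_x, chunk_y) / num_chunks
    return grad


# Hypothetical data: 8 samples forming one "big batch".
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5
lr = 0.01

full = mse_grad(w, xs, ys)                             # whole batch at once
acc = accumulated_grad(w, xs, ys, micro_batch_size=2)  # four chunks of two

# The two gradients agree up to floating-point rounding.
assert abs(full - acc) < 1e-9

# A single weight update after all chunks have been accumulated.
w_new = w - lr * acc
```

In a PyTorch training loop the same idea usually means calling `backward()` on each micro-batch loss divided by the number of accumulation steps, then calling `optimizer.step()` and `optimizer.zero_grad()` only once every N micro-batches.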
