Deep Learning Academy · Lesson

Sync Batch Norm & Sharded State

Keep batch statistics and weights consistent across GPUs.

The Small-Batch Problem

By default, batch norm computes its mean and variance from each GPU's local batch only. When a fixed global batch is split across many GPUs, each per-device batch shrinks, and those statistics get noisy.
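To get a feel for how much noisier small-batch statistics are, here is a minimal sketch using only the Python standard library. It draws many batches from a standard normal distribution and measures how much the batch mean wanders from batch to batch. The global batch size of 256, the 64-GPU split, and the helper name `spread_of_batch_means` are illustrative assumptions, not values from this lesson.

```python
import random
import statistics


def spread_of_batch_means(batch_size, num_batches=200, seed=0):
    """Standard deviation of the per-batch mean for batches drawn from N(0, 1)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(batch_size))
        for _ in range(num_batches)
    ]
    return statistics.pstdev(means)


global_batch = 256                      # assumed global batch size
num_gpus = 64                           # hypothetical device count
per_device = global_batch // num_gpus   # 4 samples per GPU

spread_global = spread_of_batch_means(global_batch)
spread_local = spread_of_batch_means(per_device)

print(f"std of batch mean, batch={global_batch}: {spread_global:.3f}")
print(f"std of batch mean, batch={per_device}: {spread_local:.3f}")
```

The spread of the mean scales roughly with one over the square root of the batch size, so the 4-sample batches should give a noticeably larger spread than the 256-sample batch.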

Enter SyncBatchNorm

SyncBatchNorm fixes this by computing mean and variance across all GPUs together, as if it saw the full global batch.
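The idea can be sketched without any GPUs: if every device shares its sample count, sum, and sum of squares, the combined totals give exactly the mean and variance of the full global batch. The code below is a plain-Python illustration under that assumption. The helper names `local_stats` and `synced_mean_var` are hypothetical, and the real SyncBatchNorm implementation may organize its communication differently.

```python
import math
import random


def local_stats(batch):
    """What each device contributes: count, sum, and sum of squares."""
    return len(batch), sum(batch), sum(x * x for x in batch)


def synced_mean_var(per_device_batches):
    """Combine per-device partial sums into global-batch mean and variance."""
    n = s = sq = 0.0
    # Stand-in for the cross-GPU reduction: add up every device's partial sums.
    for batch in per_device_batches:
        count, total, total_sq = local_stats(batch)
        n += count
        s += total
        sq += total_sq
    mean = s / n
    var = sq / n - mean * mean  # biased (population) variance
    return mean, var


rng = random.Random(0)
# Four simulated devices, eight samples each (illustrative sizes).
shards = [[rng.gauss(2.0, 3.0) for _ in range(8)] for _ in range(4)]

synced_mean, synced_var = synced_mean_var(shards)

# Reference: statistics computed directly on the concatenated global batch.
full = [x for shard in shards for x in shard]
full_mean = sum(full) / len(full)
full_var = sum((x - full_mean) ** 2 for x in full) / len(full)

print(math.isclose(synced_mean, full_mean, rel_tol=1e-9))
print(math.isclose(synced_var, full_var, rel_tol=1e-9))
```

In PyTorch, existing batch norm layers can be swapped for synchronized ones with `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)`, typically before wrapping the model in DistributedDataParallel.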
