Deep Learning Academy · Lesson

Sync Batch Norm & Sharded State

Keep stats and weights consistent.

The Small-Batch Problem

Batch norm computes stats from each GPU's local batch. Split across many GPUs, each per-device batch shrinks and those stats get noisy.

SyncBatchNorm fixes this by computing mean and variance across all GPUs together, as if it saw the full global batch.