DistributedDataParallel (DDP)
Process groups, dist.init_process_group, DistributedSampler, gradient synchronization.
What is DDP
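DistributedDataParallel (DDP) is PyTorch's high-performance approach to data-parallel training. It runs one process per GPU, each holding a full model replica and working on its own slice of the data. During the backward pass, the gradients are averaged across processes with an all-reduce, so every replica applies the same update. Because that communication overlaps with gradient computation, DDP can scale near-linearly across GPUs and machines when the interconnect is fast enough.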
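To see why averaging per-process gradients keeps the replicas equivalent to single-process training, here is a minimal pure-Python sketch. It does not use DDP itself. It assumes a toy one-parameter model y = w * x with a mean-squared-error loss and two equally sized data shards. The names grad_mse, shards, and synced_grad are illustrative only.

```python
def grad_mse(w, xs, ys):
    # Gradient of mean((w*x - y)^2) with respect to w.
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
world_size = 2  # pretend there are two processes

# Each simulated rank sees every world_size-th sample, starting at its rank.
shards = [(xs[r::world_size], ys[r::world_size]) for r in range(world_size)]

# Every rank computes a gradient on its own shard only.
local_grads = [grad_mse(w, sx, sy) for sx, sy in shards]

# Averaging the local gradients stands in for the all-reduce step.
synced_grad = sum(local_grads) / world_size

# Reference: the gradient a single process would get on the full batch.
full_grad = grad_mse(w, xs, ys)
```

With equally sized shards, synced_grad matches full_grad, which is why every replica can take the same optimizer step. If shards differ in size, a plain average of local gradients no longer equals the full-batch gradient exactly.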
The Process Group
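DDP coordinates processes through a process group, which each process joins by calling dist.init_process_group. Each process gets a unique rank, from 0 to world size minus 1, and knows the world size, which is the total number of processes in the group. The processes communicate over a backend; on NVIDIA GPUs that backend is nccl, while gloo is the usual choice for CPU-only runs.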
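The sketch below shows the process-group calls in the smallest setting that runs anywhere: a single process on CPU with the gloo backend and a world size of 1. It assumes PyTorch is installed with distributed support. The helpers find_free_port and run_demo are hypothetical names for this example. In a real job a launcher such as torchrun starts one process per GPU and supplies the rank, world size, and rendezvous address.

```python
import os
import socket

import torch
import torch.distributed as dist


def find_free_port() -> int:
    # Hypothetical helper: ask the OS for an unused TCP port for the rendezvous.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run_demo():
    # The default env:// rendezvous reads the master address and port from
    # these variables; a launcher normally sets them for every process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(find_free_port()))

    # "gloo" runs on CPU; on NVIDIA GPUs you would pass backend="nccl".
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
    try:
        rank = dist.get_rank()
        world_size = dist.get_world_size()

        # all_reduce sums the tensor across every process in the group, in place.
        t = torch.tensor([float(rank + 1)])
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        return rank, world_size, t.item()
    finally:
        # Always release the group, even if a collective fails.
        dist.destroy_process_group()
```

With more than one process, each rank would contribute its own tensor and all of them would end up holding the same sum. That is the collective DDP relies on to keep gradients in sync.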