Multi-GPU with NCCL
Collective communication for scaling.
Hand-Rolled Gets Hard
Wiring peer copies by hand across many GPUs becomes messy fast: with N GPUs there are up to N(N-1) directed peer pairs to schedule and synchronize. For real scaling you want a library built for collective communication.
Meet NCCL
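NVIDIA's answer is NCCL, the NVIDIA Collective Communications Library. It provides collective operations such as all-reduce and broadcast, and it detects the system topology so that data moves among GPUs over the fastest links available (for example NVLink or PCIe) without manual wiring.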
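To make "collective" concrete, here is a minimal host-only sketch of what an all-reduce (sum) guarantees: every rank starts with its own buffer and ends with the element-wise sum of all buffers. It is plain C++ and does not call NCCL; in NCCL the corresponding operation is ncclAllReduce. The function name ring_all_reduce_sum is hypothetical, the vectors stand in for per-GPU device buffers, and the ring schedule shown is only one well-known way to carry out the operation. No claim is made about which schedule NCCL picks on a given system.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of a ring all-reduce (sum). bufs[r] plays the role of
// rank r's device buffer. This is an illustration, not NCCL code.
void ring_all_reduce_sum(std::vector<std::vector<float>>& bufs) {
    const std::size_t n = bufs.size();
    if (n < 2) return;  // nothing to combine
    const std::size_t len = bufs[0].size();
    for (const auto& b : bufs) assert(b.size() == len);

    // Split each buffer into n chunks; chunk c covers [begin(c), begin(c + 1)).
    auto begin = [&](std::size_t c) { return c * len / n; };

    // Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n
    // to its right neighbour, which adds it into its own copy.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        for (std::size_t r = 0; r < n; ++r) {
            const std::size_t c = (r + n - s) % n;
            const std::size_t dst = (r + 1) % n;
            for (std::size_t i = begin(c); i < begin(c + 1); ++i)
                bufs[dst][i] += bufs[r][i];
        }
    }
    // Now rank r holds the fully reduced chunk (r + 1) mod n.

    // Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) mod n
    // to its right neighbour, which overwrites its own copy.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        for (std::size_t r = 0; r < n; ++r) {
            const std::size_t c = (r + 1 + n - s) % n;
            const std::size_t dst = (r + 1) % n;
            for (std::size_t i = begin(c); i < begin(c + 1); ++i)
                bufs[dst][i] = bufs[r][i];
        }
    }
}
```

Calling ring_all_reduce_sum on three buffers {1, 2, 3, 4}, {10, 20, 30, 40} and {100, 200, 300, 400} leaves all three equal to {111, 222, 333, 444}. Each rank only ever talks to its right neighbour, which is the kind of bookkeeping a collective library takes off your hands.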