CUDA Academy · Lesson

Peer-to-Peer Memory Access

Direct GPU-to-GPU copies over NVLink.

The Slow Detour

Without peer-to-peer access, a copy from one GPU to another is staged through host memory: the data travels to the CPU's RAM first, then on to the destination GPU. That round trip adds latency and consumes host memory bandwidth.

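GPUs that support peer-to-peer (P2P) access can skip host memory entirely. Once P2P is enabled, one GPU can read and write another's memory directly over the link between them: NVLink where it is available, or PCIe on systems that support it.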
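To make the idea concrete, here is a minimal sketch of a GPU-to-GPU copy using the CUDA runtime API. It assumes devices 0 and 1 are the pair of interest, and the names run_peer_copy_demo and enable_peer are hypothetical helpers invented for this illustration. The runtime calls themselves (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer) are the standard ones. Note that cudaMemcpyPeer still works when peer access is unavailable; in that case the runtime falls back to staging the copy through the host.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Bail out of the demo with a message if a CUDA call fails.
// (A sketch: allocations are not released on the error path.)
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical helper: let device `from` access memory on device `to`.
// Peer access is one-directional, so each direction is enabled separately.
static bool enable_peer(int from, int to) {
    int can_access = 0;
    if (cudaDeviceCanAccessPeer(&can_access, from, to) != cudaSuccess ||
        !can_access) {
        return false;
    }
    if (cudaSetDevice(from) != cudaSuccess) return false;
    cudaError_t err = cudaDeviceEnablePeerAccess(to, 0);  // flags must be 0
    if (err == cudaErrorPeerAccessAlreadyEnabled) {
        cudaGetLastError();  // clear the benign error
        return true;
    }
    return err == cudaSuccess;
}

// Hypothetical demo: copy a buffer from GPU 0 to GPU 1 and verify it.
// Returns 0 on success (or when skipped), 1 on failure.
int run_peer_copy_demo() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count < 2) {
        std::printf("Fewer than two GPUs found; skipping demo.\n");
        return 0;
    }

    const int src = 0, dst = 1;  // assumption: these two devices are the pair
    const bool direct = enable_peer(src, dst) && enable_peer(dst, src);
    std::printf("Peer access %s\n",
                direct ? "enabled" : "unavailable (copy is staged via host)");

    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host_in(n, 1.5f), host_out(n, 0.0f);

    float *d_src = nullptr, *d_dst = nullptr;
    CHECK(cudaSetDevice(src));
    CHECK(cudaMalloc(&d_src, bytes));
    CHECK(cudaMemcpy(d_src, host_in.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaSetDevice(dst));
    CHECK(cudaMalloc(&d_dst, bytes));

    // The GPU-to-GPU copy itself.
    CHECK(cudaMemcpyPeer(d_dst, dst, d_src, src, bytes));

    // Read the result back from the destination GPU and compare.
    CHECK(cudaMemcpy(host_out.data(), d_dst, bytes, cudaMemcpyDeviceToHost));
    for (size_t i = 0; i < n; ++i) {
        if (host_out[i] != host_in[i]) return 1;
    }

    CHECK(cudaFree(d_dst));
    CHECK(cudaSetDevice(src));
    CHECK(cudaFree(d_src));
    return 0;
}
```

Calling run_peer_copy_demo() from main and compiling with nvcc should be enough to try it. On a machine with a single GPU the sketch simply reports that it is skipping the copy.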