The PCIe Transfer Bottleneck
Why copies are often the slow part.
The Bridge Between Worlds
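On a system with a discrete GPU, the CPU and GPU each have their own memory, and the two are connected by the PCIe bus. Every cudaMemcpy between host and device (cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost) must squeeze through this narrow bridge. Device-to-device copies stay in GPU memory and do not cross it. 🌉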
A Speed Mismatch
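GPU memory bandwidth can reach terabytes per second on high-end cards, but a PCIe x16 link delivers only tens of gigabytes per second (roughly 32 GB/s per direction for PCIe 4.0, in theory). The bus is the slow link, so moving data to the GPU can take longer than computing on it.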
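One way to see the mismatch on your own machine is to time the same copy in two directions. The sketch below is a minimal illustration, not a rigorous benchmark: the helper name copy_gbps, the CHECK macro, and the 256 MiB buffer size are assumptions made for this example, and the host buffer is pinned with cudaMallocHost so the host-to-device figure is not held back by pageable memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical helper: time one cudaMemcpy with CUDA events and
// return the copy rate in GB/s (bytes copied per second / 1e9).
static float copy_gbps(void* dst, const void* src, size_t bytes,
                       cudaMemcpyKind kind) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up copy, not timed

    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dst, src, bytes, kind));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));         // wait until the copy is done

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return (static_cast<float>(bytes) / 1e9f) / (ms / 1e3f);
}

int main() {
    const size_t bytes = size_t{256} << 20;    // 256 MiB, an arbitrary test size

    void* h_buf = nullptr;                     // pinned host buffer
    void* d_a = nullptr;                       // device buffers
    void* d_b = nullptr;
    CHECK(cudaMallocHost(&h_buf, bytes));
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    std::memset(h_buf, 0, bytes);

    // Host-to-device: the data crosses the PCIe bus.
    float h2d = copy_gbps(d_a, h_buf, bytes, cudaMemcpyHostToDevice);
    // Device-to-device: the data never leaves GPU memory.
    float d2d = copy_gbps(d_b, d_a, bytes, cudaMemcpyDeviceToDevice);

    std::printf("Host to device:   %.1f GB/s\n", h2d);
    std::printf("Device to device: %.1f GB/s\n", d2d);

    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFreeHost(h_buf));
    return 0;
}
```

Assuming the file is saved as bandwidth.cu, it can be built with `nvcc bandwidth.cu -o bandwidth`. The exact numbers depend on your GPU and PCIe generation, but on a discrete card the device-to-device rate should come out well above the host-to-device rate.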