The Double-Buffering Pipeline
Chunking data to keep the GPU fed.
The Idle GPU Problem
The naive approach copies a huge array to the GPU, runs a kernel, and then copies the result back, one step at a time. During each copy the GPU's compute units sit idle, and while the kernel runs the copy engines sit idle.
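To make the idle time concrete, here is a minimal sketch of the serial pattern described above. The kernel `scale` and the helper `process_serial` are hypothetical names invented for illustration; only the CUDA runtime calls are real API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Serial pattern: copy in, compute, copy out. No step overlaps another.
cudaError_t process_serial(float* host, size_t n) {
    if (n == 0) return cudaSuccess;
    float* dev = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    // 1. Host to device: the compute units have nothing to work on yet.
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // 2. Compute: no transfers are in flight while the kernel runs.
    if (err == cudaSuccess) {
        unsigned blocks = (unsigned)((n + 255) / 256);
        scale<<<blocks, 256>>>(dev, n);
        err = cudaGetLastError();
    }

    // 3. Device to host: this blocking copy starts only after the kernel ends.
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

Calling `process_serial(data, n)` on a host array returns only after all three steps have finished, so the total time is roughly the sum of the two copies and the kernel.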
Split the Work
The fix starts with breaking one giant array into smaller chunks. Each chunk can be copied and processed on its own, opening the door to overlap.
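One way the chunked version might look is sketched below. It assumes two device buffers, one stream per buffer, and a host array allocated with cudaMallocHost so the asynchronous copies can actually overlap with compute. The names `scale_chunk` and `process_chunked`, and the choice of two buffers, are illustrative assumptions rather than a prescribed implementation.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale_chunk(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Processes `n` floats in pieces of at most `chunk` elements.
// Assumption: `host` is pinned (cudaMallocHost). With pageable memory the
// result is still correct, but the copies generally will not overlap.
cudaError_t process_chunked(float* host, size_t n, size_t chunk) {
    if (chunk == 0) return cudaErrorInvalidValue;
    const int kBuffers = 2;              // two buffers: one fills while one computes
    cudaStream_t stream[kBuffers] = {};
    float* dev[kBuffers] = {};
    cudaError_t err = cudaSuccess;

    for (int b = 0; b < kBuffers && err == cudaSuccess; ++b) {
        err = cudaStreamCreate(&stream[b]);
        if (err == cudaSuccess) err = cudaMalloc(&dev[b], chunk * sizeof(float));
    }

    size_t index = 0;
    for (size_t off = 0; off < n && err == cudaSuccess; off += chunk, ++index) {
        size_t count = (n - off < chunk) ? (n - off) : chunk;  // last chunk may be short
        size_t bytes = count * sizeof(float);
        int b = (int)(index % kBuffers);  // alternate buffers and streams

        // Work within one stream runs in order, so a buffer is never
        // overwritten before its previous result has been copied back.
        err = cudaMemcpyAsync(dev[b], host + off, bytes,
                              cudaMemcpyHostToDevice, stream[b]);
        if (err != cudaSuccess) break;
        unsigned blocks = (unsigned)((count + 255) / 256);
        scale_chunk<<<blocks, 256, 0, stream[b]>>>(dev[b], count);
        err = cudaGetLastError();
        if (err != cudaSuccess) break;
        err = cudaMemcpyAsync(host + off, dev[b], bytes,
                              cudaMemcpyDeviceToHost, stream[b]);
    }

    // Wait for all queued work before releasing resources.
    cudaError_t sync = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = sync;

    for (int b = 0; b < kBuffers; ++b) {
        cudaFree(dev[b]);
        if (stream[b]) cudaStreamDestroy(stream[b]);
    }
    return err;
}
```

Because consecutive chunks go to different streams, the copy for one chunk can proceed while the kernel for the previous chunk is still running. How much overlap you actually get depends on the hardware and on the chunk size, so treat the chunk size as a tuning parameter.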