Overlapping Copy and Compute
Hiding transfers behind kernels.
The Big Idea
The speedup comes from doing two things at once: while the GPU computes on one chunk of data, the next chunk is already being copied. 🔀
Separate Hardware Engines
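A GPU has distinct copy engines and compute units. Overlap works because these can run truly in parallel, not just take turns. For that to happen, the copy and the kernel must be issued in different non-default streams, and the host buffer must be pinned (page-locked) so the transfer can proceed asynchronously.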
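As a rough illustration of the idea, the sketch below splits an array into chunks and gives each chunk its own stream, so one chunk's copy can overlap another chunk's kernel. The names `scale`, `run_overlapped`, and `CHECK`, the chunk count, and the array size are all made up for this example, and it assumes the array length divides evenly by the number of chunks. Whether the operations actually overlap depends on the device; a profiler timeline is the way to confirm it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Minimal error check for a sketch: report and return false on failure
// (resources are not released on the error path).
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s\n",                      \
                    cudaGetErrorString(err_));                       \
            return false;                                            \
        }                                                            \
    } while (0)

// Copies, processes, and copies back n floats in n_chunks pieces.
// Assumes n is divisible by n_chunks.
bool run_overlapped(int n, int n_chunks) {
    const int chunk = n / n_chunks;
    const size_t chunk_bytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    // Pinned host memory: needed for cudaMemcpyAsync to be truly asynchronous.
    CHECK(cudaMallocHost((void **)&h, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // One non-default stream per chunk.
    std::vector<cudaStream_t> streams(n_chunks);
    for (int s = 0; s < n_chunks; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Within a stream the three steps run in order; across streams they
    // are free to overlap (copy engine for one chunk, compute for another).
    for (int s = 0; s < n_chunks; ++s) {
        const int offset = s * chunk;
        CHECK(cudaMemcpyAsync(d + offset, h + offset, chunk_bytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + offset, chunk);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + offset, d + offset, chunk_bytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for every stream to finish

    // Every element started at 1.0f and was doubled once.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2.0f) { ok = false; break; }
    }

    for (int s = 0; s < n_chunks; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFreeHost(h));
    CHECK(cudaFree(d));
    return ok;
}

int main() {
    const bool ok = run_overlapped(1 << 20, 4);
    printf("%s\n", ok ? "results verified" : "run failed");
    return ok ? 0 : 1;
}
```

Issuing all three steps for a chunk into the same stream keeps them correctly ordered without explicit synchronization, while the separate streams are what allow the hardware to run different chunks' copies and kernels at the same time.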