The Data Reuse Problem
Why naive kernels re-read global memory.
The Hidden Cost
A kernel can be correct yet slow because it keeps fetching the same data from slow global memory over and over. 🐢
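As a rough illustration of that pattern, here is a minimal CUDA sketch of a naive 1D box filter. The kernel, the helper `run_box_filter_naive`, and the `RADIUS` constant are hypothetical names chosen for this example, not code from the course. Each thread reads its whole window straight from global memory, so every input element is requested by up to `2 * RADIUS + 1` different threads. Hardware caches may absorb some of those repeats, but the kernel itself does nothing to reuse the data.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int RADIUS = 3;  // hypothetical window half-width (assumption)

// Naive 1D box filter: correct, but each thread issues 2*RADIUS+1 global loads,
// and neighboring threads load almost exactly the same elements.
__global__ void box_filter_naive(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int j = min(max(i + k, 0), n - 1);  // clamp the index at the array ends
        sum += in[j];                       // global load, repeated by nearby threads
    }
    out[i] = sum / (2 * RADIUS + 1);
}

// Host helper (hypothetical): copies the input to the device, runs the kernel,
// and returns the filtered result.
std::vector<float> run_box_filter_naive(const std::vector<float>& h_in) {
    const int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n);
    if (n == 0) return h_out;

    const size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n
    box_filter_naive<<<blocks, threads>>>(d_in, d_out, n);
    assert(cudaGetLastError() == cudaSuccess);       // catch launch errors in debug builds

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_box_filter_naive` on a `std::vector<float>` returns the filtered values. With `RADIUS = 3`, the kernel issues seven global loads per output element even though, averaged over the array, only one new input element is needed per output.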
Memory Is the Bottleneck
On a GPU, arithmetic is cheap but a trip to global memory is expensive: a single access can take hundreds of clock cycles. As a result, many kernels spend far more time waiting on memory than computing.
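A back-of-the-envelope estimate can make this concrete. The setup is an assumption for illustration, not something from the lesson: a 1D kernel in which each thread sums a window of $2R+1$ single-precision values (4 bytes each), scales the sum, and writes one result, with caches ignored for simplicity.

```latex
% Work per output element: 2R additions and 1 scaling operation.
\text{FLOPs per output} = 2R + 1

% Naive kernel: 2R+1 loads and 1 store, 4 bytes each.
I_{\text{naive}} = \frac{2R + 1}{4(2R + 1) + 4} = \frac{2R + 1}{8R + 8}
  \;<\; \frac{1}{4}\ \text{FLOP/byte}

% Ideal reuse: each input element is loaded once, so 4 bytes in and 4 bytes out.
I_{\text{reuse}} = \frac{2R + 1}{8}\ \text{FLOP/byte}

% Example with R = 3:
I_{\text{naive}} = \frac{7}{32} \approx 0.22, \qquad
I_{\text{reuse}} = \frac{7}{8} \approx 0.88, \qquad
\frac{I_{\text{reuse}}}{I_{\text{naive}}} = R + 1 = 4
```

Under these assumptions the naive version moves about four times as many bytes for the same arithmetic, which is why it tends to wait on memory rather than on computation.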