The Load-Sync-Compute Pattern
Staging tiles before computing on them.
Three Simple Phases
Tiled kernels generally follow the same rhythm: load a tile into shared memory, synchronize the block so every thread sees the complete tile, then compute from the fast copy.
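As a rough illustration of the three phases together, here is a minimal sketch of a kernel that reverses each tile of an array. The names reverseTiles, runReverseTiles, and TILE are hypothetical and chosen only for this example. It assumes a 1D launch with blockDim.x equal to TILE and an input length that is an exact multiple of TILE, so it does nothing about partial tiles at the edge.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 256;  // hypothetical tile size: one element per thread

// Reverses each TILE-sized chunk of `in` into `out`.
// Assumes blockDim.x == TILE and that the array length is a multiple of TILE.
__global__ void reverseTiles(const float* in, float* out) {
    __shared__ float tile[TILE];
    int globalIndex = blockIdx.x * blockDim.x + threadIdx.x;

    // Phase 1, load: each thread copies one element into the block's tile.
    tile[threadIdx.x] = in[globalIndex];

    // Phase 2, sync: wait until every thread in the block has finished loading.
    __syncthreads();

    // Phase 3, compute: read an element that a different thread loaded.
    out[globalIndex] = tile[blockDim.x - 1 - threadIdx.x];
}

// Host helper (hypothetical): runs the kernel on numTiles tiles and
// returns true if the result matches a CPU reference.
bool runReverseTiles(int numTiles) {
    int n = numTiles * TILE;
    float* in = nullptr;
    float* out = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return false;
    }
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    reverseTiles<<<numTiles, TILE>>>(in, out);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; ok && i < n; ++i) {
        int base = (i / TILE) * TILE;               // first index of this tile
        int mirrored = base + (TILE - 1 - (i - base));
        if (out[i] != in[mirrored]) ok = false;
    }
    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main() {
    printf("%s\n", runReverseTiles(4) ? "match" : "mismatch");
    return 0;
}
```

The reversal is deliberately trivial. What matters is that the compute phase reads a tile slot written by another thread, which is why the barrier between load and compute cannot be skipped.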
Phase One: Load
In the load phase, each thread reads one element from global memory and stores it into a shared-memory tile its whole block can see.
tile[threadIdx.x] = in[globalIndex];