CUDA Academy · Lesson

The Load-Sync-Compute Pattern

Staging tiles before computing on them.

Three Simple Phases

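Tiled kernels follow one rhythm: each block loads a tile from global memory into shared memory, synchronizes with __syncthreads() so that every thread's store has finished, then computes from the fast shared copy.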
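As a minimal sketch of the three phases working together, the kernel below reverses the elements inside each block-sized chunk of an array. The names reverseWithinTile, TILE, and reverseTilesOnDevice are hypothetical, chosen only for illustration. The sketch assumes the input length is a nonzero multiple of TILE, and it omits error checking on the CUDA calls for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 256;  // hypothetical tile size: one element per thread

// Reverses the elements within each TILE-sized chunk of `in`.
// Assumes the launch uses exactly TILE threads per block and that the
// array length is a multiple of TILE.
__global__ void reverseWithinTile(const float* in, float* out)
{
    __shared__ float tile[TILE];

    int globalIndex = blockIdx.x * blockDim.x + threadIdx.x;

    // Phase 1 (load): each thread stages one element in shared memory.
    tile[threadIdx.x] = in[globalIndex];

    // Phase 2 (sync): wait until every thread in the block has stored.
    __syncthreads();

    // Phase 3 (compute): read an element that another thread loaded.
    out[globalIndex] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper: copy in, launch, copy out.
std::vector<float> reverseTilesOnDevice(const std::vector<float>& host)
{
    int n = static_cast<int>(host.size());  // assumed multiple of TILE
    size_t bytes = n * sizeof(float);
    std::vector<float> result(n);

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    reverseWithinTile<<<n / TILE, TILE>>>(dIn, dOut);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The barrier matters here because each thread reads an element that a different thread stored. Without __syncthreads(), that read could happen before the store.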

Phase One: Load

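In the load phase, each thread reads one element from global memory and stores it into a shared-memory tile that its whole block can see. In the line below, globalIndex is the thread's position in the input array, typically blockIdx.x * blockDim.x + threadIdx.x for a one-dimensional launch.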

tile[threadIdx.x] = in[globalIndex];
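To put that line in context, the sketch below shows one way the tile and globalIndex might be declared for a one-dimensional launch. The names sumEachTile, TILE_WIDTH, and sumTilesOnDevice are hypothetical, and the input length is assumed to be a nonzero multiple of TILE_WIDTH. After the load, thread 0 adds up the whole tile. That serial sum is there only to show that values stored by other threads are visible across the block; it is not an efficient way to sum.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE_WIDTH = 128;  // hypothetical: threads (and elements) per block

// Writes one sum per block. Assumes TILE_WIDTH threads per block and an
// input length that is a multiple of TILE_WIDTH.
__global__ void sumEachTile(const float* in, float* blockSums)
{
    // One tile per block, shared by every thread in that block.
    __shared__ float tile[TILE_WIDTH];

    // This thread's position in the whole input array.
    int globalIndex = blockIdx.x * blockDim.x + threadIdx.x;

    // Load phase: one global read and one shared store per thread.
    tile[threadIdx.x] = in[globalIndex];

    // Make every thread's store visible before anyone reads the tile.
    __syncthreads();

    // Demonstration only: thread 0 reads elements the other threads loaded.
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < blockDim.x; ++i) {
            sum += tile[i];
        }
        blockSums[blockIdx.x] = sum;
    }
}

// Hypothetical host helper: returns one sum per tile.
std::vector<float> sumTilesOnDevice(const std::vector<float>& host)
{
    int n = static_cast<int>(host.size());  // assumed multiple of TILE_WIDTH
    int blocks = n / TILE_WIDTH;
    std::vector<float> sums(blocks);

    float* dIn = nullptr;
    float* dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sumEachTile<<<blocks, TILE_WIDTH>>>(dIn, dSums);

    cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);
    return sums;
}
```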
