Looping Over Tile Phases
Accumulating partial sums across tiles.
The Dot Product Is Split
The dot product of a full row and a full column spans more elements than one tile can hold. So you split that long sum into chunks of width TILE, one chunk per phase, and add each chunk's partial sum to a running total.
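As a rough illustration of the split, here is a minimal host-side sketch in plain C++ (not a GPU kernel). The function name `chunkedDot` and the tile width of 4 are assumptions made for this example only; the point is just that the accumulator survives from one phase to the next.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile width for this sketch; pick whatever your kernel uses.
constexpr std::size_t TILE = 4;

// Dot product of one row and one column, computed one TILE-wide chunk per phase.
float chunkedDot(const std::vector<float>& row, const std::vector<float>& col) {
    assert(row.size() == col.size());
    const std::size_t n = row.size();
    float acc = 0.0f;  // running partial sum, carried across phases
    for (std::size_t start = 0; start < n; start += TILE) {
        // The last chunk may be shorter than TILE when n is not a multiple of it.
        const std::size_t end = std::min(start + TILE, n);
        for (std::size_t k = start; k < end; ++k) {
            acc += row[k] * col[k];
        }
    }
    return acc;
}
```

With a row of 1 through 6 and a column of ones, this runs two phases (four elements, then two) and returns 21, the same value a single unsplit loop would give.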
Counting the Phases
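If the matrices are N wide and tiles are TILE wide, you need N / TILE phases to cover the whole inner dimension when TILE divides N evenly. When it does not, round up so that the final, partial tile gets a phase of its own: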
int numPhases = (N + TILE - 1) / TILE;
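To see what the rounding in the phase count does, the sketch below wraps the same expression in a small helper and adds a second helper that reports how many elements a given phase actually covers. Both `countPhases` and `phaseWidth` are hypothetical names introduced for this example; a real kernel would more likely guard its loads with a bounds check than compute the width explicitly.

```cpp
#include <cassert>

// Number of tile-wide phases needed to cover an inner dimension of n elements.
// Integer ceiling division: a partial final tile still counts as one phase.
int countPhases(int n, int tile) {
    assert(n >= 0 && tile > 0);
    return (n + tile - 1) / tile;
}

// Number of valid elements in the given phase (0-based).
// Only the last phase can be narrower than a full tile.
int phaseWidth(int n, int tile, int phase) {
    assert(phase >= 0 && phase < countPhases(n, tile));
    const int remaining = n - phase * tile;
    return remaining < tile ? remaining : tile;
}
```

For N = 64 and TILE = 16 this gives 4 full phases. For N = 70 it gives 5 phases, and the last one covers only 6 elements, which is why out-of-range loads in that phase need to be skipped or treated as zero.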