CUDA Academy · Lesson

Looping Over Tile Phases

Accumulating partial sums across tiles.

The Dot Product Is Split

The dot product of a full row with a full column spans more elements than one tile can hold. So you split that long sum into chunks of width TILE, one chunk per phase, and add each chunk's partial sum to a running total.
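To make the split concrete, here is a minimal sketch of one dot product accumulated chunk by chunk. The helper name dotInPhases and the value TILE = 4 are assumptions chosen for illustration, not part of the lesson's kernel.

```cuda
// Chunk width for this sketch; 4 is an arbitrary illustrative value.
constexpr int TILE = 4;

// Hypothetical helper: dot product of two length-n vectors, accumulated
// one TILE-wide chunk (one "phase") at a time.
__host__ __device__ float dotInPhases(const float* row, const float* col, int n) {
    float acc = 0.0f;  // running total across all phases
    for (int start = 0; start < n; start += TILE) {
        // The last chunk may be shorter than TILE when n is not a multiple of it.
        int end = (start + TILE < n) ? start + TILE : n;
        float partial = 0.0f;  // this phase's partial sum
        for (int k = start; k < end; ++k) {
            partial += row[k] * col[k];
        }
        acc += partial;
    }
    return acc;
}
```

Every element is visited exactly once, so the result is the same sum as a single long loop, only grouped by phase (floating-point rounding can differ slightly because the additions are regrouped).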

Counting the Phases

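If the matrices are N wide and tiles are TILE wide, you need N / TILE phases, rounded up, to cover the whole inner dimension. When N is a multiple of TILE that is exactly N / TILE; otherwise the integer arithmetic below adds one more phase so the final, partially filled tile is not skipped.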

int numPhases = (N + TILE - 1) / TILE;
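A sketch of how the phase count might drive the loop inside a tiled kernel follows. It assumes square, row-major N x N matrices, a TILE x TILE thread block, and TILE = 16. The names matmulTiled, tileA, tileB, and runMatmulTiled are hypothetical, and the host wrapper omits error checking for brevity.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumed tile width; the block is TILE x TILE threads

// Sketch: C = A * B for square, row-major N x N matrices.
__global__ void matmulTiled(const float* A, const float* B, float* C, int N) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int numPhases = (N + TILE - 1) / TILE;  // rounds up for a partial last tile

    float acc = 0.0f;  // this thread's running sum across all phases
    for (int p = 0; p < numPhases; ++p) {
        int aCol = p * TILE + threadIdx.x;
        int bRow = p * TILE + threadIdx.y;
        // Out-of-range elements are loaded as 0 so they add nothing to the sum.
        tileA[threadIdx.y][threadIdx.x] =
            (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k) {
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        }
        __syncthreads();  // everyone done reading before the next phase overwrites
    }
    if (row < N && col < N) {
        C[row * N + col] = acc;
    }
}

// Hypothetical host wrapper: copy inputs to the GPU, launch, copy the result back.
// Error checking is omitted to keep the sketch short.
void runMatmulTiled(const float* hA, const float* hB, float* hC, int N) {
    size_t bytes = static_cast<size_t>(N) * N * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, N);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

The two __syncthreads() calls bracket the inner loop: the first keeps any thread from reading a tile that is still being filled, and the second keeps any thread from overwriting a tile that others are still reading.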
