The Naive Matmul Kernel
A 2D-indexed baseline and its limits.
Matrix Multiply, GPU Style
Matrix multiplication is at the heart of graphics and AI. In this lesson you first build a naive GPU version, then learn why it leaves speed on the table.
The Math in One Line
Each output cell C[row][col] is a dot product: pair each element in row "row" of A with the matching element in column "col" of B, multiply the pairs, and sum the results. 🧮
C[row][col] = sum over k of A[row][k] * B[k][col]
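The dot-product formula above maps naturally onto a kernel in which each thread computes one output cell. The sketch below is one possible way to write it, assuming CUDA, square n x n matrices, and row-major flat float arrays. The names matmul_naive and matmul_naive_host, and the 16 x 16 block size, are illustrative choices rather than anything fixed by the lesson. Error checking on the CUDA calls is left out to keep the example short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Naive kernel: one thread computes one output cell C[row][col].
// Assumption: A, B, C are n x n, row-major, stored as flat arrays.
__global__ void matmul_naive(const float* A, const float* B, float* C, int n) {
    // 2D indexing: y selects the row, x selects the column.
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;  // guard threads past the matrix edge

    // C[row][col] = sum over k of A[row][k] * B[k][col]
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) {
        sum += A[row * n + k] * B[k * n + col];
    }
    C[row * n + col] = sum;
}

// Hypothetical host wrapper: copy inputs to the GPU, launch, copy the result back.
std::vector<float> matmul_naive_host(const std::vector<float>& A,
                                     const std::vector<float>& B, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    // Round the grid up so every output cell is covered by some thread.
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matmul_naive<<<grid, block>>>(dA, dB, dC, n);

    // This blocking copy on the default stream runs after the kernel finishes.
    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Calling matmul_naive_host with A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, and n = 2 should return {19, 22, 43, 50}. Note that in this version every thread reads n values of A and n values of B straight from global memory to do n multiply-adds, and neighbouring threads re-read the same rows and columns, which is the kind of redundancy a faster kernel can try to remove.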