CUDA Academy · Lesson

The Naive Matmul Kernel

A 2D-indexed baseline and its limits.

Matrix Multiply, GPU Style

Matrix multiplication is the heart of graphics and AI. Today you build a naive GPU version first, then learn why it leaves speed on the table.

Each output cell C[row][col] is a dot product: multiply a full row of A by a full column of B and sum the results. 🧮

C[row][col] = sum over k of A[row][k] * B[k][col]