Optimizing the Inner Product
Vectorize and unroll the dot product.
The Dot Product Core
The inner product multiplies two sequences element by element and sums the products. A matmul computes one inner product for every output element, so speeding up this loop speeds up the whole matmul.
# Scalar inner product: one multiply-add per iteration.
var acc: Float32 = 0.0
for k in range(K):
    acc += a[k] * b[k]
One Lane at a Time Is Slow
A plain loop handles a single pair of elements per step. Modern CPUs have SIMD units that can process several pairs in one instruction, so scalar code leaves most of those lanes idle.
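To make the idea of lanes concrete, here is a minimal sketch that processes four pairs per step using Mojo's SIMD type. The lane width of 4, the hard-coded values, and the names a0, a1, b0, b1 are assumptions chosen for illustration. A real kernel would load each chunk from memory and pick a width that suits the target CPU.

```mojo
def main():
    # Hypothetical data: K = 8 elements, split into two 4-lane chunks.
    var a0 = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var a1 = SIMD[DType.float32, 4](5.0, 6.0, 7.0, 8.0)
    var b0 = SIMD[DType.float32, 4](1.0, 1.0, 1.0, 1.0)
    var b1 = SIMD[DType.float32, 4](2.0, 2.0, 2.0, 2.0)

    # Vector accumulator: one running sum per lane, all starting at zero.
    var acc = SIMD[DType.float32, 4](0.0)

    # Each statement multiplies and accumulates four pairs at once.
    acc += a0 * b0
    acc += a1 * b1

    # Collapse the four per-lane sums into the scalar dot product
    # (62 for this data).
    var total = acc.reduce_add()
    print(total)
```

Keeping the accumulator as a vector and reducing once at the end means the per-step work stays lane-parallel. The horizontal sum happens a single time instead of on every iteration.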