Mojo Academy · Lesson

Combining SIMD with Loops

Vectorize the kernel's core.

From Scalar to SIMD

To accelerate a kernel, swap the one-at-a-time loop for one that handles a whole pack of values per step with SIMD.

for i in range(n):
    out[i] = a[i] + b[i]

Choose how many elements fit in one vector. The width is the number of lanes each step processes together.

alias width = 4