0Pricing
Mojo Academy · Lesson

Combining SIMD with Loops

Vectorize the kernel's core.

From Scalar to SIMD

To accelerate a kernel, swap the one-at-a-time loop for one that handles a whole pack of values per step with SIMD.

for i in range(n):
    out[i] = a[i] + b[i]

Pick a Width

Choose how many elements fit in one vector. The width is the number of lanes each step processes together.

alias width = 4

All lessons in this course

  1. Anatomy of a Compute Kernel
  2. Combining SIMD with Loops
  3. Reducing Memory Traffic
  4. Tiling for Cache Locality
← Back to Mojo Academy