CUDA Academy · Lesson

Loop Unrolling with #pragma unroll

Cutting loop overhead and exposing ILP.

Loops Have Hidden Costs

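Every loop iteration spends instructions on updating the counter, comparing it against the bound, and branching back to the top. This bookkeeping is called loop overhead. When the loop body is small, it can account for a large share of the work in a hot inner loop.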
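To make the bookkeeping visible, here is a small sketch that writes an ordinary `for` loop out as its separate steps. The name `sum_explicit` is hypothetical and exists only for illustration. A compiler generates these steps for you and may optimize some of them away.

```cuda
// Hypothetical helper: sums n floats with the loop bookkeeping written out.
// Equivalent to: for (int i = 0; i < n; ++i) total += data[i];
__host__ __device__ inline float sum_explicit(const float* data, int n) {
    float total = 0.0f;
    int i = 0;                 // counter setup (once)
    while (true) {
        if (!(i < n)) break;   // overhead: comparison + branch, every iteration
        total += data[i];      // the only useful work
        ++i;                   // overhead: counter update, every iteration
    }
    return total;
}
```

Of the three steps that run on every pass, only the addition is useful work. The other two are what unrolling reduces.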

What Unrolling Does

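Loop unrolling copies the body several times per iteration so the loop runs fewer times. You do the same work with less counting and branching. When the copies do not depend on each other, the hardware also has more independent instructions to overlap, which is the instruction-level parallelism (ILP) gain.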

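int i = 0;
for (; i + 1 < n; i += 2) {          // unrolled by 2: two elements per pass
    out[i]     = in[i]     * 2;
    out[i + 1] = in[i + 1] * 2;
}
if (i < n) {                         // leftover element when n is odd
    out[i] = in[i] * 2;
}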
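The hand-unrolled loop above can also be left to the compiler. The sketch below shows one way to do this in a CUDA kernel with `#pragma unroll`. The kernel name `scale_by_two`, the host wrapper `scale_on_gpu`, the constant `ITEMS_PER_THREAD`, and the launch configuration are all assumptions made for this example, and error checking is omitted to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed for illustration: each thread processes 4 elements.
constexpr int ITEMS_PER_THREAD = 4;

__global__ void scale_by_two(const float* in, float* out, int n) {
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;   // total threads in the grid

    // The trip count is a compile-time constant, so the compiler can
    // unroll fully and drop the counter, comparison, and loop branch.
    #pragma unroll
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        int i = tid + k * stride;
        if (i < n) {                       // bounds guard for the array tail
            out[i] = in[i] * 2.0f;
        }
    }
}

// Hypothetical host wrapper; error checking omitted for brevity.
std::vector<float> scale_on_gpu(const std::vector<float>& host_in) {
    int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads   = 128;
    int per_block = threads * ITEMS_PER_THREAD;
    int blocks    = (n + per_block - 1) / per_block;  // round up
    scale_by_two<<<blocks, threads>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Calling `scale_on_gpu({1.0f, 2.0f, 3.0f})` should return the doubled values. Writing `#pragma unroll 4` requests a specific unroll factor, and `#pragma unroll 1` asks the compiler not to unroll. The compiler often unrolls small constant-trip loops on its own, so the pragma is best treated as a request whose effect you confirm by profiling.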
