CUDA Academy · Lesson

Rounding Up the Block Count

Use (n + threads - 1) / threads to round up for full coverage.

How Many Blocks?

Once you fix the number of threads per block, you must decide how many blocks to launch so that every array element gets a thread.

Plain Division Loses Data

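Integer division rounds down. With n = 1000 items and 256 threads per block, n / threads gives just 3 blocks, which cover only 3 × 256 = 768 elements and leave the last 232 without a thread. Adding threads - 1 before dividing, (n + threads - 1) / threads, rounds up to 4 blocks (1024 threads), enough to cover all 1000.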
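The arithmetic can be checked with a small sketch. This is plain host-side C++ (it should also compile unchanged as CUDA host code); the helper names blocks_floor and blocks_ceil are hypothetical and chosen only for illustration, since the lesson gives just the formula. It assumes n and threads are positive and that n + threads - 1 fits in an int.

```cpp
// Hypothetical helpers for illustration; only the formulas come from the lesson.
// Assumes n > 0, threads > 0, and no overflow in n + threads - 1.

// Plain integer division: rounds down, so a partial last block is lost.
constexpr int blocks_floor(int n, int threads) {
    return n / threads;
}

// Ceiling division: adding threads - 1 first rounds the result up.
constexpr int blocks_ceil(int n, int threads) {
    return (n + threads - 1) / threads;
}

// The lesson's numbers: 1000 items, 256 threads per block.
static_assert(blocks_floor(1000, 256) == 3, "plain division gives 3 blocks");
static_assert(blocks_floor(1000, 256) * 256 == 768, "3 blocks cover only 768 elements");
static_assert(blocks_ceil(1000, 256) == 4, "rounding up gives 4 blocks");
static_assert(blocks_ceil(1000, 256) * 256 == 1024, "4 blocks launch 1024 threads");

// When n is an exact multiple of threads, no extra block is added.
static_assert(blocks_ceil(1024, 256) == 4, "exact multiple stays at 4 blocks");
```

Note that 4 blocks launch 1024 threads for 1000 elements, so the last block has 24 surplus threads. That is why a rounded-up block count is normally paired with an out-of-range guard in the kernel.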

All lessons in this course

  1. The Classic Index Formula
  2. Guarding Against Out-of-Range
  3. Rounding Up the Block Count
  4. Grid-Stride Loops