Rounding Up the Block Count
Use (n + threads - 1) / threads so the blocks cover every element.
How Many Blocks?
Once the threads per block are fixed, you must decide how many blocks to launch so that every array element gets a thread.
Plain Division Loses Data
Integer division rounds down. With 1000 items and 256 threads, n / threads is just 3 blocks, which covers only 3 × 256 = 768 elements and leaves the remaining 232 unprocessed.
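The difference between the two divisions can be checked with a few lines of host-side arithmetic. The sketch below is only an illustration: the helper names blocks_floor and blocks_ceil are hypothetical, not part of the lesson or of any CUDA API, and the kernel launch itself is left out. It assumes n is non-negative, threads is positive, and n + threads - 1 fits in an int.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper names, chosen for this illustration only.

// Plain integer division: any partial last block is dropped.
inline int blocks_floor(int n, int threads) {
    assert(n >= 0 && threads > 0);
    return n / threads;
}

// Rounding up: adding (threads - 1) before dividing pushes any
// non-zero remainder over the next multiple of threads.
// Assumes n + threads - 1 does not overflow int.
inline int blocks_ceil(int n, int threads) {
    assert(n >= 0 && threads > 0);
    return (n + threads - 1) / threads;
}

// Print the block count and total thread count for both formulas.
inline void report(int n, int threads) {
    int lo = blocks_floor(n, threads);
    int hi = blocks_ceil(n, threads);
    std::printf("n=%d threads=%d: floor gives %d blocks (%d threads), "
                "rounding up gives %d blocks (%d threads)\n",
                n, threads, lo, lo * threads, hi, hi * threads);
}
```

Calling report(1000, 256) shows 3 blocks (768 threads) for plain division and 4 blocks (1024 threads) when rounding up. The rounded-up grid has 24 more threads than elements, so a kernel launched this way would typically check its index against n before touching the array.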