The Classic Index Formula
Each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x.
One Thread, One Element
In the basic CUDA kernel pattern, every thread handles one element of the data. To do that, each thread needs a unique global index.
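As a minimal sketch of the one-thread-one-element idea, the kernel below doubles each value of an array. The names doubleElements and doubleOnGpu are hypothetical, chosen only for illustration. The sketch assumes the launch creates exactly as many threads as there are elements, so it has no bounds check, and it omits CUDA error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes its own global index and touches exactly one element.
// Assumption: the launch creates exactly one thread per element.
__global__ void doubleElements(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = 2.0f * data[i];
}

// Hypothetical host helper. The caller must make sure that
// host.size() == numBlocks * threadsPerBlock.
std::vector<float> doubleOnGpu(std::vector<float> host,
                               int numBlocks, int threadsPerBlock) {
    float* dev = nullptr;
    size_t bytes = host.size() * sizeof(float);

    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    // One thread per element: numBlocks blocks of threadsPerBlock threads.
    doubleElements<<<numBlocks, threadsPerBlock>>>(dev);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling doubleOnGpu with 8 values, 2 blocks, and 4 threads per block would give each of the 8 threads a different index from 0 to 7.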
Why Local IDs Are Not Enough
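Inside a block, threadIdx.x only counts from 0 up to blockDim.x - 1. Every block reuses those same small numbers, so threadIdx.x alone cannot serve as the global index. Multiplying blockIdx.x by blockDim.x gives the offset of the block's first thread, and adding threadIdx.x selects the position within that block.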
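To make the reuse of local IDs visible, the following sketch has each thread record both its threadIdx.x and its global index. The names recordIds, IdTable, and collectIds are hypothetical. As an assumption of the sketch, the buffers hold exactly numBlocks * threadsPerBlock entries, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread stores its local ID and its global index at its global position.
__global__ void recordIds(int* localId, int* globalId) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    localId[i] = threadIdx.x;
    globalId[i] = i;
}

// Hypothetical container for the two result arrays.
struct IdTable {
    std::vector<int> localId;
    std::vector<int> globalId;
};

// Hypothetical host helper: launches one thread per table entry.
IdTable collectIds(int numBlocks, int threadsPerBlock) {
    int n = numBlocks * threadsPerBlock;
    size_t bytes = n * sizeof(int);
    IdTable table{std::vector<int>(n), std::vector<int>(n)};

    int* devLocal = nullptr;
    int* devGlobal = nullptr;
    cudaMalloc(&devLocal, bytes);
    cudaMalloc(&devGlobal, bytes);

    recordIds<<<numBlocks, threadsPerBlock>>>(devLocal, devGlobal);

    // The copies wait for the kernel to finish before reading the results.
    cudaMemcpy(table.localId.data(), devLocal, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(table.globalId.data(), devGlobal, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devLocal);
    cudaFree(devGlobal);
    return table;
}
```

With 2 blocks of 4 threads, localId comes back as 0 1 2 3 0 1 2 3, while globalId is 0 1 2 3 4 5 6 7: the local IDs repeat in every block, and only the combined formula is unique.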