The Life of a CUDA Program
Allocate, copy, launch, copy back, free.
A Repeating Rhythm
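Almost every CUDA program follows the same five-beat dance: allocate device memory, copy inputs to the GPU, launch a kernel, copy results back, and free the memory. Learn the rhythm once and most GPU code becomes much easier to read. 🕺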
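To make the rhythm concrete, here is a minimal sketch of all five beats in one small program. The kernel vec_add, the helper run_vec_add, and the buffer names are hypothetical, chosen only for illustration, and error handling is kept to a single check for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: each thread adds one pair of elements.
__global__ void vec_add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Runs the five beats once; returns true if every result equals 1 + 2.
bool run_vec_add(int n) {
    size_t bytes = n * sizeof(float);
    std::vector<float> h_a(n, 1.0f), h_b(n, 2.0f), h_out(n, 0.0f);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;

    // Beat 1: allocate device memory.
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);

    // Beat 2: copy inputs from host to device.
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    // Beat 3: launch the kernel with enough threads to cover n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_out, n);

    // Beat 4: copy results back. On the default stream this call waits
    // for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes,
                                 cudaMemcpyDeviceToHost);

    // Beat 5: free device memory.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);

    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 3.0f) return false;
    }
    return true;
}

int main() {
    bool ok = run_vec_add(1 << 20);
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Compiled with nvcc and run on a machine with a working GPU, the program should report whether the device results match the expected sum. Real code would normally check the return value of every runtime call, not just the final copy.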
Step 1: Allocate
First, reserve device memory for your inputs and outputs with cudaMalloc. The host and device have separate address spaces, so a kernel cannot dereference an ordinary host pointer such as one returned by malloc.
cudaMalloc(&d_a, bytes);
cudaMalloc(&d_b, bytes);
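The two-line allocation snippet leaves out the pointer declarations and ignores the return value. As a hedged sketch, the version below fills those in: cudaMalloc returns a cudaError_t, so a failed allocation can be detected and reported. The helper name alloc_device_floats and the element count are assumptions made for this example, not part of the lesson.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocate n floats on the device.
// Returns nullptr and prints the CUDA error string if allocation fails.
float *alloc_device_floats(size_t n) {
    float *d_ptr = nullptr;
    cudaError_t err = cudaMalloc(&d_ptr, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return d_ptr;
}

int main() {
    size_t n = 1024;  // illustrative element count
    float *d_a = alloc_device_floats(n);
    float *d_b = alloc_device_floats(n);
    bool ok = (d_a != nullptr) && (d_b != nullptr);

    // Release whatever was allocated; cudaFree on a null pointer does nothing.
    cudaFree(d_a);
    cudaFree(d_b);
    return ok ? 0 : 1;
}
```

Note that the size argument is in bytes, not elements, which is why the helper multiplies by sizeof(float).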