Replaying Graphs to Cut Overhead
Amortizing launch cost across iterations.
The Whole Point: Replay
Graphs exist to be replayed. One launch call submits the entire recorded sequence to the GPU at once.
Launch the Exec Object
You replay with cudaGraphLaunch, passing the instantiated exec and a stream. That is the whole submission.
cudaGraphLaunch(exec, stream);All lessons in this course
- Launching Kernels from a Kernel
- When Dynamic Parallelism Pays
- Capturing Work into a Graph
- Replaying Graphs to Cut Overhead