Latency and Cost per Step
Measure token usage and wall-clock time per node so you can find the slow and expensive steps.
Why Per-Step Metrics?
Total latency and cost tell you the system is slow or expensive — but not WHICH step is slow or expensive. Per-step measurement is required to fix it.
Recording Latency
Add it to every span:
start = time.time()
# ... do work ...
span.duration_ms = (time.time() - start) * 1000