Identifying Slow and Expensive Steps
Waterfall profiling: where is the agent spending its time and budget?
Performance Profiling for Agents
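Agent performance issues fall into two categories: slow steps (high latency) and expensive steps (high token cost). Slow steps degrade the user experience, expensive steps drive up operational costs, and a single step, such as a long LLM call, can be both. The first step in fixing either is measurement.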
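Measuring both dimensions for every step makes the two categories easy to separate. The following is a minimal sketch, assuming you already collect latency and token counts per step. `StepProfile`, `classify()` and the `SLOW_MS` / `EXPENSIVE_TOKENS` thresholds are hypothetical names. The latencies mirror the simulated steps used later in this lesson, and the token counts are illustrative, not measurements.

```python
from dataclasses import dataclass

# Hypothetical thresholds: tune them to your own latency and cost budgets.
SLOW_MS = 500.0
EXPENSIVE_TOKENS = 2_000


@dataclass
class StepProfile:
    """Latency and token usage recorded for one agent step (hypothetical schema)."""
    name: str
    latency_ms: float
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def classify(step: StepProfile) -> list[str]:
    """Return the problem categories a step falls into: none, one, or both."""
    labels = []
    if step.latency_ms >= SLOW_MS:
        labels.append("slow")
    if step.total_tokens >= EXPENSIVE_TOKENS:
        labels.append("expensive")
    return labels


# Illustrative values only.
steps = [
    StepProfile("entity_extraction", 50.0, 400, 60),
    StepProfile("vector_search", 120.0, 0, 0),  # no LLM call, so no tokens
    StepProfile("llm_call", 800.0, 3_200, 450),
]

for s in steps:
    labels = classify(s)
    print(f"{s.name:<18} {s.latency_ms:7.1f}ms {s.total_tokens:6d} tokens  {', '.join(labels) or 'ok'}")
```

With these numbers only `llm_call` is flagged, and it is flagged as both slow and expensive. That is a common pattern, because a large prompt both costs more and takes longer to process.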
Timing Each Step
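Use time.perf_counter() for high-precision timing. It is a monotonic, high-resolution clock that measures elapsed real time, including time spent sleeping or blocked on I/O, and it is not affected by system clock adjustments. That is exactly what matters for agent step latency, where much of a step's duration is spent waiting on LLM and tool calls.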
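The next example makes the distinction concrete. It times the same simulated wait with `time.perf_counter()` and with `time.process_time()`, which counts only the CPU time consumed by the current process. The `measure_wait` helper is hypothetical, and `time.sleep()` stands in for a network call such as an LLM request.

```python
import time


def measure_wait(seconds: float) -> tuple[float, float]:
    """Time a simulated I/O wait with two clocks and return (elapsed_ms, cpu_ms)."""
    wall_start = time.perf_counter()   # elapsed real time, includes sleeping
    cpu_start = time.process_time()    # CPU time of this process only, excludes sleeping
    time.sleep(seconds)                # stands in for waiting on an LLM or tool call
    elapsed_ms = (time.perf_counter() - wall_start) * 1000
    cpu_ms = (time.process_time() - cpu_start) * 1000
    return elapsed_ms, cpu_ms


elapsed_ms, cpu_ms = measure_wait(0.2)
print(f"perf_counter: {elapsed_ms:.1f}ms  process_time: {cpu_ms:.1f}ms")
```

The CPU clock reports almost nothing because the process sat idle while it waited. A CPU-time profiler would therefore badly underestimate LLM latency, and that is why the timer context manager below is built on `perf_counter()`.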
```python
import time
from contextlib import contextmanager

@contextmanager
def timer(step_name: str, timings: dict):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the step raises, so failed steps are still timed
        end = time.perf_counter()
        duration_ms = (end - start) * 1000
        timings[step_name] = duration_ms
        print(f'{step_name}: {duration_ms:.1f}ms')

# Usage
timings = {}
with timer('entity_extraction', timings):
    time.sleep(0.05)  # Simulate work
with timer('vector_search', timings):
    time.sleep(0.12)  # Simulate work
with timer('llm_call', timings):
    time.sleep(0.80)  # Simulate LLM latency

print('\nTimings:', timings)
# max() over the dict keys, ranked by their recorded duration
print('Slowest step:', max(timings, key=timings.get))
```
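The timer above records how long each step took but not when it started. That means it cannot produce a waterfall view showing gaps between steps (time that no timed step accounts for) or steps that overlap when work runs concurrently. The following is a minimal sketch, assuming each step is wrapped in a context manager as above. It records every step's start offset from a shared origin and draws each step as a bar on a common timeline. `WaterfallProfiler`, `step()` and `render()` are hypothetical names, not part of any library.

```python
import time
from contextlib import contextmanager


class WaterfallProfiler:
    """Records (step name, start offset, duration) relative to one request's start."""

    def __init__(self) -> None:
        self.origin = time.perf_counter()
        self.spans: list[tuple[str, float, float]] = []  # (name, start_ms, duration_ms)

    @contextmanager
    def step(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            end = time.perf_counter()
            self.spans.append((name, (start - self.origin) * 1000, (end - start) * 1000))

    def render(self, width: int = 50) -> str:
        """Draw each span as a bar whose position and length are scaled to the timeline."""
        if not self.spans:
            return ""
        total_ms = max(start + dur for _, start, dur in self.spans)
        scale = width / total_ms if total_ms > 0 else 0.0
        lines = []
        for name, start_ms, dur_ms in self.spans:
            offset = min(int(start_ms * scale), width - 1)  # keep the bar inside the frame
            length = max(1, int(dur_ms * scale))            # always draw at least one cell
            bar = " " * offset + "#" * length
            lines.append(f"{name:<18}|{bar:<{width}}| {dur_ms:7.1f}ms")
        return "\n".join(lines)


profiler = WaterfallProfiler()
with profiler.step("entity_extraction"):
    time.sleep(0.05)  # Simulate work
with profiler.step("vector_search"):
    time.sleep(0.12)  # Simulate work
with profiler.step("llm_call"):
    time.sleep(0.80)  # Simulate LLM latency
print(profiler.render())
```

All bars share the same origin. A blank stretch before a bar is therefore latency that no step accounted for, and bars that overlap ran at the same time. Neither is visible in a flat dict of durations.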