Per-Step Token and Cost Profiling
Measuring token consumption per tool call and per reasoning step.
Why Profile Token Usage?
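LLM API costs scale directly with token usage: providers bill for every input (prompt) token and every output (completion) token, typically at different rates. A single agent run can make dozens of LLM calls, and without per-step profiling you cannot tell which step drives the cost, which calls are worth caching, or where to cut tokens.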
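Because input and output tokens are billed separately, the cost of one call is prompt_tokens × input rate + completion_tokens × output rate. The sketch below is a minimal illustration of that arithmetic, assuming rates quoted in USD per million tokens. The Pricing class, the step names, the token counts, and the prices are all hypothetical placeholders, not real OpenAI pricing or measured usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (hypothetical values supplied by the caller)."""
    input_per_million: float
    output_per_million: float


def call_cost(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> float:
    """Cost of one call: input and output tokens are billed at separate rates."""
    return (
        prompt_tokens * pricing.input_per_million
        + completion_tokens * pricing.output_per_million
    ) / 1_000_000


# Hypothetical prices for illustration only; look up current rates for your model.
example_pricing = Pricing(input_per_million=0.50, output_per_million=2.00)

# Hypothetical (prompt_tokens, completion_tokens) for three steps of one agent run.
steps = {
    'plan': (1200, 150),
    'search_tool_summary': (4800, 300),
    'final_answer': (2500, 600),
}

costs = {name: call_cost(p, c, example_pricing) for name, (p, c) in steps.items()}
for name, cost in costs.items():
    print(f'{name}: ${cost:.6f}')
print('Most expensive step:', max(costs, key=costs.get))
```

Even with made-up numbers, the pattern is typical: the step that stuffs tool output into the prompt can dominate the run despite producing a short reply.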
Reading Token Usage from OpenAI
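Every non-streaming OpenAI Chat Completions response includes a usage object with three fields: prompt_tokens, completion_tokens, and total_tokens (the sum of the other two). Streamed responses include it only when you request it via stream_options. Capture it on every call.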
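import openai

# With no api_key argument, the client reads the OPENAI_API_KEY environment
# variable, which keeps secrets out of source code.
client = openai.OpenAI()

def call_llm_with_tracking(prompt: str, model: str = 'gpt-4o-mini') -> dict:
    """Call the Chat Completions API and return the reply plus its token usage."""
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
    )
    usage = response.usage  # populated on non-streaming responses
    return {
        'content': response.choices[0].message.content,
        'prompt_tokens': usage.prompt_tokens,          # input tokens sent
        'completion_tokens': usage.completion_tokens,  # output tokens generated
        'total_tokens': usage.total_tokens,            # prompt + completion
        'model': model,
    }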
result = call_llm_with_tracking('What is the capital of France?')
print(f'Response: {result["content"]}')
print(f'Tokens - Prompt: {result["prompt_tokens"]}, Completion: {result["completion_tokens"]}, Total: {result["total_tokens"]}')
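To profile per step rather than per call, each tracked call can be recorded under a step label and aggregated. The sketch below is one minimal way to do that, building on the dict returned by call_llm_with_tracking. TokenProfiler, StepStats, the 'reason:'/'tool:' label convention, and the prices are hypothetical illustrations, not part of the OpenAI SDK; the prices are placeholders to replace with your provider's current rates. The example feeds literal usage dicts so it runs without an API key.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StepStats:
    """Running totals for one named step (hypothetical helper)."""
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@dataclass
class TokenProfiler:
    """Hypothetical profiler: aggregates token usage and cost per step label."""
    # model name -> (input USD per 1M tokens, output USD per 1M tokens); caller-supplied
    prices: dict[str, tuple[float, float]]
    steps: dict[str, StepStats] = field(default_factory=lambda: defaultdict(StepStats))

    def record(self, step: str, usage: dict) -> None:
        """Add one LLM call; reads keys that call_llm_with_tracking returns."""
        input_price, output_price = self.prices[usage['model']]
        stats = self.steps[step]
        stats.calls += 1
        stats.prompt_tokens += usage['prompt_tokens']
        stats.completion_tokens += usage['completion_tokens']
        stats.cost_usd += (
            usage['prompt_tokens'] * input_price
            + usage['completion_tokens'] * output_price
        ) / 1_000_000

    def report(self) -> list[tuple[str, StepStats]]:
        """Return (step, stats) pairs sorted from most to least expensive."""
        return sorted(self.steps.items(), key=lambda item: item[1].cost_usd, reverse=True)


# Placeholder rates, not real pricing: replace with your provider's current prices.
profiler = TokenProfiler(prices={'gpt-4o-mini': (1.00, 4.00)})

# In a real agent these dicts would come from call_llm_with_tracking();
# literal values keep the example runnable offline.
profiler.record('reason:plan', {'model': 'gpt-4o-mini', 'prompt_tokens': 800, 'completion_tokens': 120})
profiler.record('tool:web_search', {'model': 'gpt-4o-mini', 'prompt_tokens': 3000, 'completion_tokens': 200})
profiler.record('tool:web_search', {'model': 'gpt-4o-mini', 'prompt_tokens': 2600, 'completion_tokens': 180})
profiler.record('reason:answer', {'model': 'gpt-4o-mini', 'prompt_tokens': 1500, 'completion_tokens': 400})

for step, stats in profiler.report():
    print(f'{step:<16} calls={stats.calls} tokens={stats.total_tokens} cost=${stats.cost_usd:.6f}')
```

Keying totals by step label rather than by individual call keeps repeated tool invocations together, so a tool that is cheap per call but called often still rises to the top of the report.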