Per-Step Token and Cost Profiling

Measuring token consumption per tool call and per reasoning step.

Why Profile Token Usage?

LLM API costs scale directly with token usage, and a single agent run can make dozens of LLM calls. Without per-step profiling, you cannot tell which steps are expensive, where caching would help, or where to cut costs.
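To make the cost relationship concrete, here is a minimal sketch of per-call cost arithmetic. The function name estimate_cost and the prices passed to it are illustrative assumptions, not real rates; substitute the current per-million-token prices from your provider's pricing page.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one call, in the same currency as the prices."""
    # Input and output tokens are usually billed at different rates.
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices (NOT real rates): 1.0 per million input tokens,
# 2.0 per million output tokens.
cost_per_call = estimate_cost(2_000, 300,
                              input_price_per_million=1.0,
                              output_price_per_million=2.0)

# A hypothetical agent run that makes 30 calls of this size.
run_cost = cost_per_call * 30
print(f'Per call: {cost_per_call:.6f}, per 30-call run: {run_cost:.6f}')
```

Because the per-call figure is multiplied by the number of calls in a run, a small saving on a frequently repeated step can outweigh a large saving on a step that runs once.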

Reading Token Usage from OpenAI

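Every non-streaming OpenAI chat completion response includes a usage object with prompt_tokens, completion_tokens, and total_tokens. Capture it on every call so that token consumption can be attributed to individual steps later. For streamed responses, usage is reported only when you request it with stream_options={"include_usage": True}.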

import openai

# Placeholder key. In practice, omit api_key and let the client read
# the OPENAI_API_KEY environment variable instead of hardcoding a secret.
client = openai.OpenAI(api_key='sk-...')

def call_llm_with_tracking(prompt: str, model: str = 'gpt-4o-mini') -> dict:
    """Send a single-turn prompt and return the reply with its token usage."""
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}]
    )

    # The usage object reports the token counts for this one call.
    usage = response.usage
    return {
        'content': response.choices[0].message.content,
        'prompt_tokens': usage.prompt_tokens,
        'completion_tokens': usage.completion_tokens,
        'total_tokens': usage.total_tokens,
        'model': model
    }

result = call_llm_with_tracking('What is the capital of France?')
print(f'Response: {result["content"]}')
print(f'Tokens - Prompt: {result["prompt_tokens"]}, Completion: {result["completion_tokens"]}, Total: {result["total_tokens"]}')
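Tracking a single call is only the first half of per-step profiling; the counts also need to be grouped by the step that produced them. The following is one possible sketch, not a prescribed design. The StepProfiler class, its method names, and the step labels are hypothetical, and the token counts are made-up numbers so the block runs without an API call. In a real agent, you might pass the prompt_tokens and completion_tokens values returned by call_llm_with_tracking instead.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def _empty_entry() -> dict:
    return {'calls': 0, 'prompt_tokens': 0, 'completion_tokens': 0}


@dataclass
class StepProfiler:
    """Accumulates token usage per named agent step (hypothetical helper)."""
    totals: dict = field(default_factory=lambda: defaultdict(_empty_entry))

    def record(self, step: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one LLM call's usage to the running totals for a step."""
        entry = self.totals[step]
        entry['calls'] += 1
        entry['prompt_tokens'] += prompt_tokens
        entry['completion_tokens'] += completion_tokens

    def total_tokens(self, step: str) -> int:
        """Return prompt plus completion tokens recorded for a step."""
        entry = self.totals[step]
        return entry['prompt_tokens'] + entry['completion_tokens']

    def most_expensive(self) -> str:
        """Return the step name with the highest total token count."""
        return max(self.totals, key=self.total_tokens)


# Illustrative numbers only; real values would come from response.usage.
profiler = StepProfiler()
profiler.record('plan', prompt_tokens=120, completion_tokens=40)
profiler.record('search_tool', prompt_tokens=900, completion_tokens=60)
profiler.record('plan', prompt_tokens=150, completion_tokens=30)

for step in profiler.totals:
    print(step, profiler.totals[step], 'total:', profiler.total_tokens(step))
print('Most expensive step:', profiler.most_expensive())
```

Keeping prompt and completion tokens separate, rather than storing only the total, matters because the two are typically priced differently and point to different fixes: a large prompt count suggests trimming context or caching, while a large completion count suggests constraining output length.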
