
Root Cause Analysis for Agent Failures

A systematic taxonomy of agent failures: model errors, tool errors, data errors, and logic errors.

Failure Taxonomy for Agents

Agent failures fall into four categories:

  • Model error: The LLM calls the wrong tool, passes incorrect arguments, or generates malformed output
  • Tool error: An external API fails or returns unexpected data
  • Data error: The input is bad (malformed, missing fields, or unexpected types)
  • Logic error: Each step is correct, but the steps run in the wrong sequence or rest on wrong assumptions
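
The four categories can be written down as a small classifier, which makes the taxonomy easier to apply consistently when triaging traces. The sketch below is only an illustration: `FailureCategory`, `StepRecord`, its three boolean fields, and the order of the checks are all assumptions introduced here, not part of any library or of the lesson's own code.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    MODEL = 'model_error'
    TOOL = 'tool_error'
    DATA = 'data_error'
    LOGIC = 'logic_error'


@dataclass
class StepRecord:
    """Hypothetical summary of one failed agent step, filled in from your own traces."""
    input_valid: bool      # the input passed validation (no missing fields, expected types)
    tool_call_valid: bool  # the model chose a known tool and produced well-formed arguments
    tool_succeeded: bool   # the external call returned without an error


def classify_failure(step: StepRecord) -> FailureCategory:
    """Assign a failed step to one category (assumed order: data, model, tool, logic)."""
    if not step.input_valid:
        return FailureCategory.DATA
    if not step.tool_call_valid:
        return FailureCategory.MODEL
    if not step.tool_succeeded:
        return FailureCategory.TOOL
    # Every individual check passed, yet the step still failed:
    # suspect the sequencing of steps or the assumptions behind them.
    return FailureCategory.LOGIC


failed_step = StepRecord(input_valid=True, tool_call_valid=True, tool_succeeded=False)
print(classify_failure(failed_step).value)
```

Treating logic error as the fallback reflects the definition above: the individual steps look correct, so the fault is most likely in how they were ordered or what they assumed.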

Model Errors: Wrong Tool Call

Model errors happen when the LLM selects the wrong tool, passes incorrect arguments, or generates malformed JSON. These are often caused by unclear tool descriptions or ambiguous prompts.

import openai
import json

# Placeholder key; in real code, load the key from the environment instead of hard-coding it
client = openai.OpenAI(api_key='sk-...')

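# Required arguments for each tool the agent may call
EXPECTED_TOOLS = {
    'search_web': ['query'],
    'send_email': ['to', 'subject', 'body'],
    'create_task': ['title'],
}

def detect_model_errors(response) -> list:
    """Return one error dict per problem found in the response's tool calls."""
    errors = []
    message = response.choices[0].message

    # tool_calls is None when the model replied with plain text
    for tc in message.tool_calls or []:
        tool_name = tc.function.name
        try:
            args = json.loads(tc.function.arguments)
        except json.JSONDecodeError as e:
            errors.append({
                'type': 'model_error',
                'subtype': 'malformed_tool_args',
                'tool': tool_name,
                'raw_args': tc.function.arguments,
                'parse_error': str(e)
            })
            continue

        # Valid JSON that is not an object (e.g. a list) cannot carry named arguments
        if not isinstance(args, dict):
            errors.append({
                'type': 'model_error',
                'subtype': 'malformed_tool_args',
                'tool': tool_name,
                'raw_args': tc.function.arguments,
                'parse_error': 'arguments are not a JSON object'
            })
            continue

        # Validate required arguments; tools not listed here are not checked
        required = EXPECTED_TOOLS.get(tool_name, [])
        missing = [r for r in required if r not in args]
        if missing:
            errors.append({
                'type': 'model_error',
                'subtype': 'missing_required_args',
                'tool': tool_name,
                'missing': missing
            })
    return errors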

print('Model error detection function defined')
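
The function above covers malformed JSON and missing arguments. Two other model errors named earlier, selecting a tool that does not exist and passing arguments of the wrong type, can be checked in a similar way. The following is a minimal sketch under stated assumptions: `TOOL_SCHEMAS`, `check_tool_call`, and the `unknown_tool` and `wrong_arg_type` subtypes are hypothetical names introduced here, and the fake tool call is built with `SimpleNamespace` so the example runs without calling the API.

```python
import json
from types import SimpleNamespace

# Hypothetical schema: tool name -> {argument name: expected Python type}
TOOL_SCHEMAS = {
    'search_web': {'query': str},
    'send_email': {'to': str, 'subject': str, 'body': str},
    'create_task': {'title': str},
}


def check_tool_call(tool_name: str, raw_args: str) -> list:
    """Return error dicts for an unknown tool or for wrongly typed arguments."""
    if tool_name not in TOOL_SCHEMAS:
        return [{'type': 'model_error', 'subtype': 'unknown_tool', 'tool': tool_name}]

    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return []  # malformed JSON is a separate check and is out of scope here
    if not isinstance(args, dict):
        return []  # likewise, non-object arguments are handled elsewhere

    errors = []
    for name, expected in TOOL_SCHEMAS[tool_name].items():
        # Only type-check arguments that are present; missing ones are a separate check
        if name in args and not isinstance(args[name], expected):
            errors.append({
                'type': 'model_error',
                'subtype': 'wrong_arg_type',
                'tool': tool_name,
                'arg': name,
                'expected': expected.__name__,
                'got': type(args[name]).__name__,
            })
    return errors


# Stand-in for one entry of response.choices[0].message.tool_calls
fake_call = SimpleNamespace(
    function=SimpleNamespace(name='search_web', arguments='{"query": 42}')
)
print(check_tool_call(fake_call.function.name, fake_call.function.arguments))
```

An unknown tool name usually points at the tool descriptions or the prompt, while a wrong argument type more often points at an underspecified parameter schema, so recording them as separate subtypes keeps the root cause easier to find.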
