Root Cause Analysis for Agent Failures

Systematic failure taxonomy: model error, tool error, data error, logic error.

Failure Taxonomy for Agents

Agent failures fall into four categories:

Model error: The LLM calls the wrong tool, passes incorrect arguments, or generates malformed output
Tool error: An external API fails or returns unexpected data
Data error: Bad input (malformed, missing fields, unexpected types)
Logic error: Correct steps but in the wrong sequence or with wrong assumptions
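
One way to make the taxonomy concrete is to encode the four categories as an enum and tag each failure with the stage of the agent loop where it surfaced. The sketch below is illustrative only: `FailureType`, `STAGE_TO_FAILURE`, `classify_failure`, `tally_failures`, and the stage names are hypothetical and do not come from any library.

```python
from collections import Counter
from enum import Enum


class FailureType(Enum):
    MODEL = "model_error"   # LLM picked the wrong tool or produced bad output
    TOOL = "tool_error"     # external API failed or returned unexpected data
    DATA = "data_error"     # malformed input, missing fields, unexpected types
    LOGIC = "logic_error"   # right steps, wrong order or wrong assumptions


# Hypothetical stage names; adapt them to however your agent loop is structured.
STAGE_TO_FAILURE = {
    "parse_model_output": FailureType.MODEL,
    "call_tool": FailureType.TOOL,
    "validate_input": FailureType.DATA,
    "plan_steps": FailureType.LOGIC,
}


def classify_failure(stage: str) -> FailureType:
    """Map the stage where a failure surfaced to one of the four categories."""
    try:
        return STAGE_TO_FAILURE[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


def tally_failures(stages: list[str]) -> Counter:
    """Count failures per category across a batch of failed runs."""
    return Counter(classify_failure(s) for s in stages)


# Example: three failed runs, two of which broke while calling a tool.
failed_stages = ["call_tool", "parse_model_output", "call_tool"]
counts = tally_failures(failed_stages)
print(counts.most_common(1))
```

The stage is only a first guess. A tool call that fails because the model passed bad arguments is really a model error, so treat the mapping as a starting point for investigation rather than a verdict.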

Model Errors: Wrong Tool Call

Model errors happen when the LLM selects the wrong tool, passes incorrect arguments, or generates malformed JSON. These are often caused by unclear tool descriptions or ambiguous prompts.

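import json

import openai

# Placeholder key; in real code, load the key from the environment instead.
client = openai.OpenAI(api_key='sk-...')

# Required arguments for each tool the agent is allowed to call.
EXPECTED_TOOLS = {
    'search_web': ['query'],
    'send_email': ['to', 'subject', 'body'],
    'create_task': ['title'],
}


def detect_model_errors(response) -> list:
    """Return one error record per faulty tool call in a chat completion response."""
    errors = []
    message = response.choices[0].message

    # tool_calls is None when the model answered without calling a tool.
    for tc in message.tool_calls or []:
        tool_name = tc.function.name

        # Malformed JSON: the arguments string does not parse.
        try:
            args = json.loads(tc.function.arguments)
        except json.JSONDecodeError as e:
            errors.append({
                'type': 'model_error',
                'subtype': 'malformed_tool_args',
                'tool': tool_name,
                'raw_args': tc.function.arguments,
                'parse_error': str(e)
            })
            continue

        # Wrong tool: the model requested a tool that is not registered.
        if tool_name not in EXPECTED_TOOLS:
            errors.append({
                'type': 'model_error',
                'subtype': 'unknown_tool',
                'tool': tool_name
            })
            continue

        # Valid JSON that is not an object carries no named arguments.
        if not isinstance(args, dict):
            args = {}

        # Incorrect arguments: one or more required keys are missing.
        missing = [r for r in EXPECTED_TOOLS[tool_name] if r not in args]
        if missing:
            errors.append({
                'type': 'model_error',
                'subtype': 'missing_required_args',
                'tool': tool_name,
                'missing': missing
            })
    return errors


print('Model error detection function defined')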
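
Because model errors are often caused by unclear tool descriptions, it can help to group the collected records by tool and subtype and see which tool accounts for most of the failures. The following is a minimal sketch that assumes records shaped like the dictionaries built above (with `type`, `subtype`, and `tool` keys); `summarize_model_errors` and the sample records are hypothetical.

```python
from collections import Counter


def summarize_model_errors(errors: list[dict]) -> list[tuple[tuple[str, str], int]]:
    """Count model errors per (tool, subtype), most frequent first."""
    counts = Counter(
        (e.get("tool", "<unknown>"), e.get("subtype", "<unknown>"))
        for e in errors
        if e.get("type") == "model_error"
    )
    return counts.most_common()


# Hand-written sample records for illustration, not real agent output.
sample_errors = [
    {"type": "model_error", "subtype": "missing_required_args",
     "tool": "send_email", "missing": ["subject"]},
    {"type": "model_error", "subtype": "missing_required_args",
     "tool": "send_email", "missing": ["body"]},
    {"type": "model_error", "subtype": "malformed_tool_args",
     "tool": "search_web"},
]

# Print one row per (tool, subtype) pair with its count.
for (tool, subtype), n in summarize_model_errors(sample_errors):
    print(f"{tool:12} {subtype:24} {n}")
```

A tool that dominates the summary is a reasonable first candidate for a clearer description or a stricter argument schema.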
