AI Agents · Lesson

Why Testing Agents Is Different

Non-determinism, LLM cost, and why standard unit tests fall short.

Testing Software vs. Testing Agents

Traditional software is deterministic: give it the same input, get the same output. Unit tests rely on this property to assert exact expected values.
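As a point of comparison, here is a minimal sketch of a conventional unit test. The function apply_discount is a hypothetical example, not something from this course; it only serves to show why an exact-value assertion is safe when the code under test is deterministic.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage, rounded to cents."""
    return round(price * (1 - percent / 100), 2)


def test_apply_discount() -> None:
    # Deterministic: the same inputs always give the same result,
    # so asserting an exact expected value is safe.
    assert apply_discount(200.0, 25) == 150.0
    assert apply_discount(80.0, 50) == 40.0


test_apply_discount()
```

Running this test a thousand times gives the same result every time, which is exactly the property that agent outputs lack.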

AI agents break this assumption. The same prompt can produce different outputs each run, making standard testing approaches insufficient on their own.

Non-Determinism: Same Input, Different Output

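LLMs are probabilistic by nature. The temperature parameter controls how much randomness goes into choosing each token: higher values spread the choice across more candidates. Even at temperature=0, outputs are not guaranteed to be identical, because they can vary across model versions or after infrastructure changes.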

This means an agent test that passes today may fail tomorrow with no code change.
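To make the role of temperature more concrete, the following is a toy sketch of temperature-scaled sampling over three candidate tokens. The scores in toy_logits and the helper sample_with_temperature are invented for illustration and do not reflect how any particular provider implements sampling. Note that this toy model is fully deterministic at temperature 0, whereas real hosted models may still vary for the reasons given above.

```python
import math
import random


def sample_with_temperature(
    logits: dict[str, float], temperature: float, rng: random.Random
) -> str:
    """Pick one token from `logits` after temperature scaling."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Hypothetical scores for the next token after "Name a planet."
toy_logits = {"Mars": 2.0, "Jupiter": 1.5, "Saturn": 1.0}
rng = random.Random()  # unseeded, so repeated runs differ

greedy = {sample_with_temperature(toy_logits, 0, rng) for _ in range(20)}
sampled = [sample_with_temperature(toy_logits, 0.9, rng) for _ in range(20)]
print(greedy)   # only the single top-scoring token
print(sampled)  # a mix of tokens that changes from run to run
```

Dividing the scores by a larger temperature flattens the distribution, so lower-scoring tokens are picked more often.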

import openai

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = openai.OpenAI()

# Send the same prompt three times; the answers may differ on each run.
for i in range(3):
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': 'Name a planet.'}],
        temperature=0.9,  # High randomness
    )
    print(f'Run {i+1}: {response.choices[0].message.content}')

# Example output (yours will likely differ):
# Run 1: Mars
# Run 2: Jupiter
# Run 3: Saturn
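The flakiness this causes in an exact-match test can be reproduced without any network calls. The sketch below is only an illustration: fake_agent is a hypothetical stand-in that picks an answer at random, which is a crude simplification of how a real model behaves.

```python
import random


def fake_agent(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call; picks an answer at random."""
    return rng.choice(["Mars", "Jupiter", "Saturn"])


def exact_match_test(rng: random.Random) -> bool:
    # Brittle: only passes when the stand-in happens to answer "Mars".
    return fake_agent("Name a planet.", rng) == "Mars"


rng = random.Random()  # unseeded on purpose, to mimic run-to-run variation
results = [exact_match_test(rng) for _ in range(100)]
passes = sum(results)
print(f"exact-match test passed {passes}/100 runs")
```

The test code never changes between runs, yet the pass count does, which is the situation described above: a test that passes today and fails tomorrow.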
