AI Prompt Engineering · Lesson

Calibration and Bias in LLM Judges

Position bias, verbosity bias, and how to mitigate them in judge prompts.

Why Judge Calibration Matters

An LLM judge that systematically scores one type of response higher than it deserves produces misleading evaluation results. You might ship a worse model because the judge preferred its verbose style rather than its actual quality.

Calibration means the judge's scores accurately reflect true quality. A calibrated judge agrees with human raters at a high, measured rate and doesn't systematically favor attributes unrelated to quality, such as length or position.
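One way to put a number on agreement with human raters is to compare judge verdicts against human verdicts on the same items. The sketch below is illustrative, not a prescribed method: the function name `agreement_and_kappa` and the sample labels are assumptions made for this example. It reports raw agreement alongside Cohen's kappa, which discounts the agreement two raters would reach by chance.

```python
from collections import Counter


def agreement_and_kappa(judge_labels, human_labels):
    """Return (raw agreement, Cohen's kappa) for two equal-length label lists."""
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError('label lists must be non-empty and the same length')

    n = len(judge_labels)
    # Fraction of items where judge and human gave the same label.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(judge_counts[k] * human_counts[k] for k in judge_counts) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return observed, float('nan')
    return observed, (observed - expected) / (1 - expected)


# Made-up labels for illustration only.
judge = ['A', 'A', 'B', 'A', 'B', 'A', 'A', 'B']
human = ['A', 'B', 'B', 'A', 'B', 'B', 'A', 'B']
agreement, kappa = agreement_and_kappa(judge, human)
print(f'Raw agreement: {agreement:.2f}, Cohen kappa: {kappa:.2f}')
```

Raw agreement alone can look high when one label dominates, which is why the chance-corrected figure is worth reporting next to it.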

Position Bias: Deep Dive

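Position bias is among the strongest and most studied LLM judge biases. In pairwise comparison, judges can prefer the first option 60-65% of the time independent of quality, though the size and even the direction of the effect vary by model and prompt. That is like a coin that comes up heads 60% of the time: a small edge on any single comparison, but a systematic skew across thousands of comparisons.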

One plausible explanation is that LLMs are trained to generate continuations, so seeing 'Response A:' first primes them toward A before they read B.
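A common mitigation is to judge each pair twice with the order swapped and accept a verdict only when both passes agree. The following is a minimal sketch under stated assumptions: the `Judge` callable signature, `debiased_compare`, and the two stub judges are hypothetical names introduced here, and a real judge would wrap an actual model call.

```python
from typing import Callable

# Hypothetical judge interface: (question, first_response, second_response)
# returns 'A' if the first-shown response wins, otherwise 'B'.
Judge = Callable[[str, str, str], str]


def debiased_compare(judge: Judge, question: str, resp_1: str, resp_2: str) -> str:
    """Return 'resp_1', 'resp_2', or 'tie' after judging both orderings."""
    forward = judge(question, resp_1, resp_2)  # resp_1 is shown in slot A
    reverse = judge(question, resp_2, resp_1)  # resp_2 is shown in slot A

    forward_winner = 'resp_1' if forward == 'A' else 'resp_2'
    reverse_winner = 'resp_2' if reverse == 'A' else 'resp_1'

    # A verdict that flips when the order flips is driven by position.
    return forward_winner if forward_winner == reverse_winner else 'tie'


# Stub judges for demonstration only; they do not call any model.
def always_first(question: str, first: str, second: str) -> str:
    return 'A'


def prefers_longer(question: str, first: str, second: str) -> str:
    return 'A' if len(first) >= len(second) else 'B'


print(debiased_compare(always_first, 'q', 'x', 'y'))
print(debiased_compare(prefers_longer, 'q', 'a longer answer', 'short'))
```

The cost is two judge calls per pair. Ties can be dropped, counted as half a win each, or sent to human review, depending on how the evaluation is scored.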

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment; avoid hardcoding keys.
client = anthropic.Anthropic()

def measure_position_bias(question, n_pairs=20):
    """
    Measure position bias by comparing IDENTICAL responses.
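    If both responses are the same, an unbiased judge should pick each
    position about half the time. A consistent deviation from 50/50
    indicates position bias; with a small n_pairs, some deviation is
    just sampling noise.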
    """
    response = 'Machine learning is a subset of AI that learns from data.'
    first_wins = 0

    for _ in range(n_pairs):
        prompt = (
            f'Which response is better?\nQ: {question}\n'
            f'Response A: {response}\n'
            f'Response B: {response}\n'
            f'Reply with A or B.'
        )
        r = client.messages.create(
            model='claude-opus-4-5',
            max_tokens=5,
            messages=[{'role': 'user', 'content': prompt}]
        )
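        # Count a win only if the reply starts with a standalone 'A'; a bare
        # substring check would also match replies such as 'Answer: B'.
        verdict = r.content[0].text.strip()
        if verdict[:1] == 'A' and not verdict[1:2].isalpha():
            first_wins += 1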

    bias = first_wins / n_pairs
    print(f'First-position win rate with IDENTICAL responses: {bias:.0%}')
    print('Expected (no bias): 50%')
    print(f'Measured bias: {(bias - 0.5) * 100:+.0f} percentage points')
    return bias
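With only 20 pairs, a first-position win rate somewhat above 50% can arise by chance, so it helps to test the count before calling it bias. The sketch below is one possible check, not part of the original lesson code: `binomial_two_sided_p` is a hypothetical helper that computes an exact two-sided binomial p-value against a fair 50/50 split using only the standard library.

```python
from math import comb


def binomial_two_sided_p(first_wins: int, n_pairs: int) -> float:
    """Exact two-sided p-value for the null hypothesis P(first wins) = 0.5."""
    if n_pairs <= 0 or not 0 <= first_wins <= n_pairs:
        raise ValueError('need n_pairs > 0 and 0 <= first_wins <= n_pairs')

    # Under p = 0.5 the distribution is symmetric around n/2, so sum the
    # outcomes at least as far from n/2 as the observed count.
    distance = abs(first_wins - n_pairs / 2)
    extreme = sum(
        comb(n_pairs, k)
        for k in range(n_pairs + 1)
        if abs(k - n_pairs / 2) >= distance
    )
    return extreme / 2 ** n_pairs


# The same 60% win rate at two sample sizes.
print(f'12 of 20:    p = {binomial_two_sided_p(12, 20):.3f}')
print(f'600 of 1000: p = {binomial_two_sided_p(600, 1000):.2e}')
```

A 60% first-position rate over 20 pairs gives a p-value of about 0.50, which is indistinguishable from noise, while the same rate over 1,000 pairs is far below 0.001. Raising `n_pairs` in `measure_position_bias` is what makes the measurement meaningful.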
