AI Prompt Engineering · Lesson

Compiling and Optimizing Prompts

BootstrapFewShot, MIPRO, and other DSPy optimizers in practice.

What Optimization Means in DSPy

DSPy optimization (called compilation) looks for a prompt configuration that scores well on your metric over a training set. The optimizer searches over candidate few-shot examples, instructions, and reasoning demonstrations, and keeps the configuration with the highest score it finds.

You run compilation once, save the result, and deploy the optimized program. At inference time the program makes ordinary LLM calls with the saved prompts, so there is no further optimization overhead.
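To make the compile-once, deploy-many idea concrete, here is a toy sketch in plain Python. It does not use DSPy's API: `toy_model`, `compile_program`, the demo pool, and the file name are all hypothetical stand-ins, and the "model" is a deterministic function rather than an LLM. It only illustrates the shape of the workflow: score candidate demo sets with a metric, keep the best one, save it, and reuse it at inference time.

```python
import itertools
import json
import os
import tempfile

# Hypothetical pool of demonstrations the optimizer may place in the prompt.
CANDIDATE_DEMOS = [("1+1", "2"), ("9-4", "5"), ("2*3", "6")]

# Toy training set of (question, expected answer) pairs.
TRAINSET = [("2+2", "4"), ("3+5", "8"), ("6*7", "42"), ("4*5", "20"), ("10-7", "3")]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def operator_of(question):
    """Return the arithmetic operator that appears in the question."""
    return next(ch for ch in question if ch in OPS)


def toy_model(demos, question):
    """Stand-in for an LLM call (not a real model): it answers correctly only
    when some demo in the prompt uses the same operator as the question."""
    op = operator_of(question)
    if op not in {operator_of(q) for q, _ in demos}:
        return "unknown"
    left, right = question.split(op)
    return str(OPS[op](int(left), int(right)))


def metric(expected, predicted):
    """Exact-match score: 1.0 if correct, 0.0 otherwise."""
    return float(expected == predicted)


def compile_program(candidates, trainset, k):
    """Try every k-sized demo subset and keep the one with the best mean score."""
    best_demos, best_score = [], -1.0
    for demos in itertools.combinations(candidates, k):
        scores = [metric(answer, toy_model(demos, q)) for q, answer in trainset]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_demos, best_score = list(demos), mean_score
    return best_demos, best_score


# Compile once and save the winning configuration.
demos, score = compile_program(CANDIDATE_DEMOS, TRAINSET, k=2)
path = os.path.join(tempfile.gettempdir(), "toy_compiled_program.json")
with open(path, "w") as f:
    json.dump({"demos": demos, "score": score}, f)

# Later, at inference time: load the saved prompt and call the model directly.
with open(path) as f:
    saved = json.load(f)
print(saved["score"], toy_model(saved["demos"], "8*2"))
```

Real optimizers search a much larger space and use smarter strategies than exhaustive enumeration, but the division of labor is the same: the expensive search happens once, and the deployed program only reads the saved configuration.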

Defining a Metric Function

DSPy optimizers such as BootstrapFewShot and MIPRO need a metric function that scores a prediction against the expected output. It returns a number or a boolean, and higher is better.

The metric is the signal the optimizer uses to decide if a prompt configuration is good.

# Metric: exact match on answer field
def exact_match_metric(example, prediction, trace=None):
    """
    example: a training example with .answer
    prediction: the module's output with .answer
    Returns 1.0 if correct, 0.0 otherwise
    """
    expected = example.answer.strip().lower()
    predicted = prediction.answer.strip().lower()
    return float(expected == predicted)

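# Metric: F1 score for token overlap (common in QA)
# Note: uses sets of unique tokens, so a repeated word counts only once
def token_f1_metric(example, prediction, trace=None):
    gold_tokens = set(example.answer.lower().split())
    pred_tokens = set(prediction.answer.lower().split())
    # Guard both sides: an empty gold answer would otherwise divide by zero in recall
    if not gold_tokens or not pred_tokens:
        return 0.0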
    precision = len(gold_tokens & pred_tokens) / len(pred_tokens)
    recall = len(gold_tokens & pred_tokens) / len(gold_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
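Both metrics above accept a `trace` argument but never use it. DSPy's documentation describes a convention where `trace` is `None` during evaluation and not `None` while an optimizer is bootstrapping demonstrations, so a metric can return a graded score in the first case and a strict pass/fail in the second. The sketch below follows that convention under stated assumptions: the `threshold` parameter and the `token_f1` helper are hypothetical names added for illustration, and `SimpleNamespace` objects stand in for real DSPy examples and predictions.

```python
from types import SimpleNamespace


def token_f1(gold, pred):
    """Set-based token F1 between two strings; 0.0 when nothing overlaps."""
    gold_tokens = set(gold.lower().split())
    pred_tokens = set(pred.lower().split())
    overlap = len(gold_tokens & pred_tokens)
    if overlap == 0:  # also covers an empty gold or predicted answer
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def trace_aware_metric(example, prediction, trace=None, threshold=0.5):
    """Graded score for evaluation, boolean pass/fail when a trace is supplied."""
    score = token_f1(example.answer, prediction.answer)
    if trace is None:
        return score  # evaluation: keep the fine-grained signal
    return score >= threshold  # bootstrapping: accept or reject the demonstration


# Stand-ins for a training example and a module output (assumed shapes).
example = SimpleNamespace(answer="the Eiffel Tower")
prediction = SimpleNamespace(answer="Eiffel Tower")

print(trace_aware_metric(example, prediction))            # graded score
print(trace_aware_metric(example, prediction, trace=[]))  # strict pass/fail
```

The threshold is a judgment call: a stricter value admits fewer but cleaner demonstrations into the prompt, while a looser one gives the optimizer more material to choose from.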
