Compiling and Optimizing Prompts
BootstrapFewShot, MIPRO, and other DSPy optimizers in practice.
What Optimization Means in DSPy
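DSPy optimization (called compilation) searches for a prompt configuration that scores well on your metric over a training set. Depending on the optimizer, the search covers few-shot examples, instructions, and reasoning demonstrations. The optimizer keeps the highest-scoring configuration among those it tries; this is not a guaranteed global optimum.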
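The core idea of metric-guided search can be shown without an LM. The sketch below is a toy, not DSPy's implementation. `toy_program` is a hypothetical stand-in that answers correctly only when a demo of the same "kind" is in its prompt. The search exhaustively scores every 2-demo subset on the training set and keeps the best one. Real DSPy optimizers also bootstrap demos by running the program and keeping runs that pass the metric, and they sample candidates rather than enumerate them all.

```python
from itertools import combinations

# Hypothetical stand-in for an LM program: it "gets it right" only when a demo
# of the same kind of question appears among its few-shot examples.
def toy_program(demos, example):
    if any(d["kind"] == example["kind"] for d in demos):
        return example["answer"]
    return "unknown"

def exact_match(example, prediction):
    return float(example["answer"] == prediction)

def search_demos(program, candidates, trainset, metric, k=2):
    """Score every k-subset of candidate demos on trainset; return the best."""
    best_demos, best_score = (), -1.0
    for demos in combinations(candidates, k):
        scores = [metric(ex, program(demos, ex)) for ex in trainset]
        score = sum(scores) / len(scores)
        if score > best_score:  # strict '>' keeps the first subset on ties
            best_demos, best_score = demos, score
    return list(best_demos), best_score

candidates = [
    {"kind": "capital", "question": "Capital of France?", "answer": "Paris"},
    {"kind": "math", "question": "2 + 2?", "answer": "4"},
    {"kind": "color", "question": "Color of the sky?", "answer": "blue"},
]
trainset = [
    {"kind": "capital", "question": "Capital of Spain?", "answer": "Madrid"},
    {"kind": "math", "question": "3 + 5?", "answer": "8"},
]
best, score = search_demos(toy_program, candidates, trainset, exact_match)
```

Exhaustive search is fine for three candidates but grows combinatorially. That growth is why practical optimizers sample configurations instead.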
You run compilation once, save the result, and deploy the optimized program. At inference time the program makes ordinary LM calls with the optimized prompts, so you do not pay the optimization cost again.
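In DSPy itself, a compiled program is persisted with its `save` method and restored with `load`. The plain-Python sketch below illustrates only the underlying point: the compiled artifact is data (instructions plus demos) that is formatted into a prompt at inference time, with no search involved. The JSON layout, file name, and helper functions are hypothetical and do not reproduce DSPy's file format.

```python
import json
import os
import tempfile

# Hypothetical artifact produced once by an optimizer run (not DSPy's format)
optimized_config = {
    "instructions": "Answer the question in a few words.",
    "demos": [
        {"question": "Capital of France?", "answer": "Paris"},
        {"question": "2 + 2?", "answer": "4"},
    ],
}

def save_config(config, path):
    """After compilation: persist the chosen prompt configuration."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)

def load_config(path):
    """At deploy time: read the configuration back; no optimization runs."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def build_prompt(config, question):
    """Format instructions, demos, and the new input into one prompt string."""
    lines = [config["instructions"], ""]
    for demo in config["demos"]:
        lines.append(f"Question: {demo['question']}")
        lines.append(f"Answer: {demo['answer']}")
        lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)

path = os.path.join(tempfile.mkdtemp(), "optimized.json")
save_config(optimized_config, path)
prompt = build_prompt(load_config(path), "Capital of Italy?")
# `prompt` would then be sent to the LM in a single call.
```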
Defining a Metric Function
Every DSPy optimizer needs a metric function that scores a prediction against the expected output. As in the examples below, it takes the training example, the prediction, and an optional trace, and returns a number or a boolean; higher is better.
The metric is the signal the optimizer uses to decide whether a prompt configuration is good.
```python
# Metric: exact match on answer field
def exact_match_metric(example, prediction, trace=None):
    """
    example: a training example with .answer
    prediction: the module's output with .answer
    Returns 1.0 if correct, 0.0 otherwise
    """
    expected = example.answer.strip().lower()
    predicted = prediction.answer.strip().lower()
    return float(expected == predicted)
```
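Exact match that only strips whitespace and lowercases will mark "Paris." wrong against "Paris". A common variant first normalizes both strings the way the SQuAD evaluation script does: lowercase, drop punctuation, remove English articles, and collapse whitespace. This is a sketch of that convention. `normalize_answer` and `normalized_exact_match` are hypothetical names, and `SimpleNamespace` stands in for DSPy's Example and Prediction objects, which expose fields as attributes.

```python
import re
import string
from types import SimpleNamespace

PUNCTUATION = set(string.punctuation)

def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def normalized_exact_match(example, prediction, trace=None):
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize_answer(example.answer) == normalize_answer(prediction.answer))

gold = SimpleNamespace(answer="The Eiffel Tower")
pred = SimpleNamespace(answer="eiffel tower.")
```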
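```python
from collections import Counter

# Metric: F1 score for token overlap (common in QA)
def token_f1_metric(example, prediction, trace=None):
    gold_tokens = example.answer.lower().split()
    pred_tokens = prediction.answer.lower().split()
    # Empty answers: both empty counts as a match; otherwise no overlap is possible
    if not gold_tokens or not pred_tokens:
        return float(gold_tokens == pred_tokens)
    # Count shared tokens with multiplicity (bag of words), as QA F1 usually does
    common = Counter(gold_tokens) & Counter(pred_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```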
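DSPy's metrics documentation distinguishes two calls. During evaluation, trace is None. While bootstrapping demonstrations, trace holds the recorded predictor steps, and the metric's result decides whether that run is kept as a demo. A graded score such as F1 is useful for averages, but any nonzero float is truthy in Python, so a strict pass/fail is usually wanted when trace is set. Below is a minimal sketch of that pattern as a wrapper around any float metric, such as `token_f1_metric`. The names `thresholded` and `half_credit` and the 0.8 threshold are hypothetical, and `SimpleNamespace` stands in for DSPy's Example and Prediction objects.

```python
from types import SimpleNamespace

def thresholded(metric, threshold=0.5):
    """Wrap a float metric: raw score in evaluation, bool when a trace is given."""
    def wrapped(example, prediction, trace=None):
        score = metric(example, prediction, trace)
        if trace is not None:
            # Bootstrapping: only keep this run as a demo if it clears the bar
            return score >= threshold
        # Evaluation: return the graded score so averages stay informative
        return score
    return wrapped

def half_credit(example, prediction, trace=None):
    # Hypothetical metric used only for the demo below
    return 0.6 if prediction.answer in example.answer else 0.0

metric = thresholded(half_credit, threshold=0.8)
ex = SimpleNamespace(answer="Paris, France")
pred = SimpleNamespace(answer="Paris")
```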