Designing Effective Examples
Selecting representative demonstrations.
Examples Are Training Data
In few-shot prompting, your demonstrations are the training set, just delivered at inference time. Every property you would care about for fine-tuning data applies: representativeness, coverage, label accuracy, diversity, and freedom from leakage.
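Several of these properties can be checked mechanically before a demo set ships. The following is a minimal sketch. It assumes a hypothetical shape for the data: a list of `(input, label)` pairs, a list of held-out evaluation inputs, and the set of labels you expect to cover. `audit_demos` is an illustrative helper, not a library function. It flags exact duplicates (a diversity problem), demos that also appear in the eval set (leakage), and labels with no demonstration (coverage). It does not check label accuracy, which still needs review.

```python
from collections import Counter


def _norm(text):
    """Case- and whitespace-insensitive key, so trivial variants count as equal."""
    return " ".join(text.lower().split())


def audit_demos(demos, eval_inputs, required_labels):
    """Return human-readable problems found in a demo set.

    demos: list of (input_text, label) pairs (hypothetical minimal shape).
    eval_inputs: inputs from the held-out evaluation set.
    required_labels: every class the demos are expected to cover.
    """
    problems = []
    keys = [_norm(text) for text, _ in demos]

    # Diversity: duplicates waste context and over-weight a single case
    for key, count in Counter(keys).items():
        if count > 1:
            problems.append(f"duplicate demo input x{count}: {key[:40]!r}")

    # Leakage: a demo that also sits in the eval set inflates measured accuracy
    eval_keys = {_norm(text) for text in eval_inputs}
    for key in sorted(set(keys) & eval_keys):
        problems.append(f"demo leaks into eval set: {key[:40]!r}")

    # Coverage: every expected class should have at least one demonstration
    covered = {label for _, label in demos}
    for label in sorted(set(required_labels) - covered):
        problems.append(f"no demo for label {label!r}")

    return problems
```

An empty result only means these three mechanical checks passed. It does not mean the set is good.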
Sloppy examples teach sloppy behavior. The model will faithfully imitate hedging, verbosity, inconsistent formatting, and subtle reasoning errors present in your demos.
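These habits can also be screened for before the demos reach a prompt. This is a sketch under stated assumptions: outputs are plain strings, and your target format can be written as a regex string. `HEDGES` is a hypothetical starter list of phrases to tune for your own data. The function flags outputs that break the format, contain hedging, or run well past the median length.

```python
import re
import statistics

# Hypothetical phrase list: tune it to the tics you actually see in your demos
HEDGES = ("i think", "it seems", "possibly", "as an ai")


def lint_outputs(outputs, pattern, max_ratio=2.0):
    """Flag demo outputs whose habits the model would faithfully copy.

    pattern: a regex string every output must fully match (the target format).
    max_ratio: flag outputs longer than max_ratio times the median length.
    Returns a list of (index, reason) pairs.
    """
    issues = []
    median_len = statistics.median(len(o) for o in outputs)
    for i, out in enumerate(outputs):
        if not re.fullmatch(pattern, out):
            issues.append((i, "format"))     # inconsistent formatting
        if any(h in out.lower() for h in HEDGES):
            issues.append((i, "hedging"))    # hedging language
        if len(out) > max_ratio * median_len:
            issues.append((i, "verbosity"))  # far longer than its peers
    return issues
```

The length check is relative to the demo set itself, so it catches outliers rather than enforcing an absolute budget.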
```python
# Treat demo curation with the rigor of a labeled dataset
class Demo:
    def __init__(self, input, output, meta):
        self.input = input    # representative of real traffic
        self.output = output  # the EXACT behavior you want copied
        self.meta = meta      # difficulty, class, length bucket
```

Representativeness Over Cleverness
Choose demonstrations whose input distribution matches production traffic. A demo set of pristine, short, easy cases will fail on the messy, long, ambiguous inputs your users actually send.
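One quick test of this is to compare input lengths in your demo set against a sample of production inputs. The sketch below makes two assumptions: it measures length in whitespace-separated words, and it uses a simple index-based quantile. `length_gap` is a hypothetical helper. A demo set whose 90th-percentile length sits far below production's is the pristine-and-short failure described above.

```python
def length_gap(demo_inputs, prod_inputs, qs=(0.5, 0.9)):
    """Compare input-length quantiles of demos vs. sampled production traffic.

    Returns {quantile: (demo_len, prod_len)} with lengths in words.
    """
    def quantiles(texts):
        lens = sorted(len(t.split()) for t in texts)
        # Index-based quantile: crude, but adequate for a sanity check
        return {q: lens[min(len(lens) - 1, int(q * len(lens)))] for q in qs}

    demo_q, prod_q = quantiles(demo_inputs), quantiles(prod_inputs)
    return {q: (demo_q[q], prod_q[q]) for q in qs}
```

Length is only one axis. The same comparison works for any per-input feature you can compute, such as language, number of entities, or presence of typos.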
Sample real logs, embed and cluster them, and pick the real example closest to each cluster's centroid. This covers the modes of your distribution far better than hand-picking impressive but atypical examples.
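```python
import numpy as np
from sklearn.cluster import KMeans


def representative_demos(embeddings, raw, k, seed=0):
    """Cluster embedded log samples and return the raw item nearest each centroid."""
    embeddings = np.asarray(embeddings)
    # Fixed seed and explicit n_init make the demo selection reproducible
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embeddings)
    picks = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)  # indices assigned to cluster c
        center = km.cluster_centers_[c]
        # Pick a real example (not the synthetic centroid): the one closest to it
        best = min(members, key=lambda i: np.linalg.norm(embeddings[i] - center))
        picks.append(raw[best])
    return picks
```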