Zero, One, and Few-Shot
Choosing the number of examples.
The Shot Spectrum
A shot is a labeled demonstration placed in the prompt before the live query, and a k-shot prompt contains k of them. Zero-shot (k = 0) relies entirely on the model's pretrained priors; one-shot supplies a single demonstration; few-shot conditions the model on a handful of task-specific examples at inference time, without any weight updates.
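To make the spectrum concrete, here is a minimal sketch of a prompt builder in which k is the only thing that changes between zero-, one-, and few-shot. The function name build_prompt, the "Text:/Label:" layout, and the sample demonstrations are illustrative assumptions, not part of any library.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    demos: Sequence[Tuple[str, str]],
    query: str,
    k: int,
) -> str:
    """Assemble a k-shot prompt: instruction, the first k demos, then the live query."""
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = [instruction]
    for text, label in demos[:k]:
        parts.append(f"Text: {text}\nLabel: {label}")
    # The live query always comes last, with the label left open for the model.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical demonstrations, made up for illustration.
demos = [
    ("Loved it, would buy again.", "POSITIVE"),
    ("Arrived broken and late.", "NEGATIVE"),
    ("It is a phone case.", "NEUTRAL"),
]
instruction = "Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL."

zero_shot = build_prompt(instruction, demos, "Works as described.", k=0)
one_shot = build_prompt(instruction, demos, "Works as described.", k=1)
few_shot = build_prompt(instruction, demos, "Works as described.", k=3)
```

Keeping the query format identical across all three variants means any difference in model output can be attributed to the demonstrations alone.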
This is in-context learning (ICL): the transformer processes the examples as ordinary tokens in its input sequence, and one line of research interprets the resulting behavior as an implicit, meta-learned regression over them. The number of shots, k, is a hyperparameter you tune empirically on held-out data, not a fixed best practice.
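Treating k as a hyperparameter can be sketched as a small sweep over a development set. Everything here is an assumption for illustration: sweep_k is a hypothetical helper, predict stands for whatever function calls your model, and toy_predict is a stand-in so the sketch runs without one.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (text, gold label)


def sweep_k(
    predict: Callable[[str, Sequence[Example]], str],
    demos: Sequence[Example],
    dev_set: Sequence[Example],
    ks: Sequence[int],
) -> Dict[int, float]:
    """Return dev-set accuracy for each candidate number of shots."""
    if not dev_set:
        raise ValueError("dev_set must not be empty")
    scores: Dict[int, float] = {}
    for k in ks:
        shots = demos[:k]  # k=0 yields an empty list, i.e. zero-shot
        correct = sum(predict(text, shots) == gold for text, gold in dev_set)
        scores[k] = correct / len(dev_set)
    return scores


def toy_predict(text: str, shots: Sequence[Example]) -> str:
    """Stand-in for a model call: echoes the last demo's label, or NEUTRAL with no demos."""
    return shots[-1][1] if shots else "NEUTRAL"


demos = [("great", "POSITIVE"), ("awful", "NEGATIVE")]
dev_set = [("fine", "NEUTRAL"), ("bad", "NEGATIVE")]

scores = sweep_k(toy_predict, demos, dev_set, ks=[0, 1, 2])
best_k = max(scores, key=scores.get)  # ties resolve to the smallest k listed first
```

Listing candidate values of k in ascending order makes ties favor the cheaper prompt, which is usually the preferable default.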
from dataclasses import dataclass

@dataclass
class ICLConfig:
    k: int          # number of demonstrations
    selection: str  # 'static' | 'dynamic'
    order: str      # 'random' | 'similarity' | 'curriculum'

# Zero-shot is simply k=0
cfg = ICLConfig(k=0, selection='static', order='random')
When Zero-Shot Wins
Prefer zero-shot when the task is well represented in pretraining (summarization, translation, common classification) and when examples would bias the output format. For instruction-tuned models, a crisp directive plus an output schema often beats examples that subtly anchor style.
Zero-shot also minimizes token cost and latency, and it sidesteps majority-label bias, in which the model over-predicts whichever class dominates your demonstrations.
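If you do use demonstrations, the majority-label risk described above can be checked before a prompt is sent. The sketch below measures each label's share of the demonstration set; label_shares, is_skewed, and the 0.5 threshold are hypothetical choices made for illustration, not an established rule.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def label_shares(demos: Sequence[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of demonstrations carrying each label (empty dict for zero-shot)."""
    counts = Counter(label for _, label in demos)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def is_skewed(demos: Sequence[Tuple[str, str]], max_share: float = 0.5) -> bool:
    """True when a single label exceeds max_share of the demos (threshold is an assumption)."""
    shares = label_shares(demos)
    return bool(shares) and max(shares.values()) > max_share


# Hypothetical demonstration set with three POSITIVE examples and one NEGATIVE.
skewed_demos = [
    ("Great value.", "POSITIVE"),
    ("Fast shipping.", "POSITIVE"),
    ("Exactly as pictured.", "POSITIVE"),
    ("Stopped working in a week.", "NEGATIVE"),
]

shares = label_shares(skewed_demos)
flagged = is_skewed(skewed_demos)
zero_shot_flagged = is_skewed([])  # no demonstrations, so nothing to skew
```

A zero-shot prompt passes this check trivially, which is the point made above: with no demonstrations there is no label distribution for the model to copy.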
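# Zero-shot with an explicit output schema often beats vague few-shot
user_text = 'The battery died after two days.'  # placeholder input for illustration
PROMPT = (
    'Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL.\n'
    'Respond with only the label.\n\n'
    'Text: ' + user_text + '\nLabel:'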
)