AI Prompt Engineering · Lesson

Top-p Nucleus Sampling

How top-p restricts sampling to the most probable token set.

What Is Top-p Sampling?

Top-p sampling (also called nucleus sampling) is a technique that restricts sampling to a dynamic subset of the vocabulary. Instead of sampling from all tokens, the model considers only the smallest set of tokens whose cumulative probability is at least p.
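To make "dynamic subset" concrete, the sketch below counts how many tokens fall inside the nucleus for two made-up distributions at the same p. The helper name nucleus_size and both probability lists are illustrative assumptions, not part of the lesson.

```python
import numpy as np

def nucleus_size(probs, p):
    """Size of the smallest top-ranked token set whose cumulative probability is at least p."""
    sorted_probs = np.sort(np.asarray(probs, dtype=np.float64))[::-1]
    cumulative = np.cumsum(sorted_probs)
    # First position where the running total reaches p, converted to a count
    size = int(np.searchsorted(cumulative, p) + 1)
    # Guard against floating-point totals that fall just short of p
    return min(size, len(sorted_probs))

# Illustrative distributions over a 6-token vocabulary (made-up numbers)
peaked = [0.80, 0.12, 0.04, 0.02, 0.01, 0.01]
flat = [0.20, 0.18, 0.17, 0.16, 0.15, 0.14]

print(nucleus_size(peaked, p=0.9))  # 2: two tokens already cover 0.92
print(nucleus_size(flat, p=0.9))    # 6: the first five tokens only reach 0.86
```

When the model is confident, the nucleus shrinks to a couple of tokens. When it is uncertain, the nucleus widens, which is the behavior a fixed-size cutoff cannot provide.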

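The method was proposed in the paper 'The Curious Case of Neural Text Degeneration' (Holtzman et al., 2019). The authors report that it yields more diverse yet coherent text than the decoding strategies they compare it against, including beam search, top-k sampling, and sampling with temperature scaling alone.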

How Top-p Works Step by Step

Algorithm:

1. Compute softmax probabilities over the full vocabulary
2. Sort tokens by probability (highest first)
3. Walk down the sorted list, accumulating probability, until the cumulative sum reaches p
4. This set of tokens is the nucleus
5. Sample from the nucleus only (renormalize probabilities to sum to 1)
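The steps above can be traced by hand on a tiny example. The sketch below assumes a hypothetical five-token vocabulary with made-up probabilities that are already softmaxed, and p = 0.85. It stops before the final sampling step so that every intermediate value can be inspected.

```python
import numpy as np

# Hypothetical next-token probabilities for a 5-token vocabulary (illustrative values)
tokens = np.array(["the", "a", "cat", "dog", "zebra"])
probs = np.array([0.10, 0.50, 0.05, 0.30, 0.05])
p = 0.85

order = np.argsort(probs)[::-1]        # token indices, highest probability first
sorted_probs = probs[order]            # 0.5, 0.3, 0.1, 0.05, 0.05
cumulative = np.cumsum(sorted_probs)   # roughly 0.5, 0.8, 0.9, 0.95, 1.0

# The running total first reaches 0.85 at the third token, so the nucleus has 3 tokens
k = int(np.searchsorted(cumulative, p) + 1)
nucleus_tokens = tokens[order[:k]]

# Renormalize so the nucleus probabilities sum to 1: 0.5/0.9, 0.3/0.9, 0.1/0.9
nucleus_probs = sorted_probs[:k] / sorted_probs[:k].sum()

print(k)
print(nucleus_tokens)
print(nucleus_probs.round(3))
```

Here the nucleus is "a", "dog", and "the". The two 0.05 tokens are excluded and can never be sampled at this step.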

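import numpy as np

def softmax(logits, temperature=1.0):
    # Helper assumed by top_p_sample: numerically stable softmax
    # (subtract the max before exponentiating to avoid overflow)
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    exp_scaled = np.exp(scaled - np.max(scaled))
    return exp_scaled / exp_scaled.sum()

def top_p_sample(logits, p=0.9):
    probs = softmax(logits, temperature=1.0)

    # Sort by probability descending
    sorted_indices = np.argsort(probs)[::-1]
    sorted_probs = probs[sorted_indices]

    # Find nucleus: smallest set with cumulative prob >= p
    cumulative = np.cumsum(sorted_probs)
    # searchsorted gives the first index where cumulative >= p; +1 turns it into a count.
    # Clamp to the vocabulary size in case rounding leaves the total just below p.
    nucleus_size = min(int(np.searchsorted(cumulative, p)) + 1, len(probs))
    nucleus_indices = sorted_indices[:nucleus_size]
    nucleus_probs = sorted_probs[:nucleus_size]

    # Renormalize
    nucleus_probs = nucleus_probs / nucleus_probs.sum()

    # Sample a token index from the nucleus
    chosen = np.random.choice(nucleus_indices, p=nucleus_probs)
    return int(chosen)

# Example logits for a 5-token vocabulary (illustrative values)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
token = top_p_sample(logits, p=0.9)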
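One way to sanity-check an implementation like this is to sample many times and confirm that tokens outside the nucleus never appear. The sketch below is self-contained, so it repeats a compact version of the sampler. The seeded generator argument rng, the example logits, and the sample count are assumptions chosen for illustration.

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def top_p_sample(logits, p=0.9, rng=None):
    # rng is an optional numpy Generator so that runs can be made reproducible
    rng = np.random.default_rng() if rng is None else rng
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    k = int(np.searchsorted(np.cumsum(sorted_probs), p) + 1)
    nucleus = order[:k]
    weights = sorted_probs[:k] / sorted_probs[:k].sum()
    return int(rng.choice(nucleus, p=weights))

rng = np.random.default_rng(0)
logits = np.array([3.0, 2.0, 1.0, 0.0, -1.0, -2.0])  # illustrative values

# Softmax gives roughly 0.63, 0.23, 0.09, 0.03, 0.01, 0.004, so with p = 0.9
# the nucleus is the first three tokens (cumulative probability about 0.95).
samples = [top_p_sample(logits, p=0.9, rng=rng) for _ in range(5000)]
counts = np.bincount(samples, minlength=len(logits))
print(counts)  # the last three entries are 0
```

The counts for tokens 0 to 2 should roughly follow their renormalized probabilities, while tokens 3 to 5 are never drawn.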
