AI Prompt Engineering · Lesson

Top-k Sampling

Limiting sampling to the k most probable tokens, and how that affects output diversity.

What Is Top-k Sampling?

Top-k sampling restricts the model to sampling from the k most probable tokens at each step. All tokens outside the top-k are assigned zero probability and cannot be selected.

k=1 is greedy decoding (only the single most probable token can be chosen). k=50 is a commonly used setting for creative text. k=vocabulary_size is equivalent to unrestricted sampling, because no token is excluded.
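To make the limiting cases concrete, here is a minimal sketch. The helper name top_k_distribution and the five probabilities are made up for illustration and do not come from a real model. It returns the distribution that top-k would sample from, so the effect of each k can be inspected directly.

```python
import numpy as np

def top_k_distribution(probs, k):
    """Hypothetical helper: zero out all but the k largest probabilities, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:k]   # indices of the k most probable tokens
    truncated = np.zeros_like(probs)
    truncated[keep] = probs[keep]        # every other token keeps probability 0
    return truncated / truncated.sum()

# Illustrative next-token probabilities for a 5-token vocabulary (made-up values)
probs = np.array([0.5, 0.3, 0.1, 0.07, 0.03])

greedy = top_k_distribution(probs, k=1)           # all mass on the most probable token
middle = top_k_distribution(probs, k=2)           # mass shared by the two most probable tokens
full = top_k_distribution(probs, k=len(probs))    # nothing is excluded

print(greedy)
print(middle)
print(full)
```

With k=2 the kept probabilities 0.5 and 0.3 sum to 0.8, so after renormalization they become 0.625 and 0.375.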

Top-k Algorithm

The algorithm is simpler than top-p:

1. Compute softmax probabilities over the full vocabulary
2. Sort tokens by probability descending
3. Keep only the top k tokens; set all others to 0
4. Renormalize the top-k probabilities to sum to 1
5. Sample from the renormalized distribution

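import numpy as np

def softmax(logits, temperature=1.0):
    # Scale by temperature, then subtract the max for numerical stability
    scaled = np.asarray(logits, dtype=float) / temperature
    e = np.exp(scaled - np.max(scaled))
    return e / e.sum()

def top_k_sample(logits, k=50, temperature=1.0):
    # k=0 would leave nothing to sample from; negative k would slice incorrectly
    if k < 1:
        raise ValueError("k must be at least 1")

    probs = softmax(logits, temperature)

    # Indices of the k most probable tokens, highest probability first
    # (if k exceeds the vocabulary size, the slice keeps every token)
    top_k_indices = np.argsort(probs)[::-1][:k]
    top_k_probs = probs[top_k_indices]

    # Renormalize so the kept probabilities sum to 1
    top_k_probs = top_k_probs / top_k_probs.sum()

    # Sample one token index from the truncated distribution
    chosen = np.random.choice(top_k_indices, p=top_k_probs)
    return chosen

# With a 10-token vocabulary:
logits = np.random.randn(10)
print('Chosen token:', top_k_sample(logits, k=3))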
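The effect of k on output diversity can be checked empirically. The sketch below is an illustration under stated assumptions: the logits are random numbers rather than real model output, and the names sample_top_k and vocab_size are hypothetical. It draws 500 tokens for several values of k and counts how many distinct tokens appear.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def sample_top_k(logits, k, rng):
    """Sketch of top-k sampling that mirrors the steps in this lesson."""
    shifted = logits - np.max(logits)                # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()  # softmax
    keep = np.argsort(probs)[::-1][:k]               # k most probable tokens
    kept = probs[keep] / probs[keep].sum()           # renormalize
    return int(rng.choice(keep, p=kept))

vocab_size = 100                           # hypothetical vocabulary size
logits = rng.standard_normal(vocab_size)   # made-up logits, not from a real model

distinct = {}
for k in (1, 5, 20, 100):
    draws = [sample_top_k(logits, k, rng) for _ in range(500)]
    distinct[k] = len(set(draws))
    print(f"k={k:>3}: {distinct[k]} distinct tokens in 500 draws")
```

The number of distinct tokens can never exceed k, so a small k caps diversity no matter how many samples are drawn, and k=1 always produces the same token.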
