Top-k Sampling
Limiting the choice to the k most probable tokens, and how that affects output diversity.
What Is Top-k Sampling?
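Top-k sampling restricts each decoding step to the k most probable tokens. Every token outside the top k is assigned zero probability, so it cannot be selected.
With k=1, top-k sampling reduces to greedy decoding: only the single most probable token survives, so the output is deterministic. Values around k=50 are a common default; they keep enough candidates for varied output while cutting off the long tail of low-probability tokens. Setting k to the vocabulary size removes nothing, so it is equivalent to unrestricted sampling from the (temperature-scaled) distribution.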
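To make the link between k and diversity concrete, the sketch below truncates a small hypothetical 8-token distribution at several values of k and reports the Shannon entropy of what remains. The distribution and the helper names `truncated_top_k` and `entropy_bits` are invented for illustration. With k=1 the entropy is 0 bits (a deterministic, greedy choice); with k equal to the vocabulary size the distribution is unchanged; in between, entropy grows with k.

```python
import numpy as np

def truncated_top_k(probs, k):
    """Hypothetical helper: zero out all but the k most probable tokens, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def entropy_bits(p):
    """Hypothetical helper: Shannon entropy in bits, skipping zero-probability tokens."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

# Made-up distribution over an 8-token vocabulary (sums to 1.0).
probs = np.array([0.30, 0.20, 0.15, 0.12, 0.10, 0.07, 0.04, 0.02])

for k in (1, 2, 4, 8):
    p_k = truncated_top_k(probs, k)
    print(f"k={k}: kept={np.count_nonzero(p_k)}, entropy={entropy_bits(p_k):.3f} bits")
```

Note that the cutoff is a fixed count: k=4 keeps four tokens whether the distribution is sharply peaked or nearly flat.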
Top-k Algorithm
The algorithm is simpler than top-p, because the cutoff is a fixed number of tokens rather than a cumulative-probability threshold:
- Compute softmax probabilities over the full vocabulary
- Sort tokens by probability descending
- Keep only the top k tokens; set the probabilities of all others to 0
- Renormalize the top-k probabilities to sum to 1
- Sample from the renormalized distribution
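The renormalization step can be written as a formula. Here V_k is notation introduced for this sketch (the set of the k most probable tokens), and the 5-token distribution in the worked line is hypothetical.

```latex
\[
\tilde{p}_i =
\begin{cases}
\dfrac{p_i}{\sum_{j \in V_k} p_j} & \text{if } i \in V_k, \\[6pt]
0 & \text{otherwise.}
\end{cases}
\]
% Worked example, k = 2:
\[
p = (0.50,\ 0.20,\ 0.15,\ 0.10,\ 0.05), \quad
\sum_{j \in V_2} p_j = 0.70, \quad
\tilde{p} = \left(\tfrac{0.50}{0.70},\ \tfrac{0.20}{0.70},\ 0,\ 0,\ 0\right) \approx (0.714,\ 0.286,\ 0,\ 0,\ 0).
\]
```

Because every kept probability is divided by the same constant, their ratios are preserved: the most probable token stays exactly 2.5 times as likely as the second. Top-k changes which tokens are eligible, not their relative preference.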
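```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scale the logits (temperature must be > 0); subtract the max for numerical stability
    scaled = np.asarray(logits, dtype=float) / temperature
    e = np.exp(scaled - np.max(scaled))
    return e / e.sum()

def top_k_sample(logits, k=50, temperature=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    probs = softmax(logits, temperature)
    k = min(k, probs.size)  # a k larger than the vocabulary keeps every token
    # Indices of the k most probable tokens, most probable first
    top_k_indices = np.argsort(probs)[::-1][:k]
    top_k_probs = probs[top_k_indices]
    # Renormalize so the kept probabilities sum to 1
    top_k_probs = top_k_probs / top_k_probs.sum()
    # Sample one token id from the truncated distribution
    return int(rng.choice(top_k_indices, p=top_k_probs))

# With a 10-token vocabulary (seeded so the run is reproducible):
rng = np.random.default_rng(0)
logits = rng.standard_normal(10)
print('Chosen token:', top_k_sample(logits, k=3, rng=rng))
```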
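A common alternative applies the cutoff to the logits before the softmax: every logit outside the top k is set to -inf, so exp(-inf) = 0 gives those tokens exactly zero probability. Because the softmax is monotonic, the top k logits are the top k probabilities, and the result is the same renormalized distribution as in the steps above. The sketch below also uses `np.argpartition`, which finds the k largest entries without sorting the whole vocabulary. The function names `top_k_filter_logits` and `sample_from_logits` are hypothetical, and k is assumed to be at least 1.

```python
import numpy as np

def top_k_filter_logits(logits, k):
    """Hypothetical helper: copy of `logits` with everything outside the top k set to -inf."""
    logits = np.asarray(logits, dtype=float)
    k = min(k, logits.size)
    # argpartition moves the indices of the k largest logits into the last k slots (unordered)
    top_k_indices = np.argpartition(logits, -k)[-k:]
    masked = np.full_like(logits, -np.inf)
    masked[top_k_indices] = logits[top_k_indices]
    return masked

def sample_from_logits(logits, temperature=1.0, rng=None):
    """Hypothetical helper: temperature softmax, then draw one token id; -inf logits get probability 0."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = logits / temperature
    e = np.exp(scaled - np.max(scaled))
    return int(rng.choice(logits.size, p=e / e.sum()))

rng = np.random.default_rng(0)
logits = rng.standard_normal(10)
masked = top_k_filter_logits(logits, k=3)
print('Kept token ids:', sorted(np.flatnonzero(np.isfinite(masked)).tolist()))
print('Chosen token:', sample_from_logits(masked, rng=rng))
```

Masking keeps the full vocabulary shape, which makes it easy to combine with other filters (such as top-p) that also operate on logits.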