
Top-k Sampling

Limiting sampling to the k most probable tokens, and how that limit affects output diversity.

What Is Top-k Sampling?

Top-k sampling restricts the model to sampling from the k most probable tokens at each step. All tokens outside the top-k are assigned zero probability and cannot be selected.
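As a small worked example of this truncation, the sketch below applies k=3 to a made-up five-token distribution. The helper name `top_k_filter` and the probability values are illustrative assumptions, not part of the lesson's own code.

```python
import numpy as np

def top_k_filter(probs, k):
    """Zero out every token outside the k most probable, then renormalize.

    Hypothetical helper for illustration only.
    """
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:k]   # indices of the k largest probabilities
    filtered = np.zeros_like(probs)      # everything starts at zero probability
    filtered[keep] = probs[keep]         # copy back only the top-k entries
    return filtered / filtered.sum()     # rescale so the result sums to 1

# Made-up distribution over a 5-token vocabulary
probs = np.array([0.50, 0.20, 0.15, 0.10, 0.05])
filtered = top_k_filter(probs, k=3)

# The kept mass is 0.85, so the survivors become roughly 0.588, 0.235, 0.176
print(filtered.round(3))
```

The two least probable tokens end up with probability exactly 0, so no amount of sampling can ever select them.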

k=1 reduces to greedy decoding: only the single most probable token can be chosen. k=50 is a common setting for creative generation. k=vocabulary_size is equivalent to unrestricted sampling, because no token is excluded.
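One way to see how k controls diversity is to measure the Shannon entropy of the truncated distribution at different values of k. The sketch below does this for a made-up five-token distribution; the helper names `top_k_distribution` and `entropy` are assumptions introduced for illustration.

```python
import numpy as np

def top_k_distribution(probs, k):
    """Keep the k largest probabilities, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def entropy(dist):
    """Shannon entropy in nats; zero-probability tokens contribute nothing."""
    nz = dist[dist > 0]
    return float(np.sum(nz * np.log(1.0 / nz)))

# Made-up distribution over a 5-token vocabulary
probs = np.array([0.40, 0.30, 0.15, 0.10, 0.05])

# Compare greedy (k=1), a middle setting, and the full vocabulary
entropies = {k: entropy(top_k_distribution(probs, k)) for k in (1, 3, len(probs))}
for k, h in entropies.items():
    print(f"k={k}: entropy = {h:.3f} nats")
```

At k=1 the entropy is zero because the outcome is fully determined. It grows as k increases, and at k equal to the vocabulary size the distribution is simply the original one.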

Top-k Algorithm

The algorithm is simpler than top-p because the cutoff is a fixed token count rather than a cumulative-probability threshold:

  1. Compute softmax probabilities over the full vocabulary
  2. Sort tokens by probability descending
  3. Keep only the top k tokens; set all others to 0
  4. Renormalize the top-k probabilities to sum to 1
  5. Sample from the renormalized distribution
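
import numpy as np

def softmax(logits, temperature=1.0):
    # Divide by temperature, then subtract the max for numerical stability
    scaled = np.asarray(logits, dtype=float) / temperature
    e = np.exp(scaled - np.max(scaled))
    return e / e.sum()

def top_k_sample(logits, k=50, temperature=1.0):
    if k < 1:
        raise ValueError("k must be at least 1")
    probs = softmax(logits, temperature)

    # Indices of the k most probable tokens (all tokens if k exceeds the vocabulary)
    top_k_indices = np.argsort(probs)[::-1][:k]
    top_k_probs = probs[top_k_indices]

    # Renormalize so the kept probabilities sum to 1
    top_k_probs = top_k_probs / top_k_probs.sum()

    # Sample one token index from the renormalized distribution
    chosen = np.random.choice(top_k_indices, p=top_k_probs)
    return chosen

# With a 10-token vocabulary:
logits = np.random.randn(10)
print('Chosen token:', top_k_sample(logits, k=3))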
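
A common variant applies the cutoff to the logits instead: every logit outside the top k is set to negative infinity, and a single softmax then yields the renormalized distribution directly. The sketch below is one possible version under that assumption, not the lesson's own implementation. The function name `top_k_sample_from_logits` is hypothetical, `np.argpartition` is used to avoid a full sort, and a seeded `np.random.default_rng` keeps the run reproducible.

```python
import numpy as np

def top_k_sample_from_logits(logits, k, rng, temperature=1.0):
    """Mask logits outside the top k to -inf, softmax, then sample.

    Hypothetical helper; returns the chosen token index and the
    full-vocabulary probability vector that was sampled from.
    """
    logits = np.asarray(logits, dtype=float) / temperature
    k = min(k, logits.size)  # a k larger than the vocabulary keeps everything

    # argpartition places the k largest logits in the last k slots (unsorted)
    top_idx = np.argpartition(logits, -k)[-k:]

    masked = np.full_like(logits, -np.inf)
    masked[top_idx] = logits[top_idx]

    # exp(-inf) is 0, so masked tokens get exactly zero probability
    e = np.exp(masked - masked.max())
    probs = e / e.sum()

    return int(rng.choice(logits.size, p=probs)), probs

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
token, probs = top_k_sample_from_logits(logits, k=2, rng=rng)
print("Chosen token:", token)
print("Probabilities:", probs.round(3))
```

With k=2 only tokens 0 and 1 can be drawn, and their probabilities match a softmax over just those two logits, which is the same result the softmax-then-truncate steps produce.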
