What Is Temperature in LLMs?
Temperature as a creativity control: 0 is deterministic, 2 is chaotic, and values in between trade predictability for variety.
How LLMs Choose the Next Token
At each step, an LLM assigns a score to every token in its vocabulary (about 50,000 tokens for GPT-2 and GPT-3). Each score is called a logit: a raw, unnormalized number. A higher logit means a more likely token. These scores are then turned into a probability distribution over the vocabulary, from which the next token is drawn.
Temperature is the parameter that controls how these raw logits are converted into probabilities before sampling.
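The sampling step mentioned above can be sketched as follows. This is a minimal illustration, not how any particular model implements it: the four-token vocabulary, the probabilities, and the helper names `sample_next_token` and `greedy_next_token` are all made up for the example.

```python
import numpy as np

# Hypothetical 4-token vocabulary and probabilities, chosen only for illustration.
vocab = ["cat", "dog", "bird", "fish"]
probs = np.array([0.6, 0.2, 0.15, 0.05])

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable


def sample_next_token(probs, rng):
    """Draw one token index at random, weighted by its probability."""
    return int(rng.choice(len(probs), p=probs))


def greedy_next_token(probs):
    """Always pick the single most likely token index (no randomness)."""
    return int(np.argmax(probs))


# Sample 1000 times and compare observed frequencies with the probabilities.
samples = [sample_next_token(probs, rng) for _ in range(1000)]
counts = np.bincount(samples, minlength=len(vocab))
for token, count in zip(vocab, counts):
    print(f"{token}: {count / 1000:.2f}")

print("Greedy choice:", vocab[greedy_next_token(probs)])
```

Sampling returns different tokens from run to run, in proportion to their probabilities, while the greedy choice always returns the same token. Temperature changes the probabilities that this sampling step sees.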
The Softmax Function
Logits are converted to probabilities using the softmax function. Softmax takes a vector of logits and outputs a probability distribution that sums to 1.
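Written out, softmax with temperature looks like this. The notation is an assumption made here for illustration, not taken from the lesson: z_i is the logit of token i, T is the temperature, and p_i is the resulting probability.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

With T = 1 this is the standard softmax. Dividing by T < 1 widens the gaps between logits and sharpens the distribution; dividing by T > 1 narrows them and flattens it.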
For example, with a small vocabulary of 4 tokens:
import numpy as np

# Raw logits from the model
logits = np.array([2.0, 1.0, 0.5, -1.0])  # scores for 4 tokens

# Softmax with temperature scaling (temperature = 1 gives the standard softmax)
def softmax(logits, temperature=1.0):
    scaled = logits / temperature  # requires temperature > 0
    exp_scaled = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp_scaled / exp_scaled.sum()

probs = softmax(logits, temperature=1.0)
print('Probabilities:', np.round(probs, 3))
# [0.609, 0.224, 0.136, 0.030]
# Token 0 is most likely at 60.9%
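To see how temperature reshapes the distribution, the sketch below runs the same four logits through a temperature-scaled softmax at three settings. The function name `softmax_with_temperature` and the chosen temperatures are illustrative assumptions, and the function only handles temperatures greater than 0.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax. Assumes temperature > 0."""
    scaled = np.asarray(logits, dtype=float) / temperature
    exp_scaled = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp_scaled / exp_scaled.sum()


logits = np.array([2.0, 1.0, 0.5, -1.0])  # same 4-token example as above

# Low temperature sharpens the distribution, high temperature flattens it.
results = {}
for t in (0.2, 1.0, 2.0):
    results[t] = softmax_with_temperature(logits, t)
    print(f"T={t}: {np.round(results[t], 3)}")
```

With these logits, the top token's probability is roughly 0.99 at T = 0.2, 0.61 at T = 1.0, and 0.43 at T = 2.0. A temperature of exactly 0 would divide by zero in this formula, so it is commonly treated as a special case that always picks the highest-logit token, which is why 0 behaves deterministically.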