AI Prompt Engineering · Lesson

What Is Temperature in LLMs?

Temperature as a creative control: 0 is effectively deterministic, 2 is chaotic, and values in between trade predictability for variety.

How LLMs Choose the Next Token

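At each step, an LLM assigns every token in its vocabulary a score called a logit: a raw, unnormalized number. (The vocabulary is about 50,000 tokens for GPT-2 and GPT-3; many newer models use larger ones.) These logits are then turned into a probability distribution from which the next token is sampled. A higher logit means a more likely token.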

Temperature is the parameter that controls how these raw logits are converted into probabilities before sampling.

The Softmax Function

Logits are converted to probabilities using the softmax function. Softmax takes a vector of logits and outputs a probability distribution that sums to 1.
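
Written out, the temperature-scaled softmax looks like the formula below. The symbols are notation chosen here for illustration, not taken from the lesson: z_i is the logit of token i, T is the temperature, V is the vocabulary size, and p_i is the resulting probability.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad T > 0
```

With T = 1 this reduces to the standard softmax. Dividing by T < 1 widens the gaps between logits and sharpens the distribution, while T > 1 narrows the gaps and flattens it.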

For example, with a small vocabulary of 4 tokens:

import numpy as np

# Raw logits from the model
logits = np.array([2.0, 1.0, 0.5, -1.0])  # scores for 4 tokens

# Softmax with temperature scaling (temperature=1.0 gives the standard softmax)
def softmax(logits, temperature=1.0):
    scaled = logits / temperature
    exp_scaled = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp_scaled / exp_scaled.sum()

probs = softmax(logits, temperature=1.0)
print('Probabilities:', np.round(probs, 3))
# approximately [0.609, 0.224, 0.136, 0.030]
# Token 0 is most likely at 60.9%
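
To see what the temperature argument does to the same four logits, the sketch below recomputes the distribution at three temperatures and then samples one token. It is a minimal illustration, not a production sampler: the name `probs_by_temp`, the temperature values 0.5, 1.0, and 2.0, and the fixed seed are all assumptions made here. It also treats temperature 0 as a plain argmax, which is a common convention because dividing by zero is undefined.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature must be greater than 0."""
    scaled = np.asarray(logits, dtype=float) / temperature
    exp_scaled = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp_scaled / exp_scaled.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # same 4-token example as above

# Compare the distribution at a low, neutral, and high temperature
# (these three values are illustrative choices)
probs_by_temp = {t: softmax(logits, temperature=t) for t in (0.5, 1.0, 2.0)}
for t, p in probs_by_temp.items():
    print(f"T={t}:", np.round(p, 3))
# T=0.5 -> approximately [0.842, 0.114, 0.042, 0.002]  (sharper)
# T=1.0 -> approximately [0.609, 0.224, 0.136, 0.030]
# T=2.0 -> approximately [0.434, 0.263, 0.205, 0.097]  (flatter)

# Temperature 0 cannot be plugged into the formula; here it is
# treated as greedy decoding (always pick the highest logit).
greedy_token = int(np.argmax(logits))

# Sample one token index from the T=1.0 distribution
rng = np.random.default_rng(seed=0)  # fixed seed only for repeatability
next_token = int(rng.choice(len(logits), p=probs_by_temp[1.0]))
print("Greedy token:", greedy_token, "| Sampled token:", next_token)
```

Lowering the temperature moves probability mass toward token 0, and raising it spreads mass toward the less likely tokens, so the same logits can produce quite different sampling behavior.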
