How AI Generates Responses
Token prediction, probability, and why AI doesn't 'think' like humans.
Text as a Probability Problem
At its core, a language model does one thing: predict the next token given all the tokens before it.
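"Predict the next token" means that the model outputs a raw score (a logit) for every vocabulary entry, and a softmax turns those scores into probabilities that sum to 1. The sketch below is a minimal illustration. The four-word "vocabulary" and the logits are invented, not taken from any real model. The `temperature` argument shows how that setting reshapes the distribution before a token is chosen.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    # Dividing by temperature sharpens (T < 1) or flattens (T > 1) the distribution.
    scaled = [x / temperature for x in logits]
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign after the prompt "The cat sat on the"
candidates = ['mat', 'floor', 'roof', 'moon']
logits = [4.0, 2.5, 1.0, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    pretty = ', '.join(f'{tok}={p:.2f}' for tok, p in zip(candidates, probs))
    print(f'T={t}: {pretty}')
```

Lower temperatures concentrate probability on the top candidate. Higher temperatures spread it across the alternatives.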
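A token is a small unit of text, roughly a word or word fragment. At each step the model assigns a probability to every token in its vocabulary and selects one. It either takes the most likely token or samples a token from that distribution. It then appends the chosen token to the input and repeats the process, token by token, until it emits a special end-of-sequence token or reaches a length limit.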
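The token-by-token loop can be sketched as follows. This illustration rests on heavy simplifying assumptions. `TOY_LOGITS` is a hypothetical hand-written table, and this toy model only looks at the last token, whereas a real LLM conditions on the whole context. `<start>` and `<end>` are illustrative markers for the beginning and end of a response.

```python
import math
import random

# Hypothetical toy "model": previous token -> logits for possible next tokens.
TOY_LOGITS = {
    '<start>': {'The': 3.0, 'A': 1.0},
    'The': {'cat': 2.0, 'dog': 1.5},
    'A': {'cat': 1.0, 'dog': 2.0},
    'cat': {'sat': 2.5, '<end>': 0.5},
    'dog': {'ran': 2.5, '<end>': 0.5},
    'sat': {'<end>': 3.0},
    'ran': {'<end>': 3.0},
}

def next_token_distribution(tokens):
    """Return {token: probability} for the next position (softmax over toy logits)."""
    logits = TOY_LOGITS[tokens[-1]]  # simplification: only the last token is used
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def generate(greedy=True, max_tokens=10, seed=None):
    rng = random.Random(seed)
    tokens = ['<start>']
    for _ in range(max_tokens):  # length limit
        probs = next_token_distribution(tokens)
        if greedy:
            # Pick the single most likely token.
            nxt = max(probs, key=probs.get)
        else:
            # Sample a token in proportion to its probability.
            nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        if nxt == '<end>':
            break  # the model signalled that the response is complete
        tokens.append(nxt)
    return tokens[1:]

print(' '.join(generate(greedy=True)))  # -> The cat sat
print(' '.join(generate(greedy=False, seed=1)))
```

Greedy decoding always returns the same text. Sampling can return a different continuation on each run.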
What Is a Token?
Tokens are the atoms of LLM text processing. English text tokenizes roughly as:
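- Common words → 1 token each (`the`, `run`)
- Less common words → split into 2-3 subword tokens (e.g. `running` may become `run` + `ning`, depending on the tokenizer)
- Punctuation → often its own token; a single space is usually merged into the start of the next word (` the`), while runs of spaces or newlines can form separate tokens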
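A toy tokenizer can illustrate the splitting behaviour in the list above. This sketch assumes a tiny hypothetical vocabulary and uses a simple greedy longest-match rule. Real tokenizers such as tiktoken use byte-pair encoding (BPE) with learned vocabularies, so their actual splits will differ.

```python
import string

# Tiny hypothetical vocabulary: a few whole words and subword pieces,
# plus every lowercase letter so any lowercase word can still be encoded.
VOCAB = {'the', 'run', 'ning', 'token', 'ize'} | set(string.ascii_lowercase)

def tokenize_word(word):
    """Split a word by repeatedly taking the longest vocabulary entry that matches next."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and shrink until one is in VOCAB.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No entry matched (e.g. a character outside the vocabulary).
            raise ValueError(f'cannot tokenize {word[i:]!r}')
    return pieces

for w in ['the', 'running', 'tokenizer']:
    print(w, '->', tokenize_word(w))
# the -> ['the']
# running -> ['run', 'ning']
# tokenizer -> ['token', 'ize', 'r']
```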
Models like GPT-4o and Claude use tokenizers with vocabularies of 100k+ entries, including subword pieces for many languages.
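```python
import tiktoken

# Load the tokenizer that OpenAI's tiktoken library associates with GPT-4o
encoding = tiktoken.encoding_for_model('gpt-4o')
sentence = 'The temperature parameter controls randomness in token selection.'

# encode() turns text into a list of integer token IDs
tokens = encoding.encode(sentence)
# Decode each ID on its own to see which piece of text it stands for
token_strings = [encoding.decode([t]) for t in tokens]

print(f'Sentence: {sentence}')
print(f'Token count: {len(tokens)}')
print(f'Tokens: {token_strings}')
```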