AI Prompt Engineering · Lesson

How AI Generates Responses

Token prediction, probability, and why AI doesn't 'think' like humans.

Text as a Probability Problem

At its core, a language model does one thing: predict the next token given all the tokens before it.

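A token is a small unit of text, roughly a word or word fragment. At each step the model assigns a probability to every token in its vocabulary and selects one, usually by sampling from that distribution. The chosen token is appended to the input and the process repeats, token by token, until the model emits a stop token or reaches a length limit.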
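The predict-then-repeat loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any real model is implemented: the TOY_LOGITS table, its scores, and the <start>/<end> markers are all made up for the example, and the toy looks only at the previous token, whereas a real model conditions on every token before it.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores: for each previous token, a score per candidate next token.
# A real model computes these scores from the entire preceding context.
TOY_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.2},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "<end>": 0.5},
    "dog": {"sat": 1.8, "<end>": 0.7},
    "sat": {"<end>": 3.0},
}

def generate(rng, max_tokens=10):
    """Sample one token at a time until <end> or the length limit."""
    tokens = ["<start>"]
    while len(tokens) - 1 < max_tokens:
        probs = softmax(TOY_LOGITS[tokens[-1]])
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker

print(softmax(TOY_LOGITS["<start>"]))
print(generate(random.Random(0)))
```

Because the next token is sampled rather than always taken as the most likely one, running generate with different seeds can produce different sentences from the same starting point.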

What Is a Token?

Tokens are the atoms of LLM text processing. English text tokenizes roughly as:

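Common words → 1 token each (the, run)
Less common words → split into 2-3 tokens (e.g. tokenization → token + ization; exact splits vary by tokenizer)
Punctuation → often its own token; a space is usually attached to the start of the word that follows it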
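To make the splitting pattern concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is an assumption-laden toy: TOY_VOCAB and toy_tokenize are hypothetical names, the vocabulary is hand-written, and production tokenizers typically learn their vocabulary with byte-pair encoding rather than using longest-match lookup. The entry ' token' stores a leading space, as many real tokenizers do.

```python
# Hypothetical hand-written vocabulary; real tokenizers learn theirs from data.
TOY_VOCAB = {"the", "run", " token", "token", "ization", ".", ","}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown characters become single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary entry matches.
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            tokens.append(text[i])  # fallback: emit the raw character
            i += 1
    return tokens

print(toy_tokenize("the tokenization."))
```

The common word comes out whole, the rarer word is split into two pieces, and the period is its own token. Joining the pieces back together always reproduces the original text.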

Models like GPT-4o and Claude use subword tokenizers with vocabularies of 100k+ entries, covering word fragments across many languages. GPT-4o's o200k_base encoding, for example, has roughly 200k entries.

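# Requires the tiktoken package (pip install tiktoken)
import tiktoken

# Look up the tokenizer that matches the target model
encoding = tiktoken.encoding_for_model('gpt-4o')

sentence = 'The temperature parameter controls randomness in token selection.'
# encode() turns the text into a list of integer token IDs
tokens = encoding.encode(sentence)
# Decode each ID on its own to see the text fragment it stands for
token_strings = [encoding.decode([t]) for t in tokens]

print(f'Sentence: {sentence}')
print(f'Token count: {len(tokens)}')
print(f'Tokens: {token_strings}')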
