Self-Attention: Query, Key & Value
Let every token look at every other.
Tokens That Talk
In a sequence, the meaning of one word depends on others. Self-attention lets every token look at every other token to gather the context it needs.
Three Roles per Token
Each token plays three roles: a query that asks, a key that answers, and a value that carries content. These come from the same word, used three ways.
All lessons in this course
- Self-Attention: Query, Key & Value
- Scaled Dot-Product & Multi-Head
- Positional Encoding for Order
- Stack a Transformer Encoder Block