Positional Encoding for Order
Inject sequence position into tokens.
Attention Ignores Order
Self-attention treats a sentence like a bag of tokens. Shuffle the words and the math barely changes, so the model has no built-in sense of order.
Order Carries Meaning
But order matters: "dog bites man" is not "man bites dog". We must hand the model some signal about each token's position.
All lessons in this course
- Self-Attention: Query, Key & Value
- Scaled Dot-Product & Multi-Head
- Positional Encoding for Order
- Stack a Transformer Encoder Block