Stack a Transformer Encoder Block
Attention plus feedforward and norms.
The Building Block
A transformer encoder is a stack of identical encoder blocks: every block has the same structure, but each has its own learned weights. Learn the block and you understand the whole tower, from tiny models to giant ones.
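Stacking can be written as repeated function application. In this math sketch, X is assumed to be the n × d_model matrix of input token embeddings with positional encodings already added, and N is the number of blocks. The name EncoderBlock_ℓ is notation introduced here, not taken from the lesson:

```latex
H^{(0)} = X \in \mathbb{R}^{n \times d_{\text{model}}}, \qquad
H^{(\ell)} = \operatorname{EncoderBlock}_{\ell}\left(H^{(\ell-1)}\right), \quad \ell = 1, \dots, N
```

Every block maps an n × d_model matrix to another n × d_model matrix. The output of one block is therefore a valid input to the next, and H^(N) is the encoder's output. The subscript ℓ indicates that each block has its own weights even though all blocks share the same structure. The original Transformer used N = 6.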
Two Sub-Layers
Each block has two sub-layers: multi-head self-attention, then a position-wise feedforward network applied to each token independently. Each sub-layer is wrapped in a residual connection and layer normalization.
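Here is one block sketched in plain Python, using lists of lists and no NumPy, to make the data flow concrete. It is a minimal sketch with several assumptions. It uses post-norm placement, LayerNorm(x + Sublayer(x)), as in the original Transformer paper; many newer models normalize before each sub-layer instead. It also assumes a ReLU feedforward network, layer norm with its learnable gain and bias fixed at 1 and 0, no dropout or masking, and small random weights. All function names are hypothetical.

```python
import math
import random

# Minimal post-norm transformer encoder block (hypothetical helper names).
# Matrices are lists of rows; X has shape (seq_len, d_model).

def matmul(A, B):
    """Multiply an (n x k) matrix A by a (k x m) matrix B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Element-wise sum of two same-shape matrices (the residual connection)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(X, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance.
    Assumption: the learnable gain and bias are fixed at 1 and 0."""
    out = []
    for row in X:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention: queries, keys and values are all projections of X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(Q[0]) // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_k, (h + 1) * d_k)  # this head's slice of the features
        Qh = [r[cols] for r in Q]
        Kh = [r[cols] for r in K]
        Vh = [r[cols] for r in V]
        # Scaled dot-product: score(i, j) = q_i . k_j / sqrt(d_k)
        scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in Kh] for qi in Qh]
        heads.append(matmul([softmax(r) for r in scores], Vh))
    # Concatenate the heads along the feature axis, then project with Wo
    concat = [sum((head[i] for head in heads), []) for i in range(len(X))]
    return matmul(concat, Wo)

def feed_forward(X, W1, b1, W2, b2):
    """Position-wise FFN: ReLU(x W1 + b1) W2 + b2, applied to each token independently."""
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(X, W1)]
    return [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, W2)]

def encoder_block(X, p, num_heads):
    # Sub-layer 1: multi-head self-attention, residual add, then layer norm
    attn = multi_head_attention(X, p["Wq"], p["Wk"], p["Wv"], p["Wo"], num_heads)
    X = layer_norm(add(X, attn))
    # Sub-layer 2: feedforward network, residual add, then layer norm
    return layer_norm(add(X, feed_forward(X, p["W1"], p["b1"], p["W2"], p["b2"])))

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(d_model, d_ff, rng):
    """Random weights for one block (illustrative only, not a recommended init scheme)."""
    p = {name: random_matrix(d_model, d_model, rng) for name in ("Wq", "Wk", "Wv", "Wo")}
    p["W1"], p["b1"] = random_matrix(d_model, d_ff, rng), [0.0] * d_ff
    p["W2"], p["b2"] = random_matrix(d_ff, d_model, rng), [0.0] * d_model
    return p

rng = random.Random(0)
# d_ff = 4 * d_model, the same ratio as the original Transformer (512 -> 2048)
d_model, d_ff, num_heads, seq_len = 8, 32, 2, 4
X = random_matrix(seq_len, d_model, rng, scale=1.0)
out = encoder_block(X, init_params(d_model, d_ff, rng), num_heads)
print(len(out), len(out[0]))  # 4 8: same shape as X, so blocks can be stacked
```

Each sub-layer ends in a layer norm, so every output row has a mean close to 0 and a variance close to 1. The output also has the same shape as the input, which is what lets identical blocks be chained into a tower.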