Inside the Transformer Block
How the full architecture fits together.
Stacking Blocks
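A Transformer is built by stacking blocks of identical structure many times, typically each with its own learned weights. Every block takes the previous block's output, keeps its shape, and refines the representation a little more. 🧱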
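A minimal sketch of the stacking idea, assuming NumPy is available. The names `make_block` and `run_stack` are hypothetical, and the block body is only a stand-in (a small residual update), not a real Transformer block. The point is that every block maps a `(seq_len, d_model)` array to an array of the same shape, so blocks can be chained in a simple loop.

```python
import numpy as np

def make_block(rng, d_model):
    """Return a stand-in block with its own weight matrix (hypothetical helper)."""
    W = rng.normal(scale=0.1, size=(d_model, d_model))

    def block(x):
        # x has shape (seq_len, d_model); the output has the same shape.
        # Assumption: the block adds a small update to its input (a residual update).
        return x + np.tanh(x @ W)

    return block

def run_stack(blocks, x):
    """Feed x through every block in order; each output is the next input."""
    for block in blocks:
        x = block(x)
    return x

rng = np.random.default_rng(0)
d_model, seq_len, n_blocks = 8, 5, 6  # illustrative sizes only

# Same structure repeated n_blocks times, each copy with separate weights.
blocks = [make_block(rng, d_model) for _ in range(n_blocks)]

x = rng.normal(size=(seq_len, d_model))
y = run_stack(blocks, x)
print(y.shape)  # (5, 8)
```

Because the shape never changes, the depth of the stack is just the length of the list.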
Two Main Sublayers
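Every block has two parts: a multi-head attention sublayer, which lets positions exchange information, followed by a small feed-forward network that is applied to each position independently, using the same weights at every position.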
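The two sublayers can be sketched in NumPy as follows. This is an illustrative sketch, not a reference implementation: the function names, the parameter dictionary `p`, and the sizes are all assumptions. The residual connections and layer normalization around each sublayer are not described in the text above; they are included here as an assumption, following the common post-norm arrangement, and the layer norm omits the usual learned scale and shift.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each position's vector; learned scale/shift omitted for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ v                                  # (n_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ Wo

def feed_forward(x, W1, b1, W2, b2):
    # Two linear layers with a ReLU between them, applied row by row (per position).
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def transformer_block(x, p, n_heads):
    # Sublayer 1: multi-head self-attention (residual + norm are assumptions).
    x = layer_norm(x + multi_head_attention(x, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads))
    # Sublayer 2: position-wise feed-forward network.
    x = layer_norm(x + feed_forward(x, p["W1"], p["b1"], p["W2"], p["b2"]))
    return x

rng = np.random.default_rng(0)
seq_len, d_model, d_ff, n_heads = 4, 8, 32, 2  # illustrative sizes only
p = {
    "Wq": rng.normal(scale=0.1, size=(d_model, d_model)),
    "Wk": rng.normal(scale=0.1, size=(d_model, d_model)),
    "Wv": rng.normal(scale=0.1, size=(d_model, d_model)),
    "Wo": rng.normal(scale=0.1, size=(d_model, d_model)),
    "W1": rng.normal(scale=0.1, size=(d_model, d_ff)),
    "b1": np.zeros(d_ff),
    "W2": rng.normal(scale=0.1, size=(d_ff, d_model)),
    "b2": np.zeros(d_model),
}

x = rng.normal(size=(seq_len, d_model))
out = transformer_block(x, p, n_heads)
print(out.shape)  # (4, 8)
```

Note the division of labor: attention is the only place where positions interact, while `feed_forward` gives the same result for a row whether it is processed alone or together with the rest of the sequence.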