Deep Learning Academy · Lesson

Stack a Transformer Encoder Block

Attention plus feedforward and norms.

The Building Block

A transformer encoder is a stack of encoder blocks that all share one design, each with its own weights. Learn the block and you understand the whole tower, from tiny models to giant ones.
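To make the "one block, repeated" idea concrete, here is a minimal NumPy sketch of the stacking loop. The names `toy_block` and `encoder_stack` are hypothetical, and `toy_block` is only a stand-in for a real encoder block: it shares the one property that matters for stacking, namely that the output has the same shape as the input.

```python
import numpy as np

def toy_block(x, w):
    # Stand-in for a real encoder block (an assumption for illustration):
    # a residual update that keeps the (seq_len, d_model) shape unchanged.
    return x + np.tanh(x @ w)

def encoder_stack(x, weights):
    # Apply the same block design once per layer, each with its own weights.
    for w in weights:
        x = toy_block(x, w)
    return x

rng = np.random.default_rng(0)
seq_len, d_model, num_layers = 5, 8, 3
x = rng.normal(size=(seq_len, d_model))
weights = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(num_layers)]

out = encoder_stack(x, weights)
print(out.shape)  # (5, 8)
```

Because every block maps a `(seq_len, d_model)` array to another array of the same shape, a deeper model is just a longer `weights` list.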

Two Sub-Layers

Each block has two parts: a multi-head attention sub-layer, then a small feedforward network applied to each position. Each sub-layer is wrapped in a residual connection and layer normalization.
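The two sub-layers can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: normalization is applied after each residual add (the "post-norm" arrangement; other orderings exist), the layer norm has no learned scale or shift, the feedforward network uses ReLU, and there is no dropout or masking. All function and parameter names (`encoder_block`, `init_params`, `wq`, `w1`, and so on) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector (no learned scale/shift here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, p, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # scaled dot-product
    heads = softmax(scores) @ v                          # (heads, seq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ p["wo"]

def feedforward(x, p):
    # Two linear layers with a ReLU in between, applied to each position.
    return np.maximum(0.0, x @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]

def encoder_block(x, p, num_heads):
    x = layer_norm(x + multi_head_attention(x, p, num_heads))  # sub-layer 1
    x = layer_norm(x + feedforward(x, p))                      # sub-layer 2
    return x

def init_params(rng, d_model, d_ff):
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "wq": w(d_model, d_model), "wk": w(d_model, d_model),
        "wv": w(d_model, d_model), "wo": w(d_model, d_model),
        "w1": w(d_model, d_ff), "b1": np.zeros(d_ff),
        "w2": w(d_ff, d_model), "b2": np.zeros(d_model),
    }

rng = np.random.default_rng(0)
seq_len, d_model, d_ff, num_heads = 6, 16, 64, 4
x = rng.normal(size=(seq_len, d_model))
params = init_params(rng, d_model, d_ff)

y = encoder_block(x, params, num_heads)
print(y.shape)  # (6, 16)
```

The residual adds (`x + ...`) only work because both sub-layers return arrays of the same shape as their input, which is also what lets identical blocks be stacked one after another.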

All lessons in this course

  1. Self-Attention: Query, Key & Value
  2. Scaled Dot-Product & Multi-Head
  3. Positional Encoding for Order
  4. Stack a Transformer Encoder Block