Deep Learning Academy · Lesson

Stack a Transformer Encoder Block

Attention plus feedforward and norms.

The Building Block

A transformer encoder is a stack of encoder blocks that all share the same structure, each with its own weights. Learn one block and you understand the whole tower, from tiny models to giant ones.

Each block has two sub-layers: multi-head attention, then a small feedforward network applied to each position independently. Each sub-layer is wrapped in a residual connection and paired with layer normalization.
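To make the wiring concrete, here is a minimal sketch of one block in plain Python, with no tensor library, so every step is visible. Several things here are assumptions rather than part of the lesson: the helper names (`encoder_block`, `init_params`, and so on), the toy sizes, the ReLU activation, the omission of biases and of learned scale and shift in the normalization, and the "post-norm" ordering where normalization comes after the residual addition. Many models normalize before each sub-layer instead.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance (no learned scale or shift)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_attention(x, p, num_heads):
    """Self-attention: every token attends to every token, once per head."""
    d_head = len(x[0]) // num_heads
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's feature slice
        qh = [row[cols] for row in q]
        kh = [row[cols] for row in k]
        vh = [row[cols] for row in v]
        # scores[i][j]: scaled dot product of token i's query with token j's key
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head) for kj in kh]
                  for qi in qh]
        weights = [softmax(r) for r in scores]
        heads.append(matmul(weights, vh))
    # Concatenate the heads along the feature axis, then mix them with wo
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, p["wo"])

def feedforward(x, p):
    """Two linear maps with a ReLU in between, applied to each token separately."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, p["w1"])]
    return matmul(hidden, p["w2"])

def encoder_block(x, p, num_heads):
    """Attention sub-layer, then feedforward sub-layer, each with residual + norm."""
    x = layer_norm(add(x, multi_head_attention(x, p, num_heads)))
    x = layer_norm(add(x, feedforward(x, p)))
    return x

def init_params(d_model, d_ff, rng):
    """Small random weights for one block (hypothetical initialization)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
            "wv": mat(d_model, d_model), "wo": mat(d_model, d_model),
            "w1": mat(d_model, d_ff), "w2": mat(d_ff, d_model)}

# Toy sizes: 4 tokens, 8 features per token, 2 heads, 3 stacked blocks.
rng = random.Random(0)
d_model, d_ff, num_heads, seq_len = 8, 16, 2, 4
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]
blocks = [init_params(d_model, d_ff, rng) for _ in range(3)]

out = tokens
for params in blocks:  # same structure each time, separate weights per block
    out = encoder_block(out, params, num_heads)

print(len(out), len(out[0]))  # 4 8
```

The output has the same shape as the input, which is what lets blocks be stacked to any depth. In practice you would write this with a tensor library rather than nested lists; the lists are only there to keep the sketch dependency-free.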