0PricingLogin
Deep Learning Academy · Lesson

Self-Attention: Query, Key & Value

Let every token look at every other.

Tokens That Talk

In a sequence, the meaning of one word depends on others. Self-attention lets every token look at every other token to gather the context it needs.

Three Roles per Token

Each token plays three roles: a query that asks, a key that answers, and a value that carries content. These come from the same word, used three ways.

All lessons in this course

  1. Self-Attention: Query, Key & Value
  2. Scaled Dot-Product & Multi-Head
  3. Positional Encoding for Order
  4. Stack a Transformer Encoder Block
← Back to Deep Learning Academy