nn.Embedding: Learnable Word Vectors
A lookup table trained by gradient descent.
Ids Alone Are Not Enough
Token ids are just labels. The id 5 is not greater than 2 in any meaningful way, so feeding raw ids to a net misleads it.
One-Hot Is Wasteful
One way to represent ids is a giant one-hot vector, but with thousands of words it is huge, sparse, and carries no similarity.
All lessons in this course
- Tokenize and Build a Vocabulary
- nn.Embedding: Learnable Word Vectors
- Why Embeddings Capture Meaning
- Train a Text Classifier