NLP Academy · Lesson

Building a Vocabulary

Map every word to a fixed index.

What Is a Vocabulary?

A vocabulary is the full set of unique words your model knows. Each word gets one fixed slot (an index) in every document vector, so the same position always refers to the same word. 📖
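To make the "fixed slot" idea concrete, here is a minimal sketch. The four-word `vocabulary` list and the `to_vector` helper are made up for illustration and are not part of any library. Each document becomes a list of counts, one per vocabulary slot.

```python
# Hypothetical four-word vocabulary; the order is chosen once and reused for every document.
vocabulary = ["cat", "mat", "sat", "the"]

def to_vector(text):
    # Illustrative helper: one count per vocabulary slot, in vocabulary order.
    tokens = text.split()
    return [tokens.count(word) for word in vocabulary]

vec_a = to_vector("the cat sat")
vec_b = to_vector("the mat the cat")
print(vec_a)  # [1, 0, 1, 1]
print(vec_b)  # [1, 1, 0, 2]
```

In both vectors, position 3 is always the count for "the", even though the documents differ.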

Collect Every Unique Word

To build it, gather all words across your documents and keep only the distinct ones, dropping repeats.

# split() breaks the text on whitespace; set() drops the repeated "the"
words = set("the cat sat the mat".split())
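A set removes repeats, but it has no guaranteed order, so it does not yet give each word a fixed index. One way to get there, sketched below with a made-up `documents` list, is to sort the unique words and number them. Sorting is an assumption chosen here for repeatability; any consistent ordering would work.

```python
# Hypothetical sample documents for illustration.
documents = ["the cat sat", "the mat", "the cat sat the mat"]

# Gather every word from every document; the set drops repeats.
unique_words = set()
for doc in documents:
    unique_words.update(doc.split())

# Sort so the index assignment is the same on every run, then number the words.
word_to_index = {word: i for i, word in enumerate(sorted(unique_words))}
print(word_to_index)  # {'cat': 0, 'mat': 1, 'sat': 2, 'the': 3}
```

With this mapping, `word_to_index["sat"]` always returns 2, which is the slot that word occupies in every document vector.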
