0PricingLogin
Deep Learning Academy · Lesson

Tokenize and Build a Vocabulary

Map text to integer ids.

Networks Only Eat Numbers

A neural net cannot read raw letters. Before any learning happens, you must turn text into numbers the model can crunch. 🔢

First Step: Tokenize

To tokenize means to split text into small pieces called tokens. The simplest choice is to split a sentence on spaces into words.

text = "I love deep learning"
tokens = text.split()
# ["I", "love", "deep", "learning"]

All lessons in this course

  1. Tokenize and Build a Vocabulary
  2. nn.Embedding: Learnable Word Vectors
  3. Why Embeddings Capture Meaning
  4. Train a Text Classifier
← Back to Deep Learning Academy