From Sparse Counts to Dense Vectors
Why embeddings beat bag-of-words.
A Quick Recap
Bag-of-words and TF-IDF turned text into long count vectors. They work, but those sparse vectors hide a real weakness you are about to see.
Mostly Zeros
A bag-of-words vector has one slot per vocabulary word, so a short sentence is almost all zeros. We call this a sparse representation.
All lessons in this course
- From Sparse Counts to Dense Vectors
- How word2vec Learns Meaning
- Loading GloVe Vectors in Python
- Word Math: King Minus Man Plus Woman