The Problem With Raw Counts
Why frequent words can mislead.
Counts Got Us Started
Bag-of-words turned text into numbers by counting how often each word appears. It works, but raw counts treat every occurrence as equally informative, and that quietly misleads your model.
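As a reminder of what counting looks like in practice, here is a minimal sketch. The helper name `bag_of_words` and the sample sentence are made up for illustration, and it assumes that splitting on whitespace is good enough as tokenization.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())

# Hypothetical sample sentence, chosen only for illustration.
counts = bag_of_words("the cat sat on the mat")

# Print words from most to least frequent.
for word, count in counts.most_common():
    print(word, count)
```

Notice that "the" already has the highest count, even in a six-word sentence.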
Frequent Words Dominate
The most common words in a document, such as "the", "and", and "of", are usually the least useful for telling documents apart. Their high counts drown out the rarer words that actually carry meaning.
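A small sketch makes the imbalance visible. The sentence below is invented for illustration, and the variable names are assumptions, not part of any library.

```python
from collections import Counter

# Hypothetical document, invented for this example.
doc = "the rocket reached the moon and the crew planted the flag on the surface"

counts = Counter(doc.split())

# The single most frequent word and its count.
top_word, top_count = counts.most_common(1)[0]

# Words that appear exactly once, sorted alphabetically.
rare = sorted(word for word, count in counts.items() if count == 1)

print("most frequent:", top_word, top_count)
print("appear once:", rare)
```

Under raw counts, "the" outweighs "rocket", "moon", and "flag" combined, even though those are the words that say what the sentence is about.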