Character N-Grams for Robustness
Handle typos and rare words.
Word Features Are Fragile
Word-level features break on typos and rare spellings. To a bag-of-words model, "runnning" is a brand-new word with no connection to "running", so a single typo throws away everything the model learned about that word.
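The sketch below shows this in plain Python. It is a minimal sketch: the tiny "training vocabulary" is a hypothetical example, and `word_features` is an illustrative helper, not a library API. The misspelled token falls outside the vocabulary entirely, so a word-level model has no features to use for it.

```python
from collections import Counter

def word_features(text):
    """Bag-of-words counts: each whitespace-separated token is one feature."""
    return Counter(text.lower().split())

# Hypothetical training vocabulary, for illustration only.
vocab = set(word_features("she is running fast"))

# The same sentence with a one-letter typo.
typo = word_features("she is runnning fast")

# Tokens the model has never seen contribute no learned signal.
unseen = [w for w in typo if w not in vocab]
print(unseen)  # ['runnning']
```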
Drop Down to Characters
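Instead of whole words, we slice text into short, overlapping chunks of n consecutive characters. These pieces are called character n-grams. A misspelled word still shares most of its chunks with the correct spelling, so much of its signal survives.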
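Here is a minimal sketch of character trigram extraction in plain Python. It assumes n = 3 and pads each word with `<` and `>` boundary markers, a common convention (fastText uses it, for example) so that prefixes and suffixes get their own n-grams; `char_ngrams` is a hypothetical helper name. Comparing "running" with "runnning" shows how much survives the typo.

```python
def char_ngrams(word, n=3):
    """Return overlapping character n-grams of `word`, padded with < and >."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

a = set(char_ngrams("running"))
b = set(char_ngrams("runnning"))

# Trigrams shared by the correct and misspelled forms.
shared = a & b
# Jaccard similarity: shared n-grams over all distinct n-grams.
jaccard = len(shared) / len(a | b)

print(char_ngrams("running"))
# ['<ru', 'run', 'unn', 'nni', 'nin', 'ing', 'ng>']
print(len(shared), jaccard)  # 7 0.875
```

All seven trigrams of "running" also appear in "runnning"; the typo only adds one extra trigram (`nnn`). A model built on these features would still see the two spellings as close neighbors.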