N-Gram Features in scikit-learn
Add phrases to your vectorizer.
Let the Library Do It
You could hand-roll n-grams, but scikit-learn builds them for you. The trick is one small argument on your vectorizer.
The ngram_range Knob
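scikit-learn's text vectorizers (CountVectorizer, TfidfVectorizer, and HashingVectorizer) accept ngram_range, a tuple (min_n, max_n) of inclusive n-gram sizes. It tells the vectorizer which n-grams to count: the default (1, 1) keeps single words only, while (1, 2) keeps single words and bigrams.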
from sklearn.feature_extraction.text import CountVectorizer
vec = CountVectorizer(ngram_range=(1, 2))