NLP Academy · Lesson

N-Gram Features in scikit-learn

Add phrases to your vectorizer.

Let the Library Do It

You could hand-roll n-grams, but scikit-learn builds them for you. The trick is one small argument on your vectorizer.
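
For comparison, here is a minimal sketch of what hand-rolling might look like. The `ngrams` helper and its whitespace tokenization are illustrative assumptions, not part of scikit-learn, and they skip the lowercasing and token filtering a real vectorizer performs.

```python
def ngrams(tokens, min_n, max_n):
    """Return every n-gram of size min_n..max_n (inclusive) as a space-joined string.

    Hypothetical helper for illustration only.
    """
    out = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


tokens = "the movie was not good".split()
result = ngrams(tokens, 1, 2)
print(result)
```

Even this small version leaves out vocabulary building and counting, which is the work the library takes over.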

The ngram_range Knob

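Every text vectorizer in scikit-learn (CountVectorizer, TfidfVectorizer, HashingVectorizer) accepts ngram_range, a tuple (min_n, max_n). The vectorizer extracts every n-gram whose size n satisfies min_n <= n <= max_n, so (1, 2) means unigrams plus bigrams and (2, 2) means bigrams only. The default is (1, 1), which keeps single words only.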

from sklearn.feature_extraction.text import CountVectorizer

# (1, 2) = extract unigrams and bigrams
vec = CountVectorizer(ngram_range=(1, 2))
