Counting With CountVectorizer
Turn documents into a count matrix.
Meet CountVectorizer
The CountVectorizer from scikit-learn builds your vocabulary and counts words for you in just a few lines. 🚀
from sklearn.feature_extraction.text import CountVectorizer
Create the Vectorizer
First, create an instance. With no arguments, it uses sensible defaults: it lowercases the text and splits it into word tokens of two or more alphanumeric characters, treating punctuation as a separator.
vectorizer = CountVectorizer()
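To see what those defaults do in practice, here is a minimal sketch. The sample sentences and the variable names (`analyze`, `tokens`, `docs`, `counts`) are made up for illustration and are not part of the lesson; the calls themselves (`build_analyzer`, `fit_transform`, `vocabulary_`) are standard CountVectorizer API.

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()

# build_analyzer() returns the function the vectorizer uses to
# lowercase and tokenize a single string.
analyze = vectorizer.build_analyzer()
tokens = analyze("The Cat sat. A cat!")
print(tokens)  # ['the', 'cat', 'sat', 'cat']  ("A" is dropped: one character)

# Illustrative documents (hypothetical, not from the lesson).
docs = ["The cat sat.", "The cat saw a cat!"]

# fit_transform() learns the vocabulary and returns a sparse count matrix
# with one row per document and one column per vocabulary word.
counts = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # ['cat', 'sat', 'saw', 'the']
print(counts.toarray())
# [[1 1 0 1]
#  [2 0 1 1]]
```

Columns follow the vocabulary in alphabetical order, so the 2 in the second row says that "cat" appears twice in the second document.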