Training on TF-IDF Features
Fit a classifier in scikit-learn.
From Text to Numbers First
A model cannot read raw sentences. You first convert documents into TF-IDF vectors, then train logistic regression on those numbers. 🔢
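To see what "numbers" means here, the short sketch below vectorizes three toy documents. The documents and the variable names (docs, vocab, n_docs, n_terms) are made up for illustration and are not part of the course material.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents, invented for illustration only.
docs = ["good movie", "bad movie", "good good plot"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each learned term to its column index;
# sorting by that index lists the terms in column order.
vocab = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
n_docs, n_terms = X.shape

print(vocab)
print(n_docs, n_terms)
print(X.toarray().round(2))  # dense view is fine for a tiny example
```

Each row is one document and each column is one term, so the classifier only ever sees this numeric matrix, never the original strings.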
Fit the Vectorizer
TfidfVectorizer learns the vocabulary and the IDF weights from the training texts. A single fit_transform call turns a list of strings into a sparse feature matrix with one row per document and one column per vocabulary term.
from sklearn.feature_extraction.text import TfidfVectorizer
vec = TfidfVectorizer()
X = vec.fit_transform(train_texts)
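With the feature matrix in hand, training the classifier could look like the minimal sketch below. The toy texts, the labels, and the names train_labels, new_texts, X_new, and pred are assumptions added for illustration. LogisticRegression is used with its default settings, which may not suit a real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, invented for illustration only.
train_texts = [
    "great film, loved it",
    "wonderful acting",
    "terrible plot",
    "awful and boring",
]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Learn the vocabulary and IDF weights, then vectorize the training texts.
vec = TfidfVectorizer()
X = vec.fit_transform(train_texts)

# Fit logistic regression on the TF-IDF features.
clf = LogisticRegression()
clf.fit(X, train_labels)

# New text goes through transform (not fit_transform) so it is mapped
# onto the vocabulary learned from the training texts.
new_texts = ["loved the acting"]
X_new = vec.transform(new_texts)
pred = clf.predict(X_new)
print(pred)
```

Calling transform on new text keeps the column layout identical to the training matrix, which is what lets the fitted coefficients line up with the right terms.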