Beyond Bag-of-Words
Length, readability, and metadata signals.
The Baseline Wall
Bag-of-words and TF-IDF give you a solid first model. Eventually, though, the score plateaus, and you need richer features to push past it.
What Counts Get Wrong
Word counts record which words appear and how often, and ignore everything else about a document. Tone, length, and structure all carry signal that pure counts throw away.
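To make the gap concrete, here is a minimal sketch using only the Python standard library. The two example sentences, the helper names `bag_of_words` and `surface_features`, and the particular features chosen are all assumptions made for illustration, not a recommended feature set.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; case, punctuation, and order are discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def surface_features(text):
    """A few illustrative (hypothetical) features that word counts do not see."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_sentences": len(sentences),
        "n_exclamations": text.count("!"),
        # Share of characters that are uppercase; guard against empty text.
        "upper_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
    }


# Two made-up reviews with the same words but a very different tone.
calm = "This is not good. I want a refund."
angry = "THIS IS NOT GOOD!!! I WANT A REFUND!!!"

same_counts = bag_of_words(calm) == bag_of_words(angry)
calm_feats = surface_features(calm)
angry_feats = surface_features(angry)

print("identical bag-of-words:", same_counts)
print("calm: ", calm_feats)
print("angry:", angry_feats)
```

In this toy case the two bags of words are identical, so a count-based model cannot tell the reviews apart, while the exclamation count and uppercase ratio differ between them.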