NLP Academy · Lesson

Regex Tokenization Tricks

Split text exactly the way you want.

Tokenizing With Patterns

You can tokenize text by describing what a token looks like, then letting regex find every piece that fits your definition.
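As a rough sketch of that idea, the snippet below writes one possible token definition as a pattern and lets regex collect every match. The name TOKEN_PATTERN, the helper tokenize, and the choice to treat each punctuation mark as its own token are assumptions made for illustration, not part of the lesson.

```python
import re

# Hypothetical token definition (an assumption for this sketch):
# a run of word characters, OR a single character that is neither
# a word character nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return every piece of text that fits the token definition."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']
```

Changing the definition changes the tokens: drop the second alternative and punctuation disappears from the result.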

Match Words Directly

The pattern \w+ with re.findall grabs every run of word characters (letters, digits, and underscores), giving you a clean list of words and skipping the spaces and punctuation between them.

import re
re.findall(r"\w+", "Hello, world!")  # ['Hello', 'world']
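One thing to watch for, shown here as a hedged sketch: because \w does not match apostrophes or hyphens, \w+ breaks contractions and hyphenated words apart. The second pattern below is one possible adjustment, not the lesson's own, and the sample sentence is made up for illustration.

```python
import re

# Made-up sample sentence for illustration.
text = "Don't split state-of-the-art words, please."

# Plain \w+ stops at apostrophes and hyphens.
plain = re.findall(r"\w+", text)
print(plain)
# ['Don', 't', 'split', 'state', 'of', 'the', 'art', 'words', 'please']

# Assumed variant: allow an apostrophe or hyphen inside a token,
# but only when it is followed by more word characters.
joined = re.findall(r"\w+(?:['-]\w+)*", text)
print(joined)
# ["Don't", 'split', 'state-of-the-art', 'words', 'please']
```

Whether "Don't" should be one token or two depends on the task, so treat the second pattern as a starting point rather than a rule.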
