Regex Tokenization Tricks
Split text exactly the way you want.
Tokenizing With Patterns
You can tokenize text by describing what a token looks like, then letting regex find every piece that fits your definition.
Match Words Directly
The pattern \w+ with findall grabs every run of word characters, giving you a clean list of words and ignoring the spaces between them.
re.findall("\w+", "Hello, world!")All lessons in this course
- Regex in 5 Minutes
- Finding Emails and URLs
- Capturing Groups and Replacements
- Regex Tokenization Tricks