Pandas & NumPy Academy · Lesson

Splitting and Replacing Strings

Split strings into multiple columns with .str.split(expand=True) and replace substrings with .str.replace().

Splitting Strings into a List

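.str.split(sep) splits each string in a Series at every occurrence of the separator and returns a Series of lists. Without expand=True, each element is a Python list, which is useful for counting tokens or further list-level processing. The separator can be a literal string or a regex pattern: by default a single-character separator is treated literally and a longer one as a regular expression, and in pandas 1.4 and later the regex argument makes the choice explicit. If no separator is given, strings are split on whitespace.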

import pandas as pd

df = pd.DataFrame({'tags': ['python,data,pandas', 'ml,ai', 'sql,db,query']})

# Split into a list of strings
df['tag_list'] = df['tags'].str.split(',')
print(df['tag_list'])
# 0    [python, data, pandas]
# 1                  [ml, ai]
# 2         [sql, db, query]

# Count tags per row
df['tag_count'] = df['tag_list'].str.len()
print(df['tag_count'].tolist())  # [3, 2, 3]
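
When the delimiters are inconsistent, a regex separator can do the cleanup during the split. The following is a minimal sketch, assuming pandas 1.4 or later (for the regex argument); the sample data and the pattern are made up for illustration.

```python
import pandas as pd

# Hypothetical messy input: commas or semicolons, with stray spaces
s = pd.Series(['python, data;pandas', 'ml ;ai', 'sql'])

# Split on a comma or semicolon and absorb any surrounding whitespace.
# regex=True states explicitly that the separator is a pattern.
tokens = s.str.split(r'\s*[,;]\s*', regex=True)
print(tokens.tolist())
# [['python', 'data', 'pandas'], ['ml', 'ai'], ['sql']]
```

A string with no delimiter, such as 'sql' here, still yields a one-element list, so .str.len() keeps working on the result.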

Using expand=True to Split into Columns

Passing expand=True to .str.split() returns a DataFrame instead of a Series of lists, where each split token becomes its own column (labeled 0, 1, 2, …). This is a common way to widen a delimited column, for example splitting a 'first last' full name into separate first and last name columns. Rows that produce fewer tokens than the widest row are padded with missing values.

import pandas as pd

df = pd.DataFrame({'full_name': ['Alice Smith', 'Bob Jones', 'Carol White']})

# Split into two columns
name_parts = df['full_name'].str.split(' ', expand=True)
name_parts.columns = ['first_name', 'last_name']
df = pd.concat([df, name_parts], axis=1)
print(df)
#     full_name first_name last_name
# 0  Alice Smith      Alice     Smith
# 1    Bob Jones        Bob     Jones
# 2  Carol White      Carol     White
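
The example above relies on every name having exactly two parts. With real data the token count often varies, and assigning two column names to a wider result raises an error. One possible way to handle this, sketched here with made-up names and a hypothetical 'rest' column, is to cap the number of splits with the n argument.

```python
import pandas as pd

# Hypothetical names with one, two, and three parts
names = pd.Series(['Alice Smith', 'Mary Ann Lee', 'Cher'])

# Without a limit, the widest row decides how many columns are created
wide = names.str.split(' ', expand=True)
print(wide.shape)  # (3, 3)

# n=1 splits only at the first space, so there are at most two columns
parts = names.str.split(' ', n=1, expand=True)
parts.columns = ['first_name', 'rest']
print(parts['first_name'].tolist())  # ['Alice', 'Mary', 'Cher']

# Rows with no space to split on get a missing value in the second column
print(parts['rest'].isna().tolist())  # [False, False, True]
```

Whether everything after the first space belongs in a single column depends on the data; this only shows how n keeps the column count predictable.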
