Combining and Cleaning Text Columns
Concatenate multiple columns into one string, strip whitespace, and normalise inconsistent category labels.
Concatenating Text Columns
A common data preparation task is building a single string from values in several columns, for example a full name from first and last name, or an address from street, city, and country. In pandas, you concatenate string columns with the + operator, applied either to two Series or to a Series and a string literal. Numeric columns must first be converted to strings with .astype(str).
import pandas as pd
df = pd.DataFrame({
    'first_name': ['Alice', 'Bob', 'Carol'],
    'last_name': ['Smith', 'Jones', 'White']
})
df['full_name'] = df['first_name'] + ' ' + df['last_name']
print(df['full_name'].tolist())
# ['Alice Smith', 'Bob Jones', 'Carol White']
Concatenating with Non-String Columns
When combining a numeric column with a string column, first convert the numeric column to strings with .astype(str). Adding a string Series to an integer Series with + raises a TypeError, because the underlying Python operation cannot add a str to an int. This is a common source of confusion when building composite keys or labels from mixed-type columns.
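To make the failure concrete, here is a minimal sketch of what happens when the conversion is skipped. It assumes a default pandas setup in which the text column is stored with the object dtype; the variable error_name is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'version': [3, 5]
})

error_name = None
try:
    # 'version' is still an integer column, so this addition fails
    df['product'] + ' v' + df['version']
except TypeError as exc:
    # Record the exception type instead of letting the script stop
    error_name = type(exc).__name__

print(error_name)
```

Under that assumption the script prints TypeError. The exact wording of the error message can differ between pandas versions, so it is safer to rely on the exception type than on the message text.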
import pandas as pd
df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'version': [3, 5]
})
# Must convert int to str before concatenating
df['label'] = df['product'] + ' v' + df['version'].astype(str)
print(df['label'].tolist())
# ['Widget v3', 'Gadget v5']
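The same approach extends to more than two columns, such as the street, city, and country address case mentioned at the start of this lesson. The sketch below uses made-up data and illustrative column names (address_plus and address_cat are not from the lesson). It compares chained + with Series.str.cat, a pandas method that joins several Series with a single separator.

```python
import pandas as pd

# Hypothetical address data, for illustration only
df = pd.DataFrame({
    'street': ['1 Main St', '22 High Rd'],
    'city': ['Leeds', 'York'],
    'country': ['UK', 'UK']
})

# Option 1: chain + and repeat the separator between each pair of columns
df['address_plus'] = df['street'] + ', ' + df['city'] + ', ' + df['country']

# Option 2: Series.str.cat takes the other columns as a list and one separator
df['address_cat'] = df['street'].str.cat([df['city'], df['country']], sep=', ')

print(df['address_cat'].tolist())
# ['1 Main St, Leeds, UK', '22 High Rd, York, UK']
```

Both options produce the same strings here. With many columns, str.cat keeps the separator in one place, while chained + is often easier to read for two or three columns. In either case, any numeric column still needs .astype(str) first.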