Pandas & NumPy Academy · Lesson

Combining and Cleaning Text Columns

Concatenate multiple columns into one string, strip whitespace, and normalise inconsistent category labels.

Concatenating Text Columns

A common data preparation task is building a single string from values in several columns: for example, a full name from first and last name, or an address from street, city, and country. In Pandas, you concatenate string columns with the + operator, which works element-wise on two Series (or on a Series and a string literal). Numeric columns must first be converted to strings with .astype(str). If any value in a row is missing, the + result for that row is missing too.

import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bob', 'Carol'],
    'last_name': ['Smith', 'Jones', 'White']
})

df['full_name'] = df['first_name'] + ' ' + df['last_name']
print(df['full_name'].tolist())
# ['Alice Smith', 'Bob Jones', 'Carol White']
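
For the address case mentioned above, Series.str.cat is an alternative to chaining + that takes a separator and a placeholder for missing values. The sketch below is illustrative only: the street, city, and country values, including the missing city, are made up for the example.

```python
import pandas as pd

# Hypothetical address data; the missing city is an assumption for illustration
df = pd.DataFrame({
    'street': ['1 High St', '22 Elm Rd', '5 Oak Ave'],
    'city': ['Leeds', None, 'Cork'],
    'country': ['UK', 'UK', 'Ireland']
})

# The + operator propagates missing values, so the second row becomes missing
plus_result = df['street'] + ', ' + df['city'] + ', ' + df['country']
print(plus_result.isna().tolist())
# [False, True, False]

# str.cat joins the columns with a separator; na_rep stands in for missing values
cat_result = df['street'].str.cat([df['city'], df['country']], sep=', ', na_rep='?')
print(cat_result.tolist())
# ['1 High St, Leeds, UK', '22 Elm Rd, ?, UK', '5 Oak Ave, Cork, Ireland']
```

Whether a placeholder such as '?' is appropriate depends on the data; leaving the row missing may be the safer choice when the combined string is used as a key.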

Concatenating with Non-String Columns

When combining a numeric column with a string column, first convert the numeric column to string with .astype(str). Adding a string Series to an integer Series with + raises a TypeError, because the operation is applied element-wise and Python cannot add a str to an int. This is a common source of confusion when building composite keys or labels from mixed-type columns.
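
To see the failure described above, the following minimal sketch attempts the concatenation without .astype(str) and catches the resulting error. The try/except wrapper is only there so the script keeps running; the exact error message can vary between pandas versions, so it is not shown.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'version': [3, 5]
})

try:
    # 'version' is an integer column, so this element-wise + fails
    df['product'] + ' v' + df['version']
    error_raised = False
except TypeError as exc:
    error_raised = True
    print(type(exc).__name__)
    # TypeError
```

The example that follows applies the .astype(str) conversion so the same expression succeeds.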

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'version': [3, 5]
})

# Must convert int to str before concatenating
df['label'] = df['product'] + ' v' + df['version'].astype(str)
print(df['label'].tolist())
# ['Widget v3', 'Gadget v5']
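
Stripping Whitespace and Normalising Labels

The lesson summary also mentions stripping whitespace and normalising inconsistent category labels. The sketch below shows one possible approach with the .str accessor and Series.replace. The category column, its messy values, and the variants mapping are all hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical messy labels; the values and the mapping below are assumptions
df = pd.DataFrame({
    'category': ['  Electronics', 'electronics ', 'ELECTRONICS',
                 'Home & Garden', 'home and garden']
})

# Strip surrounding whitespace and lower-case so that variants compare equal
cleaned = df['category'].str.strip().str.lower()

# Map known spelling variants onto one canonical label (exact-value matches)
variants = {'home and garden': 'home & garden'}
df['category_clean'] = cleaned.replace(variants)

print(df['category_clean'].unique().tolist())
# ['electronics', 'home & garden']
```

Cleaning the parts before concatenating them tends to give more consistent combined strings than cleaning the result afterwards.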
