Selecting Columns by Pattern
Use filter(like=), filter(regex=), and list comprehensions to select columns matching a name pattern.
Why Select Columns by Pattern?
DataFrames from real-world sources often have dozens or hundreds of columns, and you rarely need all of them. When columns follow a naming convention (prefixes like sales_, suffixes like _2023, or keyword patterns), selecting them by name pattern is far more maintainable than listing every column explicitly. If columns that follow the same convention are added or removed later, the pattern-based selection picks up the change without any edits to your code.
Pandas provides two primary tools: the filter() method and list comprehensions on df.columns.
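As a minimal sketch of the two tools side by side, the snippet below selects every column carrying a _2023 suffix, once with filter() and once with a list comprehension. The DataFrame and its column names (units_2022, units_2023, price_2023, region) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: these column names are illustrative only.
df = pd.DataFrame({
    'units_2022': [10, 20],
    'units_2023': [12, 24],
    'price_2023': [9.5, 8.0],
    'region': ['north', 'south'],
})

# Tool 1: filter() matches the pattern against the column labels.
via_filter = df.filter(like='_2023')

# Tool 2: a list comprehension over df.columns builds the list of names,
# which is then used for ordinary column selection.
suffix_cols = [col for col in df.columns if col.endswith('_2023')]
via_listcomp = df[suffix_cols]

print(via_filter.columns.tolist())    # ['units_2023', 'price_2023']
print(via_listcomp.columns.tolist())  # ['units_2023', 'price_2023']
```

Both approaches give the same columns here. The list comprehension is slightly longer, but it lets you use any Python string test (endswith, startswith, length checks, and so on).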
import pandas as pd
df = pd.DataFrame({
    'sales_jan': [100, 200],
    'sales_feb': [150, 250],
    'cost_jan': [50, 80],
    'cost_feb': [60, 90],
    'profit': [140, 280]
})
print(df.columns.tolist())
# ['sales_jan', 'sales_feb', 'cost_jan', 'cost_feb', 'profit']
filter(like=) for Substring Matching
DataFrame.filter(like='substring') selects columns whose names contain the given substring anywhere in the name (case-sensitive). It operates on the column axis by default and returns a new DataFrame with only the matching columns.
This is the simplest pattern-selection tool: no regex knowledge required, just a substring to look for.
import pandas as pd
df = pd.DataFrame({
    'sales_jan': [100, 200],
    'sales_feb': [150, 250],
    'cost_jan': [50, 80],
    'revenue_q1': [110, 210]
})
# Select columns whose name contains 'sales'
sales_cols = df.filter(like='sales')
print(sales_cols)
#    sales_jan  sales_feb
# 0        100        150
# 1        200        250
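Because filter(like=) is case-sensitive and matches the substring anywhere in the name, it can select more (or fewer) columns than intended. The sketch below illustrates both edge cases and shows how filter(regex=) or a list comprehension can tighten the match. The column names Sales_feb and presales_jan are hypothetical, added only to expose these cases.

```python
import pandas as pd

# Hypothetical columns chosen to expose the edge cases.
df = pd.DataFrame({
    'sales_jan': [100, 200],
    'Sales_feb': [150, 250],     # capital S
    'presales_jan': [5, 8],      # 'sales' appears mid-name
    'cost_jan': [50, 80],
})

# like= is case-sensitive and matches anywhere in the name:
# 'Sales_feb' is missed, 'presales_jan' is included.
like_cols = df.filter(like='sales').columns.tolist()
print(like_cols)      # ['sales_jan', 'presales_jan']

# regex= searches within the name, so anchor with ^ to require a prefix.
prefix_cols = df.filter(regex=r'^sales_').columns.tolist()
print(prefix_cols)    # ['sales_jan']

# The inline (?i) flag makes the regex case-insensitive.
any_case_cols = df.filter(regex=r'(?i)^sales_').columns.tolist()
print(any_case_cols)  # ['sales_jan', 'Sales_feb']

# The same case-insensitive prefix test as a list comprehension.
listcomp_cols = [c for c in df.columns if c.lower().startswith('sales_')]
print(listcomp_cols)  # ['sales_jan', 'Sales_feb']
```

When a plain substring is too loose, an anchored regex or an explicit startswith/endswith test is usually the safer choice.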