Pandas & NumPy Academy · Lesson

Selecting Columns by Pattern

Use filter(like=), filter(regex=), and list comprehensions to select columns matching a name pattern.

Why Select Columns by Pattern?

DataFrames from real-world sources often have dozens or hundreds of columns, and you rarely need all of them. When columns follow a naming convention (prefixes like sales_, suffixes like _2023, or shared keywords), selecting them by name pattern is more maintainable than listing every column explicitly. If the schema changes, the same selection picks up any new columns that follow the convention, with no code changes.

There are two primary tools: the DataFrame.filter() method, which supports substring (like=) and regular-expression (regex=) matching, and plain Python list comprehensions over df.columns.

import pandas as pd

df = pd.DataFrame({
    'sales_jan': [100, 200],
    'sales_feb': [150, 250],
    'cost_jan': [50, 80],
    'cost_feb': [60, 90],
    'profit': [140, 280]
})

print(df.columns.tolist())
# ['sales_jan', 'sales_feb', 'cost_jan', 'cost_feb', 'profit']
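
As a minimal sketch of the list-comprehension approach, the example below filters df.columns with an ordinary string method. The helper name select_by_prefix and the added sales_mar column are hypothetical, included only to illustrate how a pattern-based selection picks up a new column that follows the naming convention.

```python
import pandas as pd

df = pd.DataFrame({
    'sales_jan': [100, 200],
    'sales_feb': [150, 250],
    'cost_jan': [50, 80],
    'cost_feb': [60, 90],
    'profit': [140, 280]
})

def select_by_prefix(frame, prefix):
    # Hypothetical helper: keep only columns whose name starts with `prefix`
    cols = [c for c in frame.columns if c.startswith(prefix)]
    return frame[cols]

sales = select_by_prefix(df, 'sales_')
print(sales.columns.tolist())
# ['sales_jan', 'sales_feb']

# Hypothetical schema change: a new month is added to the data
df['sales_mar'] = [120, 220]
print(select_by_prefix(df, 'sales_').columns.tolist())
# ['sales_jan', 'sales_feb', 'sales_mar']
```

Because the selection is computed from df.columns each time it runs, no column list has to be edited when a new month arrives.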

filter(like=) for Substring Matching

DataFrame.filter(like='substring') selects columns whose names contain the given substring anywhere in the name (case-sensitive). It operates on the column axis by default and returns a new DataFrame with only the matching columns.

This is the simplest pattern-selection tool: no regex knowledge required, just a substring to look for.

import pandas as pd

df = pd.DataFrame({
    'sales_jan': [100, 200],
    'sales_feb': [150, 250],
    'cost_jan': [50, 80],
    'revenue_q1': [110, 210]
})

# Select columns whose name contains 'sales'
sales_cols = df.filter(like='sales')
print(sales_cols)
#    sales_jan  sales_feb
# 0        100        150
# 1        200        250
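
To make the "anywhere in the name" and case-sensitivity points concrete, here is a short sketch on the same frame. It also uses filter(regex=), which takes a regular expression, as one way to anchor a match to the start of a name. The specific patterns are illustrative choices only.

```python
import pandas as pd

df = pd.DataFrame({
    'sales_jan': [100, 200],
    'sales_feb': [150, 250],
    'cost_jan': [50, 80],
    'revenue_q1': [110, 210]
})

# like= matches the substring anywhere, here at the end of two names
jan_cols = df.filter(like='jan').columns.tolist()
print(jan_cols)
# ['sales_jan', 'cost_jan']

# Matching is case-sensitive, so 'Sales' selects no columns
upper_cols = df.filter(like='Sales').columns.tolist()
print(upper_cols)
# []

# regex= with ^ restricts the match to names that start with 'sales_'
prefix_cols = df.filter(regex='^sales_').columns.tolist()
print(prefix_cols)
# ['sales_jan', 'sales_feb']
```

When a substring could appear in unrelated column names, an anchored regex is the safer choice than like=.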
