Pandas & NumPy Academy · Lesson

The Split-Apply-Combine Pattern

Understand the conceptual flow of groupby: splitting the DataFrame into groups, applying a function, and combining results.

What Is GroupBy?

The GroupBy operation is one of the most powerful patterns in data analysis. It lets you split a DataFrame into groups, apply a function to each group independently, and then combine the per-group results into a new structure, typically a Series or DataFrame indexed by the group labels. This workflow is known as the split-apply-combine pattern, a term coined by statistician Hadley Wickham.
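
To make the three stages concrete, here is a minimal sketch that performs each one by hand and then compares the result with a single groupby call. The small sales table and the variable names (pieces, totals, manual) are illustrative assumptions, and the hand-written loop is only a mental model, not how Pandas implements groupby internally.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South'],
    'sales': [200, 150, 300, 180, 220]
})

# split: build one sub-DataFrame per unique region (done by hand here)
pieces = {region: df[df['region'] == region] for region in df['region'].unique()}

# apply: reduce each piece to a single number
totals = {region: piece['sales'].sum() for region, piece in pieces.items()}

# combine: assemble the per-group results into one Series
manual = pd.Series(totals, name='sales').sort_index()

# groupby performs the same three stages in one expression
result = df.groupby('region')['sales'].sum()

print(manual)
print(result)
```

Both Series hold the same total per region; the groupby version is shorter and avoids filtering the DataFrame once per group.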

The Split Step

In the split step, Pandas partitions the rows of the DataFrame by the unique values in one or more key columns. For example, if your data has a region column with the values 'North', 'South', and 'West', the split step defines three groups. Calling groupby does not move or copy any data: it returns a GroupBy object that records which rows belong to which group, and the per-group sub-DataFrames are only produced when you apply a function or iterate over the groups.

import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South'],
    'sales': [200, 150, 300, 180, 220]
})

# split: create a GroupBy object
grouped = df.groupby('region')
print(type(grouped).__name__)  # DataFrameGroupBy
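
Because the GroupBy object only records group membership, it can help to inspect it before applying anything. The sketch below reuses the same example table and relies on the ngroups and groups attributes and the size() and get_group() methods of the GroupBy object; the variable names sizes and north are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South'],
    'sales': [200, 150, 300, 180, 220]
})

grouped = df.groupby('region')

# number of distinct groups found in the key column
print(grouped.ngroups)  # 3

# mapping from each group label to the index labels of its rows
print(grouped.groups)

# number of rows in each group
sizes = grouped.size()
print(sizes)

# iterating yields (label, sub-DataFrame) pairs, one per group
for name, group in grouped:
    print(name, len(group))

# pull out a single group as a regular DataFrame
north = grouped.get_group('North')
print(north)
```

Here the 'North' group holds the rows at index 0 and 2, which is the bookkeeping the split step sets up for the later apply and combine steps.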
