The Split-Apply-Combine Pattern
Understand the conceptual flow of groupby: splitting the DataFrame into groups, applying a function, and combining results.
What Is GroupBy?
The GroupBy operation is one of the most powerful patterns in data analysis. It lets you split a DataFrame into groups, apply a function to each group independently, and then combine the results into a new structure. This workflow is known as the split-apply-combine pattern, a term coined by statistician Hadley Wickham.
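To make the three steps concrete, here is a minimal sketch that first performs them by hand and then with a single groupby call. The sales table and the variable names (pieces, totals, manual, result) are illustrative assumptions, not part of any pandas API.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration
df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South'],
    'sales': [200, 150, 300, 180, 220]
})

# Split: one sub-DataFrame per unique region
pieces = {region: df[df['region'] == region] for region in df['region'].unique()}

# Apply: compute total sales within each piece
totals = {region: piece['sales'].sum() for region, piece in pieces.items()}

# Combine: assemble the per-group results into a single Series
manual = pd.Series(totals, name='sales').sort_index()

# The same three steps expressed as one groupby call
result = df.groupby('region')['sales'].sum()

print(result)  # North 500, South 370, West 180
print(manual.to_dict() == result.to_dict())
```

The hand-written version is only meant to show what happens conceptually. In practice the groupby call is shorter and avoids scanning the DataFrame once per group.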
The Split Step
In the split step, Pandas partitions the rows of the DataFrame into groups based on the unique values in one or more columns. For example, if your data has a region column with values like 'North', 'South', and 'West', the split step creates three groups. This step is lazy: no data is moved or copied yet, and Pandas only records which rows belong to which group.
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South'],
    'sales': [200, 150, 300, 180, 220]
})

# split: create a GroupBy object
grouped = df.groupby('region')
print(type(grouped))  # DataFrameGroupBy
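To see what the split step has recorded, you can inspect the GroupBy object directly. The sketch below rebuilds the same example data so it runs on its own. It uses the ngroups, groups, and get_group members of the GroupBy object, and the variable name north is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South'],
    'sales': [200, 150, 300, 180, 220]
})
grouped = df.groupby('region')

# Number of distinct groups found in the 'region' column
print(grouped.ngroups)  # 3

# Mapping from each group key to the row labels that belong to it
for key, labels in grouped.groups.items():
    print(key, list(labels))

# Pull out a single group as an ordinary DataFrame
north = grouped.get_group('North')
print(north)

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs
for key, piece in grouped:
    print(key, len(piece))
```

Nothing has been aggregated at this point. The object only knows which rows belong to which region, which is exactly what the apply step needs.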