GroupBy with Single and Multiple Keys
Group by one or more columns with groupby() and apply sum, mean, count, and min/max aggregations.
GroupBy with a Single Key
The simplest GroupBy call uses a single column as the grouping key. Call df.groupby('column_name') and chain an aggregation such as sum(), mean(), count(), min(), or max(). Pandas creates one group for each unique value in that column and applies the aggregation to the column you select or, if you select none, to every remaining column. This is the most common form of GroupBy in day-to-day analysis.
import pandas as pd
df = pd.DataFrame({
'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
'salary': [90000, 60000, 95000, 62000, 88000],
'years': [3, 5, 7, 2, 4]
})
print(df.groupby('dept')['salary'].mean())
# dept
# Eng 91000.0
# HR 61000.0
Choosing Which Columns to Aggregate
After calling groupby() you can select one column with bracket notation to get a SeriesGroupBy, or select multiple columns with a list to get a DataFrameGroupBy. If you skip selection entirely, Pandas aggregates every column other than the grouping key. That is convenient, but it can produce meaningless results when the frame contains irrelevant numeric columns such as IDs. In pandas 2.0 and later, mean() also raises an error on text columns unless you pass numeric_only=True.
# Single column selected -> SeriesGroupBy; sum() returns a Series
print(df.groupby('dept')['salary'].sum())
# Multiple columns selected -> DataFrameGroupBy; mean() returns a DataFrame
print(df.groupby('dept')[['salary', 'years']].mean())
# salary years
# dept
# Eng 91000.0 4.667
# HR 61000.0 3.500
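The risk of skipping column selection can be sketched with a slightly larger frame. This is a minimal illustration, not part of the original lesson: the 'name' and 'badge_id' columns are hypothetical additions to the sample data, and passing numeric_only=True assumes a pandas version whose groupby mean() accepts that argument (pandas 2.x does).

```python
import pandas as pd

# Sample data from the lesson plus two hypothetical columns:
# 'name' (text) and 'badge_id' (a numeric identifier with no meaningful average).
df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'name': ['Ana', 'Bo', 'Cy', 'Di', 'Ed'],
    'badge_id': [101, 102, 103, 104, 105],
    'salary': [90000, 60000, 95000, 62000, 88000],
    'years': [3, 5, 7, 2, 4],
})

# No selection: every numeric column is averaged, including badge_id.
# numeric_only=True explicitly skips the text column 'name'.
all_numeric = df.groupby('dept').mean(numeric_only=True)
print(all_numeric.columns.tolist())
# ['badge_id', 'salary', 'years']

# Explicit selection keeps only the columns that are worth averaging.
selected = df.groupby('dept')[['salary', 'years']].mean()
print(selected.columns.tolist())
# ['salary', 'years']
```

Selecting columns up front tends to be the safer habit: the result contains only what was asked for, and it does not change shape when someone later adds a numeric column to the frame.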