Pandas & NumPy Academy · Lesson

GroupBy with Single and Multiple Keys

Group by one or more columns with groupby() and apply sum, mean, count, and min/max aggregations.

GroupBy with a Single Key

The simplest GroupBy call uses a single column as the grouping key. Call df.groupby('column_name') and chain an aggregation such as mean(). Pandas creates one group for each unique value in that column and applies the aggregation to the column you selected, or to every non-key column if you selected none. The unique key values become the index of the result. This is the most common form of GroupBy in day-to-day analysis.

import pandas as pd

df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000],
    'years': [3, 5, 7, 2, 4]
})

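# Average salary per department; the result is a Series indexed by dept
print(df.groupby('dept')['salary'].mean())
# dept
# Eng    91000.0
# HR     61000.0
# Name: salary, dtype: float64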
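The other aggregations named in the lesson summary (sum, count, min, and max) chain onto the same grouped object. The sketch below reuses the sample data from above; the variable names (grouped, total, headcount, lowest, highest) are illustrative choices, not part of the lesson.

```python
import pandas as pd

# Same sample data as in the lesson
df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000],
    'years': [3, 5, 7, 2, 4]
})

# Build the grouped object once and reuse it for several aggregations
grouped = df.groupby('dept')['salary']

total = grouped.sum()        # Eng 273000, HR 122000
headcount = grouped.count()  # Eng 3, HR 2 (non-null rows per group)
lowest = grouped.min()       # Eng 88000, HR 60000
highest = grouped.max()      # Eng 95000, HR 62000

print(total)
print(headcount)
print(lowest)
print(highest)
```

Each result is a Series indexed by dept, so a single department can be looked up with, for example, total['Eng'].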

Choosing Which Columns to Aggregate

After calling groupby() you can select one column with bracket notation to get a SeriesGroupBy, or select multiple columns with a list to get a DataFrameGroupBy. If you skip selection entirely, pandas aggregates every non-key column. That is convenient, but it can produce unexpected results when the frame contains irrelevant numeric columns, and since pandas 2.0 aggregations such as mean() raise a TypeError on non-numeric columns unless you pass numeric_only=True.

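# One selected column -> SeriesGroupBy; aggregating it returns a Series
print(df.groupby('dept')['salary'].sum())
# dept
# Eng    273000
# HR     122000
# Name: salary, dtype: int64

# A list of columns -> DataFrameGroupBy; aggregating it returns a DataFrame
print(df.groupby('dept')[['salary', 'years']].mean())
#        salary     years
# dept
# Eng   91000.0  4.666667
# HR    61000.0  3.500000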
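To make the selection rules concrete, the sketch below prints the type of each grouped object and then aggregates without selecting any column. The name column and its values are hypothetical, added only so the frame has a non-numeric column; numeric_only=True is passed explicitly so that column is skipped instead of causing an error in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000],
    'years': [3, 5, 7, 2, 4]
})

g = df.groupby('dept')

# Bracket selection decides which kind of grouped object you get
single_kind = type(g['salary']).__name__            # 'SeriesGroupBy'
multi_kind = type(g[['salary', 'years']]).__name__  # 'DataFrameGroupBy'
print(single_kind, multi_kind)

# Hypothetical extra column (not in the lesson data) to show what
# happens when a non-numeric column is present and nothing is selected
df_named = df.assign(name=['Ana', 'Bo', 'Cy', 'Di', 'Ed'])

# numeric_only=True restricts the mean to numeric columns, so 'name' is skipped
means = df_named.groupby('dept').mean(numeric_only=True)
print(means)
```

Selecting the columns explicitly, as in df.groupby('dept')[['salary', 'years']].mean(), gives the same frame here and keeps the result stable if more columns are added later.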
