Pandas & NumPy Academy · Lesson

GroupBy Transform and Filter

Use transform() to add group-level statistics back as a column and filter() to keep only groups meeting a condition.

Beyond Aggregation

After mastering agg(), the natural question is: what if you want to keep all original rows but enrich them with group-level statistics? Or keep only the groups that meet a condition? This is where transform() and filter() come in. These two methods extend GroupBy beyond simple summarisation into feature engineering and data selection.
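As a rough illustration of how the three methods differ, the sketch below runs agg(), transform(), and filter() on a small made-up table and compares the row counts of the results. The table, its column names, and the threshold of 40 are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only to compare result sizes
sales = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S', 'S'],
    'amount': [10, 20, 15, 15, 15],
})

# agg(): collapses each group to a single row
totals = sales.groupby('region')['amount'].agg('sum')

# transform(): one value per original row, aligned to the original index
row_totals = sales.groupby('region')['amount'].transform('sum')

# filter(): keeps the original rows, but only from groups passing the test
big_regions = sales.groupby('region').filter(lambda g: g['amount'].sum() > 40)

print(len(totals), len(row_totals), len(big_regions))  # 2 5 3
```

Here agg() returns one row per region, transform() returns all five rows, and filter() returns only the three rows of region S, whose total of 45 exceeds the threshold.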

Understanding transform()

transform() applies a function to each group and returns a result with the same length and index as the input, one value per original row. When the function reduces a group to a single value (such as a mean), that value is broadcast back to every row in the group. This makes it ideal for adding group-level statistics as new columns without changing the row count.

import pandas as pd

df = pd.DataFrame({
    'dept':   ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000]
})

# Add a column with each employee's department average salary
df['dept_avg'] = df.groupby('dept')['salary'].transform('mean')
print(df)
#    dept  salary  dept_avg
# 0   Eng   90000   91000.0
# 1    HR   60000   61000.0
# 2   Eng   95000   91000.0
# 3    HR   62000   61000.0
# 4   Eng   88000   91000.0
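
transform() also accepts a callable, which is where the feature-engineering use comes in. The sketch below rebuilds the same example table and derives two per-row features from group statistics. The column names diff_from_avg and pct_of_dept are hypothetical, picked only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'dept':   ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000]
})

# Each salary minus its department mean (the lambda runs once per group)
df['diff_from_avg'] = df.groupby('dept')['salary'].transform(lambda s: s - s.mean())

# Each salary as a share of its department's total payroll
df['pct_of_dept'] = df['salary'] / df.groupby('dept')['salary'].transform('sum')

print(df[['dept', 'salary', 'diff_from_avg']])
```

With this data the Eng rows come out at -1000, 4000, and -3000 relative to the department average of 91000, and the HR rows at -1000 and 1000. The pct_of_dept values sum to 1 within each department.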
