GroupBy Transform and Filter
Use transform() to add group-level statistics back as a column and filter() to keep only groups meeting a condition.
Beyond Aggregation
After mastering agg(), the natural question is: what if you want to keep all original rows but enrich them with group-level statistics? Or keep only the groups that meet a condition? This is where transform() and filter() come in. These two methods extend GroupBy beyond simple summarisation into feature engineering and data selection.
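To make the difference concrete, here is a minimal sketch comparing the output size of the three methods. It reuses the small salary table from this lesson's example; the 70000 threshold is an arbitrary value chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000],
})

grouped = df.groupby('dept')['salary']

# agg(): collapses each group to a single value (one row per group)
per_group = grouped.agg('mean')

# transform(): one value per original row, aligned to df's index
per_row = grouped.transform('mean')

# filter(): keeps the original rows, but only from groups that pass the test
# (70000 is an illustrative threshold, not a recommended value)
kept = df.groupby('dept').filter(lambda g: g['salary'].mean() > 70000)

print(len(per_group), len(per_row), len(kept))  # 2 5 3
```

Only the Eng rows survive the filter here, because the HR average (61000) falls below the example threshold.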
Understanding transform()
transform() applies a function to each group and returns a result with the same length and index as the input: one value per original row. When the function reduces a group to a single value (such as a mean), that value is broadcast back to every row belonging to the group. This makes it ideal for adding group-level statistics as new columns without changing the row count.
import pandas as pd
df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000]
})
# Add a column with each employee's department average salary
df['dept_avg'] = df.groupby('dept')['salary'].transform('mean')
print(df)
#   dept  salary  dept_avg
# 0  Eng   90000   91000.0
# 1   HR   60000   61000.0
# 2  Eng   95000   91000.0
# 3   HR   62000   61000.0
# 4  Eng   88000   91000.0
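Because the transformed values line up with the original rows, they can be combined directly with existing columns. The sketch below shows two possible derived features built on the same data; the column names diff_from_avg and share_of_dept are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'dept': ['Eng', 'HR', 'Eng', 'HR', 'Eng'],
    'salary': [90000, 60000, 95000, 62000, 88000],
})

dept_salary = df.groupby('dept')['salary']

# How far each salary sits above or below its department average
df['diff_from_avg'] = df['salary'] - dept_salary.transform('mean')

# A callable also works: it receives each group's salaries as a Series
# and here returns a Series of the same length (share of department total)
df['share_of_dept'] = dept_salary.transform(lambda s: s / s.sum())

print(df)
```

The row count stays at five throughout, and the shares within each department add up to 1.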