Pandas & NumPy Academy · Lesson

Rank and Percentile within Groups

Compute within-group ranks using groupby().rank() and create percentile buckets with pd.qcut for relative comparisons.

Why Rank Within Groups?

Raw values are often less meaningful than relative rankings. A sales rep with $50,000 in monthly revenue is a top performer if the group average is $30,000, but a poor performer if it is $80,000. Ranking within groups (for example, within each region or each quarter) gives a normalised comparison that accounts for the different baselines across groups. In pandas, groupby().rank() computes these within-group ranks in a single call.

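import pandas as pd
import numpy as np

# Fix the seed so the random sales figures are reproducible
np.random.seed(42)

# Ten reps, five per region, with random monthly sales
df = pd.DataFrame({
    'rep': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve',
            'Frank', 'Grace', 'Hank', 'Iris', 'Jake'],
    'region': ['East']*5 + ['West']*5,
    'sales': np.random.randint(30000, 100000, 10)  # upper bound is exclusive
})
print(df.sort_values('region'))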
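
The DataFrame above is only printed, so here is a minimal sketch of the within-group ranking itself, applying groupby().rank() per region. The fixed sales figures are illustrative assumptions (they are not the random values generated above), and the column names rank_in_region and pct_in_region are hypothetical.

```python
import pandas as pd

# Hypothetical fixed sales figures, chosen so the result is reproducible
df = pd.DataFrame({
    'rep': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve',
            'Frank', 'Grace', 'Hank', 'Iris', 'Jake'],
    'region': ['East']*5 + ['West']*5,
    'sales': [52000, 87000, 41000, 66000, 93000,
              35000, 78000, 61000, 99000, 47000],
})

# Rank within each region; ascending=False makes the top seller rank 1
df['rank_in_region'] = df.groupby('region')['sales'].rank(ascending=False)

# pct=True expresses the ascending rank as a fraction of the group size
df['pct_in_region'] = df.groupby('region')['sales'].rank(pct=True)

print(df.sort_values(['region', 'rank_in_region']))
```

Each region gets its own ranks from 1.0 to 5.0, so Eve (East) and Iris (West) both come out as rank 1 even though their sales figures differ.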

Series.rank() — Basic Ranking

Series.rank() assigns a rank to each value. By default, rank 1 goes to the smallest value and rank n to the largest; passing ascending=False reverses the direction so that rank 1 goes to the largest. The result is a float Series (not integer) because, by default, tied values each receive the average of the ranks they span. For a column with 5 distinct values, ranks run from 1.0 to 5.0. With ties, some values share a fractional rank such as 2.5.

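import pandas as pd

s = pd.Series([80, 45, 90, 45, 70])

print('Values:', s.values)
# Ascending (default): smallest value gets rank 1 -> 4, 1.5, 5, 1.5, 3
print('Rank (ascending=True, default):', s.rank().values)
# Descending: largest value gets rank 1 -> 2, 4.5, 1, 4.5, 3
print('Rank (ascending=False, best=1):', s.rank(ascending=False).values)
# Note: the two 45s tie. Ascending, they share positions 1 and 2, so both get 1.5;
# descending, they share positions 4 and 5, so both get 4.5.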
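
Averaging is only the default tie rule. As a short sketch, the method parameter of Series.rank() selects other strategies; the comparison below reuses the same Series, and the DataFrame name ties is just an illustrative choice.

```python
import pandas as pd

s = pd.Series([80, 45, 90, 45, 70])

# Compare how each tie-handling method ranks the two 45s
ties = pd.DataFrame({
    'value': s,
    'average': s.rank(method='average'),  # default: mean of tied positions -> 1.5, 1.5
    'min': s.rank(method='min'),          # lowest tied position -> 1.0, 1.0
    'max': s.rank(method='max'),          # highest tied position -> 2.0, 2.0
    'first': s.rank(method='first'),      # order of appearance -> 1.0, 2.0
    'dense': s.rank(method='dense'),      # like 'min', but no gap after the tie
})
print(ties)
```

With method='min' the next value after the tied 45s gets rank 3, whereas method='dense' gives it rank 2, which may be preferable when ranks are used as compact tier labels.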
