Pandas & NumPy Academy · Lesson

Cohort Retention Table

Build a cohort retention matrix showing the percentage of week-0 users still active in subsequent weeks using pivot_table.

What Is a Cohort Retention Table?

A cohort retention table groups users by the week (or month) in which they first used the product, then tracks what percentage of each cohort is still active in each subsequent week. Each row is one cohort, and each column is the number of weeks elapsed since that cohort's first week. The week-0 column always shows 100% because every user is, by definition, active in the week of their first event. Values in later columns show how quickly the product loses users: the slower the decay, the stronger the retention.

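import pandas as pd

# Load the raw event log; parse event_date as datetime so the .dt accessor works later
df = pd.read_csv('user_events.csv', parse_dates=['event_date'])
print(df.dtypes)  # event_date should show a datetime64 dtype
print(df.head())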
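
To make the definition concrete, here is a minimal sketch for a single cohort. The counts are made up for illustration and do not come from user_events.csv; the names active_users and retention_pct are likewise hypothetical.

```python
import pandas as pd

# Made-up counts for one cohort: distinct active users per week since first event.
# The index is the number of weeks elapsed; week 0 is the cohort size.
active_users = pd.Series([200, 120, 90, 80], index=[0, 1, 2, 3], name='active_users')

# Retention = active users in week k as a percentage of the week-0 cohort size
retention_pct = active_users * 100 / active_users.iloc[0]
print(retention_pct)
```

Week 0 comes out at 100% by construction, and each later value is the share of the original cohort that was still active that week.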

Assigning Each User a Cohort Week

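The cohort week is the week of each user's first event. Use groupby('user_id')['event_date'].min() to find each user's first event date, then convert it to a weekly period with .dt.to_period('W'). By default these periods run Monday to Sunday, which matches ISO week boundaries, although the result is a pandas Period rather than an ISO week number. Join this cohort label back onto the main events DataFrame on user_id so that every row knows which cohort week its user belongs to. Applying the same conversion to event_date gives the week in which each event occurred.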

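# First event date per user, converted to a weekly period (the cohort label)
cohort_week = df.groupby('user_id')['event_date'].min().dt.to_period('W').rename('cohort_week')
# Attach the cohort label to every event row by matching on user_id
df = df.join(cohort_week, on='user_id')

# Week in which each individual event occurred
df['event_week'] = df['event_date'].dt.to_period('W')

print(df[['user_id', 'event_date', 'cohort_week', 'event_week']].head())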
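
With cohort_week and event_week in place, the retention matrix itself can be built with pivot_table. The following is a sketch under stated assumptions rather than a definitive implementation: the toy events DataFrame stands in for user_events.csv, the week_number column name is hypothetical, and counting distinct users with aggfunc='nunique' is one reasonable definition of "active".

```python
import pandas as pd

# Hypothetical toy events standing in for user_events.csv (2024-01-01 is a Monday)
events = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3],
    'event_date': pd.to_datetime([
        '2024-01-01', '2024-01-08', '2024-01-15',  # user 1: active in weeks 0, 1, 2
        '2024-01-02', '2024-01-16',                # user 2: active in weeks 0, 2
        '2024-01-08', '2024-01-15',                # user 3: later cohort, weeks 0, 1
    ]),
})

# Same cohort assignment as in the lesson
cohort_week = (events.groupby('user_id')['event_date'].min()
               .dt.to_period('W').rename('cohort_week'))
events = events.join(cohort_week, on='user_id')
events['event_week'] = events['event_date'].dt.to_period('W')

# Whole weeks elapsed between the cohort week and the event week (assumed column name)
week_gap = events['event_week'].dt.start_time - events['cohort_week'].dt.start_time
events['week_number'] = week_gap.dt.days // 7

# Distinct active users per cohort (rows) and week number (columns)
active = events.pivot_table(index='cohort_week', columns='week_number',
                            values='user_id', aggfunc='nunique')

# Divide each row by its week-0 count to express activity as a percentage
retention = active.div(active[0], axis=0) * 100
print(retention)
```

Cells left as NaN mean no activity was recorded for that cohort in that week, which for recent cohorts usually just means the week has not happened yet. Whether to display those as blanks or zeros is a presentation choice.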
