Building a Retention Matrix
Counting active users by cohort and period offset to form a retention table.
What a Retention Matrix Is
The follow-up to defining a cohort is the famous retention matrix: rows are cohorts, columns are period offsets (month 0, 1, 2, ...), and each cell counts how many of that cohort were still active at that offset.
Interviewers love this because it forces you to combine cohort assignment, a join back to activity, a period-difference calculation, and a pivot. It is the single most representative query of product analytics.
The Two Inputs
You need two things: each user's cohort period (from the previous lesson) and a record of every active period per user. Activity comes from the same events table, collapsed to the period grain.
So plan the query as: cohort CTE, then an activity CTE that lists which months each user was active, then join them.
WITH user_cohort AS (
SELECT user_id,
DATE_TRUNC('month', MIN(event_at)) AS cohort_month
FROM events
GROUP BY user_id
)
SELECT * FROM user_cohort;All lessons in this course
- Defining a Cohort by First Action
- Building a Retention Matrix
- Day-N and Rolling Retention
- Churn and Resurrection Queries