Defining a Cohort by First Action
Assigning each user a cohort based on their first event date.
Why Cohorts Show Up in Interviews
When a product-analytics interviewer says "build a cohort", they are testing whether you can assign every user to a group based on when they first did something, then track that group over time.
A cohort is a set of users who share a starting event in the same period, usually their first purchase, signup, or login. The power of cohorts is that they let you compare users on equal footing: everyone in the January cohort is measured from their own January start.
The first sub-skill, and the one this lesson drills, is computing each user's first action date reliably.
The Source Table
Nearly every cohort question starts from an events table: one row per user action with a timestamp. Picture an events table:
user_id— who actedevent_type— what they didevent_at— when, as a timestamp
In an interview, clarify the grain out loud: "Is this one row per event, and can a user appear many times?" The answer is almost always yes, which is exactly why you need aggregation to collapse to a per-user first action.
CREATE TABLE events (
user_id INT,
event_type VARCHAR(50),
event_at TIMESTAMP
);All lessons in this course
- Defining a Cohort by First Action
- Building a Retention Matrix
- Day-N and Rolling Retention
- Churn and Resurrection Queries