Analysis and KPI Computation
Compute monthly cohort retention, product-level revenue, and rolling 30-day active users using groupby, pivot, and rolling.
Loading Analysis-Ready Data
With cleaning and feature engineering complete, load the analysis_ready.parquet checkpoint and begin computing KPIs. This stage is pure analysis — no more data cleaning. Having a clean, validated checkpoint lets you iterate quickly on KPI definitions without re-running the time-consuming ingestion and cleaning stages. The three KPIs to compute: monthly cohort retention, product-level revenue by category, and rolling 30-day active users.
import pandas as pd
df = pd.read_parquet('output/analysis_ready.parquet')
print(f'Loaded: {df.shape}')
print('KPIs to compute:')
print(' 1. Monthly cohort retention matrix')
print(' 2. Category revenue and margin ranking')
print(' 3. Rolling 30-day active users per region')KPI 1: Monthly Revenue by Category
The first KPI is monthly revenue by category. Group by both year_month and category, sum revenue, and pivot to get categories as columns and months as rows. This pivot table is the basis of the trend line chart and the top-category bar chart in the report. Apply a 3-month rolling mean to smooth the trend and highlight seasonality.
import pandas as pd
df = pd.read_parquet('output/analysis_ready.parquet')
# Monthly revenue by category
monthly_cat = (
df.groupby(['year_month', 'category'])['revenue']
.sum()
.reset_index()
.pivot(index='year_month', columns='category', values='revenue')
.fillna(0)
.sort_index()
)
# 3-month rolling average
monthly_cat_smooth = monthly_cat.rolling(3, min_periods=1).mean()
print('Monthly revenue by category (head):')
print(monthly_cat.round(0).head())
print('Shape:', monthly_cat.shape)All lessons in this course
- Project Setup and Data Ingestion
- Data Cleaning and Feature Engineering
- Analysis and KPI Computation
- Final Visualisation and Report Export