Pandas & NumPy Academy · Lesson

Analysis and KPI Computation

Compute monthly cohort retention, product-level revenue, and rolling 30-day active users using groupby, pivot, and rolling.

Loading Analysis-Ready Data

With cleaning and feature engineering complete, load the analysis_ready.parquet checkpoint and begin computing KPIs. This stage is pure analysis — no more data cleaning. Having a clean, validated checkpoint lets you iterate quickly on KPI definitions without re-running the time-consuming ingestion and cleaning stages. The three KPIs to compute: monthly cohort retention, product-level revenue by category, and rolling 30-day active users.

import pandas as pd

df = pd.read_parquet('output/analysis_ready.parquet')
print(f'Loaded: {df.shape}')
print('KPIs to compute:')
print('  1. Monthly cohort retention matrix')
print('  2. Category revenue and margin ranking')
print('  3. Rolling 30-day active users per region')

KPI 1: Monthly Revenue by Category

The first KPI is monthly revenue by category. Group by both year_month and category, sum revenue, and pivot to get categories as columns and months as rows. This pivot table is the basis of the trend line chart and the top-category bar chart in the report. Apply a 3-month rolling mean to smooth the trend and highlight seasonality.

import pandas as pd

df = pd.read_parquet('output/analysis_ready.parquet')

# Monthly revenue by category
monthly_cat = (
    df.groupby(['year_month', 'category'])['revenue']
    .sum()
    .reset_index()
    .pivot(index='year_month', columns='category', values='revenue')
    .fillna(0)
    .sort_index()
)

# 3-month rolling average
monthly_cat_smooth = monthly_cat.rolling(3, min_periods=1).mean()

print('Monthly revenue by category (head):')
print(monthly_cat.round(0).head())
print('Shape:', monthly_cat.shape)

All lessons in this course

← Back to Pandas & NumPy Academy