Revenue Calculations and Feature Engineering
Compute line-item revenue, create month and quarter columns, and add a discount flag for orders above a threshold.
Computing Line-Item Revenue
The most fundamental calculation in a sales dataset is line-item revenue: the product of quantity and unit price for each row. Create the column with a vectorised multiplication so NumPy processes every row simultaneously without a Python loop. This column becomes the basis for every aggregation in the analysis.
import pandas as pd
df = pd.read_parquet('sales_clean.parquet')
df['revenue'] = df['quantity'] * df['unit_price']
print(df[['quantity', 'unit_price', 'revenue']].head())Extracting Month and Quarter Columns
Grouping by calendar period is essential for trend analysis. Use the .dt accessor on a datetime column to extract year, month, and quarter. Storing these as separate integer columns makes groupby operations faster and lets you filter by period easily without repeated string parsing.
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
df['quarter'] = df['order_date'].dt.quarter
df['year_month'] = df['order_date'].dt.to_period('M')
print(df[['order_date', 'year', 'month', 'quarter', 'year_month']].head())All lessons in this course
- Loading and Auditing the Sales Dataset
- Revenue Calculations and Feature Engineering
- GroupBy Analysis by Region and Category
- Monthly Trend Visualisation