Clustering for Customer Segmentation: End-to-End Example
Learners will preprocess an e-commerce dataset, cluster customers by spend and frequency, and profile each segment to generate business insights.
The Business Goal: Segment Customers
Customer segmentation groups buyers by behaviour so that marketing, product, and customer-success teams can tailor their actions to each group. Typical signals are recency (days since last purchase), frequency (number of purchases), and monetary value (total spend), together known as the RFM framework. Clustering discovers these segments from the data itself, without needing predefined categories.
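To make the three RFM definitions concrete, here is a minimal sketch for a single customer using only the standard library. The purchase history and the reference date `today` are hypothetical values chosen for illustration, not taken from the dataset.

```python
from datetime import date

# Hypothetical purchase history for one customer: (purchase date, order total)
purchases = [
    (date(2011, 3, 1), 40.0),
    (date(2011, 5, 12), 25.5),
    (date(2011, 6, 30), 60.0),
]

# Assumed reference date from which recency is measured
today = date(2011, 7, 10)

recency = (today - max(d for d, _ in purchases)).days  # days since last purchase
frequency = len(purchases)                             # number of purchases
monetary = sum(total for _, total in purchases)        # total spend

print(recency, frequency, monetary)  # prints: 10 3 125.5
```

A lower recency and a higher frequency or monetary value generally indicate a more engaged customer, which is why these three numbers make useful clustering inputs.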
Loading and Inspecting the Dataset
We use the classic Online Retail dataset (UCI ML Repository). It contains roughly 540,000 transaction rows, one per invoice line item, with invoice date, customer ID, quantity, and unit price. Our first task is to load the data, drop rows with missing customer IDs, filter out returns (negative quantity), and compute the RFM features for each customer.
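import pandas as pd
# Load the raw transactions
df = pd.read_csv('online_retail.csv', encoding='latin1')
# Drop rows with no customer ID, then drop returns (non-positive quantity)
df = df.dropna(subset=['CustomerID'])
df = df[df['Quantity'] > 0].copy()  # .copy() avoids SettingWithCopyWarning on the assignments below
# Revenue per line item, and a real datetime column for recency calculations
df['Revenue'] = df['Quantity'] * df['UnitPrice']
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
# Sanity-check the cleaned frame
print(df.shape)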
print(df.dtypes)
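The per-customer RFM step described above could be sketched as follows. This is one possible approach, not necessarily the one the course uses. It runs on a tiny synthetic frame so it is self-contained; the values are made up. Two assumptions are worth flagging: the `InvoiceNo` column (used to count distinct orders) is assumed to be present in the file, and recency is measured from a snapshot date set to one day after the latest transaction.

```python
import pandas as pd

# Tiny synthetic stand-in for the cleaned transactions frame (illustrative values only)
df = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 2, 3],
    'InvoiceNo': ['A1', 'A1', 'B1', 'B2', 'B3', 'C1'],  # assumed column name
    'InvoiceDate': pd.to_datetime([
        '2011-01-05', '2011-01-05', '2011-02-01',
        '2011-03-10', '2011-04-20', '2011-01-15',
    ]),
    'Revenue': [20.0, 5.0, 10.0, 15.0, 30.0, 50.0],
})

# Assumption: measure recency from the day after the last transaction in the data
snapshot = df['InvoiceDate'].max() + pd.Timedelta(days=1)

# One row per customer with the three RFM features
rfm = df.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda d: (snapshot - d.max()).days),
    Frequency=('InvoiceNo', 'nunique'),  # distinct invoices, not line items
    Monetary=('Revenue', 'sum'),
)

print(rfm)
```

Counting distinct invoices rather than rows matters here because each row is a line item, so a single order with many products would otherwise inflate frequency.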