Sessionisation and Event Sequencing
Group events by user and session, compute session duration with timestamp arithmetic, and sort events chronologically.
Understanding Clickstream Data
A clickstream dataset records every user action, such as page views, button clicks, and purchases, as one row with a user_id, session_id, event_type, and timestamp. Before computing any metrics, check the schema. Print df.dtypes to confirm the timestamp column parsed as a datetime64 dtype rather than as strings (typically shown as object), and print df.sample(5) to see representative rows.
import pandas as pd
df = pd.read_csv('clickstream.csv', parse_dates=['timestamp'])
print(df.dtypes)
print(df.sample(5))
Sorting Events Chronologically
Sequence-based analysis depends on rows being in time order. Sort by user_id and then timestamp so that all events for the same user appear consecutively and in the correct sequence. Use sort_values(['user_id', 'timestamp']), then reset_index(drop=True) to discard the old index and get a clean integer index that reflects the sorted order.
df = df.sort_values(['user_id', 'timestamp']).reset_index(drop=True)
print(df[['user_id', 'session_id', 'event_type', 'timestamp']].head(10))
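With events sorted, session duration follows from timestamp arithmetic within each (user_id, session_id) group. The sketch below is one possible approach, not a definitive implementation. It assumes session_id is already assigned in the data and that duration means the time between a session's first and last event. The small DataFrame is hypothetical and stands in for clickstream.csv, and the output column names (start, end, n_events, duration, duration_seconds) are illustrative choices.

```python
import pandas as pd

# Hypothetical toy data standing in for clickstream.csv
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "session_id": ["a", "a", "b", "c", "c"],
    "event_type": ["page_view", "click", "page_view", "page_view", "purchase"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00:00",
        "2024-01-01 10:05:30",
        "2024-01-02 09:00:00",
        "2024-01-01 12:00:00",
        "2024-01-01 12:20:00",
    ]),
})

# Same ordering step as in the lesson
df = df.sort_values(["user_id", "timestamp"]).reset_index(drop=True)

# First and last timestamp, plus event count, per user and session
sessions = (
    df.groupby(["user_id", "session_id"])["timestamp"]
      .agg(start="min", end="max", n_events="count")
      .reset_index()
)

# Subtracting two datetime columns yields a Timedelta column
sessions["duration"] = sessions["end"] - sessions["start"]
sessions["duration_seconds"] = sessions["duration"].dt.total_seconds()

print(sessions)
```

Under this definition a session with a single event has a duration of zero, so it can be worth checking n_events before averaging durations.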