Project Setup and Data Ingestion
Define project goals, load data from a CSV and a SQLite database, merge sources, and perform a full audit of the combined dataset.
The Capstone Project Goal
In this capstone project, you bring together every skill from the course — NumPy, Pandas, visualisation, statistical testing, database connectivity, and pipeline design — in a single end-to-end workflow. The project simulates a real analyst task: ingest raw data from two sources (a CSV file and a database table), merge them, audit data quality, compute advanced KPIs, visualise trends, and export a polished report. This mirrors what professional data analysts do every day in industry.
Defining Project Goals and KPIs
Before writing any code, define the project goals and the Key Performance Indicators (KPIs) you will compute. Document three things: the business question you are answering, the data sources available, and the output formats required. For this capstone, the goals are to analyse monthly revenue trends across product categories, compute cohort retention, identify the top regions by profit margin, and export a report with visualisations. A clear goal prevents scope creep and keeps the pipeline focused.
# Project configuration — define goals upfront
CONFIG = {
    'csv_path': 'data/orders_2024.csv',
    'db_url': 'sqlite:///customer_db.sqlite',
    'db_table': 'customers',
    'output_dir': 'output/',
    'report_path': 'output/report.md',
    'analysis_year': 2024,
    'top_n_regions': 5,
    'rolling_window_days': 30,
}
print('Project config loaded.')
print('Target KPIs: monthly revenue, cohort retention, top regions by margin')
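The ingestion, merge, and audit steps named in the lesson summary could be sketched roughly as follows. This is a minimal illustration, not the course's own implementation: the function names load_sources and audit are hypothetical, and the join key customer_id is an assumption, since the lesson does not name the shared column. The sketch reads the database with the standard-library sqlite3 module by stripping the sqlite:/// prefix from db_url; if SQLAlchemy is installed, passing the URL to sqlalchemy.create_engine would be an alternative.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_sources(config, key="customer_id"):
    """Load the CSV and the SQLite table, then left-join them on `key`.

    `key` is an assumed column name; replace it with the real shared column.
    """
    orders = pd.read_csv(config["csv_path"])

    # The config holds a SQLAlchemy-style URL. sqlite3 only needs the file
    # path, so drop the scheme prefix.
    db_path = config["db_url"].replace("sqlite:///", "", 1)
    table = config["db_table"]
    with closing(sqlite3.connect(db_path)) as conn:
        customers = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)

    # how="left" keeps every order; indicator=True adds a `_merge` column
    # that flags orders with no matching customer row ("left_only").
    # validate="many_to_one" raises if `key` is duplicated in customers.
    return orders.merge(
        customers, on=key, how="left", indicator=True, validate="many_to_one"
    )


def audit(df):
    """Return a per-column quality table and the count of duplicated rows."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),
    })
    n_duplicates = int(df.duplicated().sum())
    return report, n_duplicates
```

With the configuration defined earlier, a call might look like merged = load_sources(CONFIG) followed by report, n_duplicates = audit(merged). Counting the "left_only" values in merged["_merge"] shows how many orders lack a customer record, which is worth checking before any KPI is computed.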