Pandas & NumPy Academy · Lesson

Project Setup and Data Ingestion

Define project goals, load data from a CSV and a SQLite database, merge sources, and perform a full audit of the combined dataset.

The Capstone Project Goal

This capstone project brings together every skill from the course (NumPy, Pandas, visualisation, statistical testing, database connectivity, and pipeline design) in a single end-to-end workflow. The project simulates a real analyst task: ingest raw data from two sources (a CSV file and a database table), merge them, audit data quality, compute advanced KPIs, visualise trends, and export a polished report. This sequence mirrors a typical workflow for a professional data analyst.

Defining Project Goals and KPIs

Before writing a single line of code, define the project goals and the Key Performance Indicators (KPIs) you will compute. Document three things: the business question you are answering, the data sources you have, and the output formats you need. For this capstone, the goals are to analyse monthly revenue trends across product categories, compute cohort retention, identify the top regions by profit margin, and export a report with visualisations. A clear goal prevents scope creep and keeps the pipeline focused.
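One lightweight way to record these planning answers is a small, immutable data structure that lives next to the code. The sketch below is only an illustration: the `ProjectBrief` class and the `missing_fields` helper are hypothetical names, not part of the course material, and the values are taken from the capstone goals stated above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProjectBrief:
    """Hypothetical container for the three planning questions plus KPIs."""
    business_question: str
    data_sources: tuple[str, ...]
    output_formats: tuple[str, ...]
    kpis: tuple[str, ...] = ()


def missing_fields(brief: ProjectBrief) -> list[str]:
    """Return the names of any fields left empty, so gaps are visible early."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


brief = ProjectBrief(
    business_question="How did monthly revenue trend across product categories in 2024?",
    data_sources=("orders CSV", "customers table in SQLite"),
    output_formats=("Markdown report with visualisations",),
    kpis=("monthly revenue", "cohort retention", "top regions by margin"),
)

print(missing_fields(brief))
```

An empty list means every planning question has an answer. Freezing the dataclass keeps the brief from being changed silently halfway through the project.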

# Project configuration — define goals upfront
CONFIG = {
    'csv_path': 'data/orders_2024.csv',
    'db_url': 'sqlite:///customer_db.sqlite',
    'db_table': 'customers',
    'output_dir': 'output/',
    'report_path': 'output/report.md',
    'analysis_year': 2024,
    'top_n_regions': 5,
    'rolling_window_days': 30
}

print('Project config loaded.')
print('Target KPIs: monthly revenue, cohort retention, top regions by margin')
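With the configuration in place, the ingestion step described in the lesson summary (load the CSV, load the database table, merge, audit) might look like the sketch below. It rests on several assumptions. All column names (`order_id`, `customer_id`, `order_date`, `revenue`, `region`) are hypothetical, because the lesson does not list the real schema. To stay runnable anywhere, the sketch writes tiny stand-in files to a temporary directory instead of reading the configured paths, and it uses the standard-library `sqlite3` module rather than the SQLAlchemy-style `db_url`.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

import pandas as pd

# --- Stand-in data (hypothetical columns; replace with the real sources) ---
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "orders_2024.csv"
db_path = workdir / "customer_db.sqlite"

pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 99],  # 99 has no matching customer row
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-02", "2024-02-10"],
    "revenue": [120.0, 80.5, None, 42.0],  # one missing value on purpose
}).to_csv(csv_path, index=False)

with closing(sqlite3.connect(db_path)) as conn:
    pd.DataFrame({
        "customer_id": [10, 11, 12],
        "region": ["North", "South", "East"],
    }).to_sql("customers", conn, index=False, if_exists="replace")
    conn.commit()

# --- Ingest both sources ---
orders = pd.read_csv(csv_path, parse_dates=["order_date"])
with closing(sqlite3.connect(db_path)) as conn:
    customers = pd.read_sql_query("SELECT * FROM customers", conn)

# --- Merge: keep every order, flag unmatched keys, check key uniqueness ---
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
    validate="many_to_one",  # raises if customer_id repeats in customers
)

# --- Audit the combined dataset ---
audit = {
    "rows": len(merged),
    "duplicate_rows": int(merged.duplicated().sum()),
    "missing_by_column": merged.isna().sum().to_dict(),
    "orders_without_customer": int((merged["_merge"] == "left_only").sum()),
}
print(audit)
```

A left join keeps every order even when the customer lookup fails, so unmatched keys show up in the audit instead of disappearing from the data. Whether unmatched orders should be dropped, repaired, or reported is a project decision that this sketch leaves open.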
