Pandas & NumPy Academy · Lesson

Project Setup and Data Ingestion

Define project goals, load data from a CSV and a SQLite database, merge sources, and perform a full audit of the combined dataset.

The Capstone Project Goal

In this capstone project, you bring together every skill from the course (NumPy, Pandas, visualisation, statistical testing, database connectivity, and pipeline design) in a single end-to-end workflow. The project simulates a real analyst task: ingest raw data from two sources (a CSV file and a database table), merge them, audit data quality, compute advanced KPIs, visualise trends, and export a polished report. The same sequence of ingest, merge, audit, analyse, and report is the backbone of most day-to-day analyst work.

Defining Project Goals and KPIs

Before writing a single line of code, define your project goals and the Key Performance Indicators (KPIs) you will compute. Write down three things: the business question you are answering, the data sources you have, and the output formats you need. For this capstone, the goals are to analyse monthly revenue trends across product categories, compute cohort retention, identify the top regions by profit margin, and export a report with visualisations. A clear goal prevents scope creep and keeps the pipeline focused.

# Project configuration — define goals upfront
CONFIG = {
    'csv_path': 'data/orders_2024.csv',
    'db_url': 'sqlite:///customer_db.sqlite',
    'db_table': 'customers',
    'output_dir': 'output/',
    'report_path': 'output/report.md',
    'analysis_year': 2024,
    'top_n_regions': 5,
    'rolling_window_days': 30
}

print('Project config loaded.')
print('Target KPIs: monthly revenue, cohort retention, top regions by margin')
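With the configuration in place, the ingestion steps this lesson describes (load the CSV, load the database table, merge, audit) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the course's reference solution. The column names (order_id, customer_id, order_date, revenue, region) and the helper functions are hypothetical, and the sketch builds tiny synthetic inputs in a temporary directory so that it runs without the real data files. It also uses the standard-library sqlite3 driver with a plain file path; the SQLAlchemy-style 'db_url' in CONFIG would instead be passed to an SQLAlchemy engine.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

import pandas as pd


def load_orders(csv_path):
    """Read the orders CSV, parsing the (assumed) order_date column as datetimes."""
    return pd.read_csv(csv_path, parse_dates=["order_date"])


def load_customers(db_path, table):
    """Read a whole table from a SQLite file using the stdlib sqlite3 driver."""
    with closing(sqlite3.connect(db_path)) as conn:
        # Table names cannot be bound as SQL parameters, so pass trusted names only.
        return pd.read_sql_query(f"SELECT * FROM {table}", conn)


def merge_sources(orders, customers):
    """Left-join so every order is kept; the _merge column flags unmatched orders."""
    return orders.merge(
        customers,
        on="customer_id",          # hypothetical join key
        how="left",
        validate="many_to_one",    # raises if a customer_id repeats in customers
        indicator=True,
    )


def audit(df):
    """Summarise size, missing values, duplicate rows, and unmatched orders."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_by_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "unmatched_orders": int((df["_merge"] == "left_only").sum()),
    }


# Synthetic stand-ins for the CSV and database named in CONFIG.
tmp = Path(tempfile.mkdtemp())
csv_path = tmp / "orders.csv"
db_path = tmp / "customers.sqlite"

pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 99],       # 99 has no matching customer row
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-02", "2024-02-15"],
    "revenue": [120.0, 80.5, None, 42.0],  # one missing value for the audit to find
}).to_csv(csv_path, index=False)

with closing(sqlite3.connect(db_path)) as conn:
    pd.DataFrame({
        "customer_id": [10, 11],
        "region": ["North", "South"],
    }).to_sql("customers", conn, index=False)
    conn.commit()

orders = load_orders(csv_path)
customers = load_customers(db_path, "customers")
combined = merge_sources(orders, customers)
report = audit(combined)

for key, value in report.items():
    print(f"{key}: {value}")
```

A left join is used here so that no order is silently dropped when its customer is missing; the audit then reports how many such orders exist, which is a decision point for the cleaning step rather than something to hide at ingestion time.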
