Loading and Auditing the Sales Dataset
Import a CSV of order records, audit dtypes and missing values, and fix date parsing and negative quantity anomalies.
Introducing the Sales Dataset
A typical sales dataset contains columns such as order_id, order_date, customer_id, product, category, quantity, unit_price, and region. Before any analysis, load the file and take a first look at its structure: how many rows and columns it has, and what kind of values each column holds. The goal of this first step is to understand what data you have before writing a single formula.
import pandas as pd
df = pd.read_csv('sales.csv')  # assumes sales.csv is in the working directory
print(df.shape) # (rows, columns)
print(df.head())  # first five rows
Checking Shape and Column Names
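The load above needs a real sales.csv on disk. The sketch below is self-contained instead. It reads a few invented rows from an in-memory string with io.StringIO and runs the first-look audit named in this lesson's summary: dtypes, missing values, date parsing, and negative quantities. The sample rows are hypothetical. Flagging negative quantities rather than dropping or correcting them is also an assumption, because the right policy depends on what a negative quantity means in your data (a return, or an entry error).

```python
import io

import pandas as pd

# Hypothetical sample rows using the column names listed above; in practice
# you would call pd.read_csv('sales.csv') instead of reading this string.
SAMPLE_CSV = """order_id,order_date,customer_id,product,category,quantity,unit_price,region
1001,2024-01-05,C01,Widget,Hardware,3,9.99,North
1002,2024-01-06,C02,Gadget,Hardware,-1,19.50,South
1003,not a date,C03,Manual,Books,2,12.00,
1004,2024-01-09,C01,Widget,Hardware,5,,East
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 1. dtypes: without parse_dates, order_date is read as text, not as datetime
print(df.dtypes)

# 2. Missing values per column (empty CSV fields are read as NaN)
missing = df.isna().sum()
print(missing[missing > 0])

# 3. Unparseable dates become NaT instead of raising an error
parsed_dates = pd.to_datetime(df['order_date'], errors='coerce')
bad_dates = df.loc[parsed_dates.isna() & df['order_date'].notna(), 'order_date']
print('Unparseable dates:', bad_dates.tolist())
df['order_date'] = parsed_dates  # column is now datetime64, with NaT for bad values

# 4. Negative quantities are flagged for review (assumed policy), not dropped
negative_qty = df[df['quantity'] < 0]
print('Negative quantity rows:', len(negative_qty))
print(negative_qty[['order_id', 'quantity']])
```

Using errors='coerce' lets you count and inspect the bad date strings before deciding what to do with them. Without it, a single malformed value makes the whole conversion raise.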
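After loading, immediately check df.shape for the dataset's dimensions and df.columns for the full list of column names. This confirms the file loaded correctly and shows whether any column names have stray spaces or unexpected capitalisation that need cleaning. Checking df.index[:5] shows the first few row labels, which for a plain read_csv call form a default RangeIndex starting at 0. A column count that differs from what you expect is an early warning sign of a corrupted or misparsed file. For example, reading a semicolon-delimited file with the default comma separator typically collapses each row into a single column.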
print('Rows, Cols:', df.shape)
print('Columns:', df.columns.tolist())
print('Index:', df.index[:5].tolist())
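One way to act on these checks is to normalise the header and then compare it against the eight columns listed at the start of the lesson. The sketch below does this. The helper names normalise_columns and check_columns, the snake_case naming convention, and the messy sample header are all hypothetical choices made for illustration.

```python
import pandas as pd

# Columns the lesson expects, taken from the dataset description
EXPECTED_COLUMNS = ['order_id', 'order_date', 'customer_id', 'product',
                    'category', 'quantity', 'unit_price', 'region']


def normalise_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with names stripped, lowercased, and spaces replaced by underscores."""
    out = frame.copy()
    out.columns = (out.columns.str.strip()
                              .str.lower()
                              .str.replace(' ', '_', regex=False))
    return out


def check_columns(frame: pd.DataFrame, expected: list[str]) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) column names relative to `expected`."""
    actual = set(frame.columns)
    expected_set = set(expected)
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in frame.columns if c not in expected_set]
    return missing, unexpected


# Hypothetical messy header: stray spaces and mixed capitalisation
raw = pd.DataFrame(columns=[' Order_ID', 'order_date ', 'Customer ID', 'product',
                            'Category', 'quantity', 'Unit Price', 'region'])
clean = normalise_columns(raw)
missing, unexpected = check_columns(clean, EXPECTED_COLUMNS)

print('Cleaned:', clean.columns.tolist())
print('Column count OK:', clean.shape[1] == len(EXPECTED_COLUMNS))
print('Missing:', missing, 'Unexpected:', unexpected)
```

Reporting missing and unexpected names separately is more informative than comparing counts alone. A file can have the right number of columns and still contain a misspelled or renamed one.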