Pandas & NumPy Academy · Lesson

Dataset Profiling Checklist

Systematically inspect shape, dtypes, missing counts, unique values, and basic statistics when encountering a new dataset.

The First Five Minutes with a Dataset

Experienced data analysts follow a systematic profiling checklist whenever they encounter a new dataset. Rather than diving straight into analysis, they first answer a set of diagnostic questions: How large is the data? What types are the columns? How many values are missing? Are there obvious anomalies? This structured approach prevents hours of wasted work caused by misunderstood data types or hidden nulls corrupting calculations.

import pandas as pd

# Load a sample dataset (Titanic passenger records) from a public CSV
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

# Step 1: size
print('Shape:', df.shape)
print('Rows:', len(df))
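The remaining diagnostic questions (column types, missing values, unique values, basic statistics) can be answered in one pass. What follows is a minimal sketch, not a standard pandas feature: the helper name `profile` and the toy DataFrame are hypothetical, chosen so the example runs without a network connection.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing %, unique count."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'missing': df.isna().sum(),
        'missing_pct': (df.isna().mean() * 100).round(1),
        'unique': df.nunique(),  # nunique() ignores missing values by default
    })


# Hypothetical toy data standing in for a freshly loaded dataset
toy = pd.DataFrame({
    'age': [22.0, 38.0, np.nan, 35.0],
    'fare': [7.25, 71.28, 7.92, 53.10],
    'embarked': ['S', 'C', 'S', None],
})

summary = profile(toy)
print('Shape:', toy.shape)
print(summary)

# Basic statistics for the numeric columns
print(toy.describe())
```

Collecting the checks into a single table makes it easy to scan for columns with many missing values or with an unexpected number of distinct values before any analysis begins.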

Step 1: Shape, Columns, and dtypes

The first three checks are always shape (rows × columns), column names (do they match expectations?), and dtypes (are numeric columns actually numeric?). A common surprise is a numeric column loaded as object dtype because it contains text entries such as 'unknown' or numbers formatted with thousands separators such as '1,200'. Catching this at the profiling stage saves you from aggregations that later raise errors or, worse, silently return wrong results, for example by concatenating strings instead of adding numbers.

import pandas as pd
import seaborn as sns

# load_dataset fetches seaborn's copy of the Titanic data and returns a pandas DataFrame
df = sns.load_dataset('titanic')

print('Shape:', df.shape)
print('\nColumns:', df.columns.tolist())
print('\nData Types:')
print(df.dtypes)
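The object-dtype surprise described above can be checked for directly. The sketch below uses hypothetical toy data, a hypothetical helper named `numeric_share`, and an arbitrary 0.5 threshold; all three are assumptions for illustration rather than part of the lesson's dataset. It relies on `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN.

```python
import pandas as pd

# Hypothetical toy data: 'income' and 'age' arrive as text, not numbers
raw = pd.DataFrame({
    'income': ['1,200', '950', 'unknown', '3,400'],
    'age': ['34', '41', '29', 'unknown'],
    'city': ['Oslo', 'Lima', 'Pune', 'Kyiv'],
})


def strip_commas(col: pd.Series) -> pd.Series:
    """Remove thousands separators so '1,200' can parse as 1200."""
    return col.astype(str).str.replace(',', '', regex=False)


def numeric_share(col: pd.Series) -> float:
    """Fraction of non-null entries that parse as numbers after comma removal."""
    parsed = pd.to_numeric(strip_commas(col.dropna()), errors='coerce')
    return parsed.notna().mean()


shares = {name: numeric_share(raw[name]) for name in raw.columns}
print(shares)

# Convert columns that are mostly numeric (the 0.5 threshold is an assumption);
# entries that cannot be parsed, such as 'unknown', become NaN
for name, share in shares.items():
    if share >= 0.5:
        raw[name] = pd.to_numeric(strip_commas(raw[name]), errors='coerce')

print(raw.dtypes)
print('Total income:', raw['income'].sum())
```

After conversion the text placeholders show up as missing values, so they appear in the missing-count check instead of hiding inside a text column.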
