Univariate Analysis
Analyse each column independently: plot distributions for numerics and value counts for categoricals, note outliers and skewness.
What Is Univariate Analysis?
Univariate analysis examines one column at a time in isolation, ignoring relationships between columns. The goal is to understand each variable's distribution, detect outliers, identify skewness, and spot data quality issues before any modelling begins. Univariate analysis differs for numeric and categorical columns — numerics need distribution plots and summary statistics, while categoricals need frequency counts and proportions.
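As a rough illustration of that split, the sketch below summarises one numeric column and one categorical column. The tiny DataFrame and its values are hypothetical stand-ins (not the Titanic data), chosen so the snippet runs without a download. The 1.5 × IQR rule used to flag outliers is a common convention, assumed here for illustration rather than prescribed by this guide.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "fare": [7.25, 8.05, 13.0, 26.0, 71.28, 512.33],
    "embarked": ["S", "S", "C", "S", "Q", None],
})

# Numeric column: summary statistics plus skewness
fare_summary = df["fare"].describe()
fare_skew = df["fare"].skew()  # a positive value indicates a longer right tail

# Flag potential outliers with the 1.5 * IQR rule (a convention, not the only option)
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)
fare_outliers = df.loc[is_outlier, "fare"]

# Categorical column: counts and proportions, keeping missing values visible
embarked_counts = df["embarked"].value_counts(dropna=False)
embarked_share = df["embarked"].value_counts(normalize=True, dropna=False)

print(fare_summary)
print("Skewness:", fare_skew)
print("Potential outliers:", fare_outliers.tolist())
print(embarked_counts)
print(embarked_share)
```

Passing dropna=False keeps missing values in the tally, which helps surface data quality issues that the default counts would hide.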
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
numeric_cols = df.select_dtypes('number').columns.tolist()
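# Treat every non-numeric column as categorical; selecting only 'object'
# would miss the category and bool columns in this dataset (e.g. class, deck, alone)
categoric_cols = df.select_dtypes(exclude='number').columns.tolist()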
print('Numeric columns:', numeric_cols)
print('Categorical columns:', categoric_cols)
Histograms for Numeric Columns
Plot a histogram for each numeric column to see the shape of its distribution. Look for symmetry versus skewness (a longer tail on one side), modality (one peak versus several), and obvious anomalies (implausible values at the extremes). In pandas, df.hist() draws a grid of histograms, one per numeric column, without an explicit loop; Seaborn's histplot gives finer control over an individual column.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('titanic')
# Quick multi-column histogram grid
numeric = df.select_dtypes('number')
numeric.hist(bins=20, figsize=(12, 8), layout=(2, 3), color='steelblue', edgecolor='white')
plt.suptitle('Univariate Distributions (Numeric Columns)', y=1.02)
plt.tight_layout()
plt.show()