Bivariate and Correlation Analysis
Explore relationships between pairs of columns using scatter plots, box plots, and a correlation heatmap.
From One Variable to Two
Bivariate analysis examines the relationship between exactly two columns at a time. After profiling each column individually (univariate analysis), you ask: how do these two variables relate? The right technique depends on the combination of variable types: numeric vs. numeric (scatter plot + correlation), categorical vs. numeric (box or violin plot), or categorical vs. categorical (crosstab + chi-squared test).
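The mapping from type pair to technique can be sketched as a small helper. This is a minimal sketch, not a pandas or seaborn feature: the function name suggest_bivariate_method is hypothetical, it decides from dtype alone, and the DataFrame holds a few illustrative values rather than the real dataset. It treats boolean columns as categorical because pandas reports bool as a numeric dtype.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def suggest_bivariate_method(df: pd.DataFrame, x: str, y: str) -> str:
    """Hypothetical helper: suggest a technique for the column pair (x, y).

    A rough heuristic that looks only at dtypes, never at the values.
    """
    def is_numeric(col: str) -> bool:
        # pandas counts bool as numeric, but True/False columns behave like categories
        return is_numeric_dtype(df[col]) and not is_bool_dtype(df[col])

    x_num, y_num = is_numeric(x), is_numeric(y)
    if x_num and y_num:
        return 'scatter plot + correlation'
    if x_num or y_num:
        return 'box/violin plot'
    return 'crosstab + chi-squared'


# Illustrative values only, shaped like a few Titanic column types
df = pd.DataFrame({
    'age': [22.0, 38.0, 26.0, 35.0],
    'fare': [7.25, 71.28, 7.93, 53.10],
    'sex': ['male', 'female', 'female', 'female'],
    'alone': [False, False, True, False],
})

for x, y in [('age', 'fare'), ('sex', 'fare'), ('sex', 'alone')]:
    print(f'{x} vs. {y}: {suggest_bivariate_method(df, x, y)}')
```

Dtype alone can mislead: in the Titanic data, survived and pclass are stored as integers but are categorical in meaning, so a check like this is a starting point rather than a rule.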
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
# List columns by dtype to see which bivariate pairings the dataset supports
print('Numeric columns:', df.select_dtypes('number').columns.tolist())
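# 'class' and 'deck' use pandas' category dtype, so include it alongside object
print('Categorical columns:', df.select_dtypes(include=['object', 'category']).columns.tolist())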
print('Boolean columns:', df.select_dtypes('bool').columns.tolist())
Numeric vs. Numeric: Scatter Plot
For two numeric variables, a scatter plot is the primary bivariate tool. It draws each observation as a point at its (x, y) coordinates, revealing the direction, strength, and form of the relationship. Always look at the plot before computing a correlation coefficient: a Pearson correlation near 0 rules out only a linear relationship, so a strong non-linear (for example, U-shaped) relationship can still hide behind it.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('titanic')
sns.scatterplot(data=df, x='age', y='fare', alpha=0.4)
plt.title('Age vs. Fare — Is There a Linear Relationship?')
plt.xlabel('Age')
plt.ylabel('Fare (£)')
plt.show()
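The warning about a near-zero correlation can be checked numerically. This sketch uses made-up data rather than the Titanic dataset: y equals x squared exactly, on a grid of x values symmetric around zero, so y is fully determined by x yet the Pearson coefficient comes out at essentially zero. Correlating y against x squared instead recovers a perfect score, which is the pattern a scatter plot would have shown as a parabola.

```python
import numpy as np
import pandas as pd

# Synthetic data: y is completely determined by x, but along a U-shaped curve
x = pd.Series(np.linspace(-3, 3, 61))
y = x ** 2

# Pearson correlation (the pandas default) measures only linear association
r_linear = x.corr(y)

# Correlating y with the matching transformation of x exposes the relationship
r_quadratic = (x ** 2).corr(y)

print(f'Pearson r(x, y)   = {r_linear:.3f}')
print(f'Pearson r(x^2, y) = {r_quadratic:.3f}')
```

The first value prints as zero (possibly with a minus sign from rounding noise) even though the relationship is exact.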