Pandas & NumPy Academy · Lesson

Chi-Squared Test for Independence

Test whether two categorical variables are independent using chi2_contingency on a crosstab frequency table.

Testing Categorical Relationships

The chi-squared test for independence checks whether two categorical variables are statistically independent or whether there is an association between them. For example: 'Is customer churn independent of subscription tier?' or 'Is product preference independent of age group?' Unlike t-tests, which compare numeric means, the chi-squared test compares the observed frequencies in a contingency table with the frequencies we would expect if the variables were independent.
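To make the "observed versus expected" comparison concrete, here is a minimal sketch that computes the expected counts and the chi-squared statistic by hand. The 2x2 table of counts is hypothetical, invented purely for illustration. Under independence, the expected count for a cell is its row total times its column total, divided by the grand total.

```python
import numpy as np

# Hypothetical observed counts (illustrative only, not from the lesson's data):
# rows = two subscription tiers, columns = churned 'no' / 'yes'
observed = np.array([
    [30, 10],
    [20, 20],
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count per cell if rows and columns were independent
expected = row_totals * col_totals / grand_total

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()

print(expected)
print(round(chi2_stat, 4))
```

The larger the statistic, the further the observed table is from what independence would predict. Note that this hand computation applies no continuity correction, so for a 2x2 table it will not match the default output of scipy.stats.chi2_contingency, which applies Yates' correction unless correction=False is passed.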

Building a Contingency Table with Pandas

A contingency table (also called a cross-tabulation) shows the count of observations for each combination of two categorical variables. pd.crosstab(df['var1'], df['var2']) builds this table directly from two DataFrame columns: the categories of the first become the rows and the categories of the second become the columns. Each cell contains the number of observations in which that row category and column category co-occur. The resulting table is the input to scipy.stats.chi2_contingency().

import pandas as pd
import numpy as np

np.random.seed(0)
df = pd.DataFrame({
    'subscription': np.random.choice(['free', 'basic', 'pro'], 300),
    'churned': np.random.choice(['yes', 'no'], 300, p=[0.3, 0.7])
})

# Build contingency table
ct = pd.crosstab(df['subscription'], df['churned'])
print(ct)
print()
print('Row totals:', ct.sum(axis=1).to_dict())
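The table built above can then be passed to scipy.stats.chi2_contingency. The following is a sketch rather than a definitive recipe: it rebuilds the same simulated data so that it runs on its own, and the 0.05 significance threshold is an assumed convention, not something the lesson specifies.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Recreate the simulated data from the lesson so this block is self-contained
np.random.seed(0)
df = pd.DataFrame({
    'subscription': np.random.choice(['free', 'basic', 'pro'], 300),
    'churned': np.random.choice(['yes', 'no'], 300, p=[0.3, 0.7])
})
ct = pd.crosstab(df['subscription'], df['churned'])

# Returns the test statistic, p-value, degrees of freedom,
# and the expected counts under independence
chi2, p_value, dof, expected = chi2_contingency(ct)

print(f'chi2 = {chi2:.3f}, p-value = {p_value:.3f}, dof = {dof}')

# Label the expected counts like the observed table for easy comparison
expected_df = pd.DataFrame(expected, index=ct.index, columns=ct.columns)
print(expected_df.round(1))

alpha = 0.05  # assumed significance level, chosen for illustration
if p_value < alpha:
    print('Reject independence: the variables appear to be associated.')
else:
    print('No evidence against independence at this threshold.')
```

The degrees of freedom equal (rows - 1) x (columns - 1), which is 2 for this 3x2 table. Because both columns here are drawn at random without any link between them, a non-significant result is the likely outcome, though any single random sample can still produce a small p-value by chance.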
