Pandas & NumPy Academy · Lesson

Descriptive Stats and Normality Testing

Compute skewness and kurtosis with scipy.stats, run a Shapiro-Wilk test for normality, and interpret the p-value.

Descriptive vs Inferential Statistics

Descriptive statistics summarise what is in your data: mean, median, standard deviation, skewness. Inferential statistics use a sample to make claims about the wider population it was drawn from: hypothesis tests, confidence intervals, p-values. SciPy's scipy.stats module covers both. It provides descriptive measures that go beyond Pandas' describe() as well as a broad suite of classical hypothesis tests. Using Pandas for data manipulation and SciPy for statistical testing is a standard Python data science workflow.
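To make the distinction concrete, here is a minimal sketch on a hypothetical sample of 50 simulated measurements. The data, the seed, and the hypothesised population mean of 10 are assumptions chosen for illustration. describe() only summarises the sample itself, while stats.t.interval and stats.ttest_1samp use the same sample to say something about the population mean.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical sample: 50 measurements drawn from a population with true mean 10
sample = pd.Series(rng.normal(loc=10, scale=2, size=50), name='measurement')

# Descriptive: summarise what is in the sample
summary = sample.describe()
print(summary)

# Inferential: 95% confidence interval for the population mean (t distribution)
n = sample.count()
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=sample.mean(), scale=stats.sem(sample))
print(f'95% CI for the population mean: ({ci_low:.3f}, {ci_high:.3f})')

# Inferential: one-sample t-test of H0 "population mean == 10"
result = stats.ttest_1samp(sample, popmean=10)
print(f't = {result.statistic:.3f}, p = {result.pvalue:.3f}')
```

The interval and the p-value are statements about the population, so they would change if a different sample were drawn. The describe() output only ever describes the 50 values at hand.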

Extended Descriptive Statistics

Pandas describe() reports count, mean, std, min, the 25th/50th/75th percentiles, and max. scipy.stats adds skewness (asymmetry of the distribution) and kurtosis (tail heaviness). A skewness of 0 means the distribution is symmetric. Positive skewness means a longer right tail (many small values, a few very large ones), and negative skewness is the mirror image. A normal distribution has kurtosis 3, or 0 in the excess (Fisher) form that stats.kurtosis returns by default. Higher values indicate heavier tails, i.e. more extreme outliers than a normal distribution would produce.
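Both measures are ratios of central moments. With m_k the k-th central moment computed with denominator n, SciPy's defaults give skewness g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2^2 - 3. The sketch below checks this by hand on an illustrative exponential sample (the seed and sample size are arbitrary assumptions). It also shows that pandas' own Series.skew() and Series.kurt() return bias-adjusted estimates, which match SciPy called with bias=False rather than its defaults.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=2, size=1_000)  # illustrative right-skewed sample

# Central moments with denominator n (the "biased" moment estimates)
d = x - x.mean()
m2 = np.mean(d**2)
m3 = np.mean(d**3)
m4 = np.mean(d**4)

skew_manual = m3 / m2**1.5      # g1 = m3 / m2^(3/2)
kurt_manual = m4 / m2**2 - 3    # g2 = m4 / m2^2 - 3 (excess kurtosis)

# SciPy's defaults (bias=True, fisher=True) use exactly these formulas
print('skew:    ', round(skew_manual, 4), round(stats.skew(x), 4))
print('kurtosis:', round(kurt_manual, 4), round(stats.kurtosis(x), 4))

# pandas .skew() and .kurt() are bias-adjusted, matching SciPy with bias=False
s = pd.Series(x)
print('pandas skew/kurt:', round(s.skew(), 4), round(s.kurt(), 4))
print('scipy bias=False:', round(stats.skew(x, bias=False), 4),
      round(stats.kurtosis(x, bias=False), 4))
```

For large samples the two versions converge. For small samples the difference is noticeable, so it is worth being explicit about which estimator a report uses.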

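import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(0)  # fixed seed so the output is reproducible
# A mixture: a right-skewed exponential component plus a symmetric normal one
data = np.concatenate([
    np.random.exponential(scale=2, size=500),   # right-skewed, mean 2
    np.random.normal(loc=5, scale=1, size=500)  # symmetric around 5
])

print(pd.Series(data).describe())  # what Pandas reports on its own

print('Mean:', round(data.mean(), 3))
print('Std:', round(data.std(), 3))  # NumPy default ddof=0; pandas .std() uses ddof=1
print('Skewness:', round(stats.skew(data), 3))
print('Kurtosis (excess):', round(stats.kurtosis(data), 3))  # fisher=True by default, so normal -> 0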
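To calibrate what these numbers mean, it can help to compute them on reference distributions whose theoretical values are known. The sketch below assumes three illustrative choices (normal, Laplace, exponential), an arbitrary seed, and 100,000 draws each. It reports excess kurtosis alongside the Pearson form obtained with fisher=False, which differs by exactly 3.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000  # large samples so the estimates sit close to the theoretical values

samples = {
    'normal': rng.normal(size=n),            # theory: skew 0, excess kurtosis 0
    'laplace': rng.laplace(size=n),          # theory: skew 0, excess kurtosis 3 (heavier tails)
    'exponential': rng.exponential(size=n),  # theory: skew 2, excess kurtosis 6 (right-skewed)
}

rows = []
for name, x in samples.items():
    rows.append({
        'distribution': name,
        'skew': stats.skew(x),
        'kurtosis_excess': stats.kurtosis(x),                 # fisher=True: normal -> 0
        'kurtosis_pearson': stats.kurtosis(x, fisher=False),  # fisher=False: normal -> 3
    })

table = pd.DataFrame(rows).set_index('distribution')
print(table.round(2))
```

The Laplace row is symmetric (skew near 0) yet has clearly positive excess kurtosis. This is why skewness and kurtosis are reported separately: one measures asymmetry, the other tail weight.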
