Filling Missing Values
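Replace NaN with a constant, the column mean or median, or a forward- or backward-filled neighbouring value using fillna() and the related ffill() and bfill() methods. The method parameter of fillna() is deprecated in recent pandas releases.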
Introduction to fillna()
Imputation means replacing missing values with a reasonable substitute rather than dropping the row. The pandas fillna() method is the primary tool for this: it replaces every NaN in a Series or DataFrame with a value you specify and returns a new object, so the result must be assigned to take effect. Unlike dropping, imputation preserves all rows, which matters most when data is scarce or when discarding incomplete rows would bias the sample.
The simplest form passes a scalar: df['col'].fillna(0).
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'price': [10.0, np.nan, 30.0, np.nan]
})
# Fill all NaN in 'price' with 0
filled = df['price'].fillna(0)
print(filled)
# 0    10.0
# 1     0.0
# 2    30.0
# 3     0.0
Filling with a Column Mean or Median
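Filling with the column mean (suited to roughly symmetric distributions) or the median (more robust for skewed distributions and outliers) is one of the most common imputation strategies. Both statistics sit at the centre of the observed data, so the filled column keeps its central tendency, although its variance shrinks because every gap receives the same value. When the data feeds a model, compute the statistic on the training split only and reuse it for the validation and test splits to avoid data leakage.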
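The leakage point can be sketched as follows. This is a minimal illustration that assumes the data has already been divided into two hypothetical frames, train and test; those names and the values in them are made up for the example and are not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical, hand-made split; in practice these rows come from your own splitting step.
train = pd.DataFrame({'salary': [50000.0, np.nan, 70000.0, 90000.0]})
test = pd.DataFrame({'salary': [np.nan, 65000.0]})

# Compute the fill value from the training rows only.
train_median = train['salary'].median()

# Reuse that single value for both splits, so the test rows never influence it.
train_filled = train['salary'].fillna(train_median)
test_filled = test['salary'].fillna(train_median)

print(train_median)          # 70000.0
print(test_filled.tolist())  # [70000.0, 65000.0]
```

Computing the median over train and test together would let information from the test rows shape the fill value, which is the leakage the paragraph above warns about.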
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'salary': [50000, np.nan, 70000, 80000, np.nan, 60000]
})
mean_salary = df['salary'].mean()
median_salary = df['salary'].median()
df['salary_mean_filled'] = df['salary'].fillna(mean_salary)
df['salary_median_filled'] = df['salary'].fillna(median_salary)
print(df)
#     salary  salary_mean_filled  salary_median_filled
# 0  50000.0             50000.0               50000.0
# 1      NaN             65000.0               65000.0
# 2  70000.0             70000.0               70000.0
# 3  80000.0             80000.0               80000.0
# 4      NaN             65000.0               65000.0
# 5  60000.0             60000.0               60000.0
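Forward fill and backward fill, named in the summary at the top of this lesson, copy a neighbouring observed value into each gap instead of using one constant for the whole column. The sketch below uses a made-up, time-ordered temperature series (the name temps and its values are illustrative) and the Series.ffill() and Series.bfill() methods. It assumes a recent pandas release, where these dedicated methods are preferred over passing method='ffill' to fillna().

```python
import numpy as np
import pandas as pd

# Illustrative readings in time order; NaN marks a missing observation.
temps = pd.Series([np.nan, 21.0, np.nan, np.nan, 24.0, np.nan])

forward = temps.ffill()   # carry the last observed value forward
backward = temps.bfill()  # pull the next observed value backward

print(forward.tolist())   # [nan, 21.0, 21.0, 21.0, 24.0, 24.0]
print(backward.tolist())  # [21.0, 21.0, 24.0, 24.0, 24.0, nan]
```

Forward fill leaves a leading NaN untouched because there is no earlier value to copy, and backward fill does the same for a trailing NaN. Both methods only make sense when row order is meaningful, as in a time series.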