Pandas & NumPy Academy · Lesson

Filling Missing Values

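Replace NaN with a constant, the column mean or median, or a forward or backward fill using fillna() and the related ffill() and bfill() methods. (The method parameter of fillna() is deprecated in recent pandas releases in favour of ffill() and bfill().)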

Introduction to fillna()

Imputation means replacing missing values with a reasonable substitute rather than dropping the row. The pandas fillna() method is the primary tool for this: it replaces every NaN in a Series or DataFrame with a value you specify and, by default, returns a new object rather than modifying the original. Unlike dropping, imputation preserves all rows, which matters most when data is scarce or when the pattern of missing data is itself informative, because dropping rows in that case would bias the sample.

The simplest form passes a scalar: df['col'].fillna(0).

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'price': [10.0, np.nan, 30.0, np.nan]
})

# Fill all NaN in 'price' with 0
filled = df['price'].fillna(0)
print(filled)
# 0    10.0
# 1     0.0
# 2    30.0
# 3     0.0
# Name: price, dtype: float64
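
Forward and backward fill copy a neighbouring observed value into each gap instead of using a constant. The following is a minimal sketch, assuming a small made-up Series named readings (the data and variable names are hypothetical, not from the lesson). It uses the Series.ffill() and Series.bfill() methods rather than fillna(method=...), since the method parameter is deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings, invented for illustration
readings = pd.Series([np.nan, 21.5, np.nan, np.nan, 23.0, np.nan])

# Forward fill: carry the last observed value forward.
# The leading NaN stays NaN because nothing precedes it.
forward = readings.ffill()

# Backward fill: pull the next observed value backward.
# The trailing NaN stays NaN because nothing follows it.
backward = readings.bfill()

# Chaining both fills every gap as long as at least one value is observed.
combined = readings.ffill().bfill()

print(combined.tolist())
# [21.5, 21.5, 21.5, 21.5, 23.0, 23.0]
```

Forward fill usually makes sense only when the rows have a meaningful order, such as a time series, so sort the data first if the order matters.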

Filling with a Column Mean or Median

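Filling with the column mean (for roughly symmetric distributions) or median (for skewed distributions or columns with outliers) is one of the most common imputation strategies. Both place the filled values at the centre of the observed data, and mean imputation leaves the column mean unchanged. Either statistic does reduce the column's variance, because every missing entry receives the same value. In a machine-learning workflow, compute the statistic on the training split only and reuse it for the validation and test splits to avoid data leakage.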

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'salary': [50000, np.nan, 70000, 80000, np.nan, 60000]
})

mean_salary = df['salary'].mean()
median_salary = df['salary'].median()

df['salary_mean_filled'] = df['salary'].fillna(mean_salary)
df['salary_median_filled'] = df['salary'].fillna(median_salary)
print(df)
#     salary  salary_mean_filled  salary_median_filled
# 0  50000.0             50000.0               50000.0
# 1      NaN             65000.0               65000.0
# 2  70000.0             70000.0               70000.0
# 3  80000.0             80000.0               80000.0
# 4      NaN             65000.0               65000.0
# 5  60000.0             60000.0               60000.0
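
The advice about computing the statistic on the training split can be sketched as follows. This is an illustrative example under stated assumptions: the train and test frames are hypothetical hand-made splits, not data from the lesson, and a real project would produce them with its own splitting step.

```python
import numpy as np
import pandas as pd

# Hypothetical, hand-made splits used only for illustration
train = pd.DataFrame({'salary': [50000, np.nan, 70000, 80000]})
test = pd.DataFrame({'salary': [np.nan, 60000]})

# Compute the fill value from the training rows only
train_median = train['salary'].median()

# Reuse the same training statistic for both splits
train_filled = train['salary'].fillna(train_median)
test_filled = test['salary'].fillna(train_median)

print(train_median)
# 70000.0
print(test_filled.tolist())
# [70000.0, 60000.0]
```

Computing the median over train and test together would let information from the test rows influence the values the model is trained on, which is the leakage the text warns about.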
