Pandas & NumPy Academy · Lesson

Interpolation and Advanced Imputation

Use interpolate() for smooth time-based filling and understand when mean vs. median imputation is appropriate.

When Interpolation Beats fillna()

Forward fill and backward fill copy the previous or next known value into a gap, ignoring any trend in the data. Interpolation instead estimates missing values by assuming a smooth transition between the known values on either side. For example, if the temperature was 20°C on Monday and 30°C on Friday, linear interpolation estimates 25°C for Wednesday. This gives more realistic imputations for smoothly changing signals such as sensor readings, prices, or population counts.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5],
    'temp': [20.0, np.nan, np.nan, np.nan, 30.0]
})

# Linear interpolation fills gaps smoothly
df['temp_interp'] = df['temp'].interpolate(method='linear')
print(df)
#    day  temp  temp_interp
# 0    1  20.0         20.0
# 1    2   NaN         22.5
# 2    3   NaN         25.0
# 3    4   NaN         27.5
# 4    5  30.0         30.0
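
To make the contrast with forward and backward fill concrete, the sketch below applies all three to the Monday-to-Friday temperature example. The weekday labels and the `comparison` frame are illustrative names chosen here, not part of the lesson's dataset.

```python
import pandas as pd
import numpy as np

# Hypothetical week of readings: only Monday and Friday are known
temp = pd.Series([20.0, np.nan, np.nan, np.nan, 30.0],
                 index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])

comparison = pd.DataFrame({
    'ffill': temp.ffill(),                        # repeats Monday's 20.0 until Friday
    'bfill': temp.bfill(),                        # pulls Friday's 30.0 back to Tuesday
    'linear': temp.interpolate(method='linear'),  # climbs in equal steps of 2.5
})
print(comparison)
```

Only the linear column reaches 25.0 on Wednesday. The two fill methods produce a flat run followed by a sudden jump, which is rarely how a temperature behaves.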

interpolate() Methods

Pandas interpolate() supports multiple methods. The most common are 'linear' (treats values as equally spaced and ignores the index), 'time' (weights by the actual time gaps, requires a DatetimeIndex), 'polynomial' (fits a polynomial of the given order), and 'spline' (smooth piecewise polynomial). Both 'polynomial' and 'spline' require an order argument and depend on SciPy. Linear is the safest default; higher-order methods can overshoot or oscillate when only a few known points surround a gap.

import pandas as pd
import numpy as np

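# Known points lie on y = x**2; the value at position 2 is missing
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

print('linear:', s.interpolate('linear').tolist())
# linear: [0.0, 1.0, 5.0, 9.0, 16.0]

# 'polynomial' needs SciPy and at least order + 1 known points
print('polynomial(2):', s.interpolate('polynomial', order=2).round(6).tolist())
# polynomial(2): [0.0, 1.0, 4.0, 9.0, 16.0]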
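
The 'time' method is easiest to see when timestamps are unevenly spaced. The sketch below uses made-up dates and values: one missing reading sits one day after the first known point and three days before the next.

```python
import pandas as pd
import numpy as np

# Hypothetical readings with an uneven gap: Jan 1, Jan 2 (missing), Jan 5
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05'])
s = pd.Series([10.0, np.nan, 50.0], index=idx)

# 'linear' ignores the index and places the missing value halfway
linear = s.interpolate(method='linear')

# 'time' weights by elapsed time: Jan 2 is 1/4 of the way from Jan 1 to Jan 5
timed = s.interpolate(method='time')

print('linear:', linear.tolist())  # linear: [10.0, 30.0, 50.0]
print('time:  ', timed.tolist())   # time:   [10.0, 20.0, 50.0]
```

When observations arrive at irregular intervals, 'time' is usually the better match for the reasoning behind interpolation, because the estimate depends on how far the gap is from each neighbor rather than on row position.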
