Interpolation and Advanced Imputation
Use interpolate() for smooth time-based filling and understand when mean vs. median imputation is appropriate.
When Interpolation Beats fillna()
Forward fill and backward fill carry the nearest known value across a gap without considering the trend of the data. Interpolation instead estimates missing values by assuming a smooth transition between known values. For example, if the temperature was 20°C on Monday and 30°C on Friday, linear interpolation estimates 25°C for Wednesday. This produces more realistic imputations for smoothly changing signals such as sensor readings, prices, or population counts.
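As a rough illustration of that difference, the sketch below fills the same gap with forward fill and with linear interpolation. The Monday-to-Friday series and the variable names are made up for this example and only mirror the temperature scenario described above.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: known on Monday and Friday only.
temps = pd.Series(
    [20.0, np.nan, np.nan, np.nan, 30.0],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

comparison = pd.DataFrame({
    "ffill": temps.ffill(),         # repeats 20.0 until the next known value
    "linear": temps.interpolate(),  # default method='linear': evenly spaced steps
})
print(comparison)
```

Forward fill holds the value at 20.0 through Thursday and then jumps to 30.0, while the interpolated column rises in equal steps and passes through 25.0 on Wednesday.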
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5],
    'temp': [20.0, np.nan, np.nan, np.nan, 30.0]
})
# Linear interpolation fills gaps smoothly
df['temp_interp'] = df['temp'].interpolate(method='linear')
print(df)
#    day  temp  temp_interp
# 0    1  20.0         20.0
# 1    2   NaN         22.5
# 2    3   NaN         25.0
# 3    4   NaN         27.5
# 4    5  30.0         30.0
interpolate() Methods
The pandas interpolate() method supports several interpolation methods. The most common are 'linear' (ignores the index and treats values as equally spaced), 'time' (accounts for unequal time gaps when a DatetimeIndex is set), 'polynomial' (fits a polynomial curve), and 'spline' (a smooth piecewise polynomial). Both 'polynomial' and 'spline' require an order argument and depend on SciPy being installed. Linear is the safest default; higher-order methods can overshoot or oscillate when only a few known points surround a gap.
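To make the 'linear' versus 'time' distinction concrete, here is a minimal sketch on an unevenly spaced DatetimeIndex. The dates, values, and variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with an uneven gap: Jan 1, Jan 2 (missing), then Jan 5.
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"])
readings = pd.Series([10.0, np.nan, 50.0], index=idx)

by_position = readings.interpolate(method="linear")  # ignores the dates
by_time = readings.interpolate(method="time")        # weights by elapsed time

print(by_position.tolist())  # [10.0, 30.0, 50.0]
print(by_time.tolist())      # approximately [10.0, 20.0, 50.0]
```

The positional result treats the three rows as evenly spaced and lands on the midpoint, while the time-based result reflects that January 2 is only a quarter of the way from January 1 to January 5.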
import pandas as pd
import numpy as np
# 'polynomial' needs SciPy and at least order + 1 known values,
# so this series has three known points (they lie on y = x**2).
s = pd.Series([0.0, 1.0, np.nan, 9.0])
print('linear:', s.interpolate(method='linear').tolist())
# linear: [0.0, 1.0, 5.0, 9.0]
print('polynomial(2):', s.interpolate(method='polynomial', order=2).tolist())
# polynomial(2): approximately [0.0, 1.0, 4.0, 9.0]