Shifting and Lag Features
Create lag and lead columns with shift(), compute period-over-period change with diff(), and calculate percentage change with pct_change().
Why Lag Features Matter
In time series analysis, the value at a previous time step (a lag) is often one of the best predictors of the current value. For example, yesterday's sales are informative about today's sales. Creating lag columns lets you use historical values as features in machine learning models or in statistical analysis of autocorrelation. Pandas provides shift() to create lag features in a single vectorised call.
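As a rough illustration of the autocorrelation point, the sketch below compares the built-in Series.autocorr() with a correlation computed by hand against a shift(1) copy of the same series. The sales figures are made-up sample values, and the variable names are only illustrative.

```python
import pandas as pd

# Made-up daily sales figures, for illustration only
sales = pd.Series([100, 120, 115, 130, 140, 125])

# Lag-1 autocorrelation: correlation between the series and
# a copy of itself shifted by one step
lag1_autocorr = sales.autocorr(lag=1)

# The same quantity computed by hand from a lagged copy;
# corr() ignores the row where the lagged value is NaN
manual = sales.corr(sales.shift(1))

print(lag1_autocorr, manual)
```

With only six observations the estimate is very noisy; the point is just that a lag column is all the calculation needs.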
shift(): Moving Values Forward or Backward
Series.shift(n) moves all values down by n positions (a positive n creates a lag), filling the first n rows with NaN. A negative n moves values up, creating a lead: each row receives a future value, and the last |n| rows become NaN. The index remains unchanged; only the values are displaced.
import pandas as pd
import numpy as np
df = pd.DataFrame(
{'sales': [100, 120, 115, 130, 140, 125]},
index=pd.date_range('2024-01-01', periods=6, freq='D')
)
# Lag-1: yesterday's sales
df['sales_lag1'] = df['sales'].shift(1)
# Lead-1: tomorrow's sales (shifted up)
df['sales_lead1'] = df['sales'].shift(-1)
print(df)
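The lesson summary also mentions diff() and percentage change. The following is a minimal sketch, using the same sample sales values, of how both relate to shift(): diff(1) should match subtracting the lag-1 column, and pct_change(1) should match dividing by it and subtracting 1. The names change, pct, manual_change and manual_pct are illustrative only.

```python
import pandas as pd

sales = pd.Series(
    [100, 120, 115, 130, 140, 125],
    index=pd.date_range('2024-01-01', periods=6, freq='D'),
    name='sales',
)

# Period-over-period change: current value minus the previous value
change = sales.diff(1)
manual_change = sales - sales.shift(1)

# Percentage change: (current - previous) / previous
pct = sales.pct_change(1)
manual_pct = sales / sales.shift(1) - 1

result = pd.DataFrame({'sales': sales, 'change': change, 'pct': pct})
print(result)
```

In both cases the first row is NaN, because there is no previous value to compare against.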