Date and Time Feature Extraction
Learners will decompose datetime columns into year, month, day-of-week, hour, and cyclical sine/cosine encodings that capture periodicity.
Why Raw Timestamps Are Useless
A raw Unix timestamp like 1718784000 or a datetime string like '2024-06-19 14:30:00' encodes rich information (the hour of day, the day of week, the month of year), but a model cannot easily recover that information from the raw value. A linear model sees the timestamp as a single large integer and can only learn a monotonic trend, in which later timestamps predict higher or lower targets; it misses every periodic pattern. Feature extraction decomposes the timestamp into meaningful numeric components that a model can use directly.
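To make this concrete, here is a minimal sketch that takes the example timestamp above and a second one exactly one week later. It assumes the integers are seconds since the epoch and interprets them as UTC; both are illustrative choices, not something the raw values themselves guarantee.

```python
import pandas as pd

# Two Unix timestamps in seconds, exactly one week apart (assumed UTC).
raw = pd.Series([1718784000, 1718784000 + 7 * 24 * 3600])

# As plain integers they differ by 604800, which reveals nothing periodic.
gap = int(raw.iloc[1] - raw.iloc[0])

# Once parsed, the shared hour of day and day of week become visible.
parsed = pd.to_datetime(raw, unit="s", utc=True)
hours = parsed.dt.hour.tolist()
weekdays = parsed.dt.dayofweek.tolist()  # 0=Mon, 6=Sun

print(gap, hours, weekdays)
```

The two rows look unrelated as integers, yet they fall on the same weekday at the same hour, which is the kind of pattern extracted features expose to a model.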
Parsing Datetime Columns with Pandas
Always parse date/time strings to the datetime64 dtype with pd.to_datetime() before extracting features. Once parsed, a Series exposes the .dt accessor with dozens of attributes: .dt.year, .dt.month, .dt.day, .dt.hour, .dt.minute, .dt.dayofweek (0=Monday, 6=Sunday), .dt.dayofyear, .dt.quarter, and more. Pandas has no .dt.is_weekend attribute; derive a weekend flag from .dt.dayofweek >= 5 instead. Most of these attributes are integers (a few, such as .dt.is_month_start, are booleans), and each one can become a new column in your feature matrix.
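import pandas as pd

# Example datetime strings; pd.to_datetime converts them to datetime64.
dates = pd.Series(['2024-01-15 08:30:00', '2024-06-21 17:45:00', '2024-12-25 12:00:00'])
dts = pd.to_datetime(dates)

# Each .dt attribute becomes one column of the feature matrix.
df = pd.DataFrame({
    'year': dts.dt.year,
    'month': dts.dt.month,
    'day': dts.dt.day,
    'hour': dts.dt.hour,
    'dayofweek': dts.dt.dayofweek,  # 0=Mon, 6=Sun
    'quarter': dts.dt.quarter,
    # No built-in weekend attribute: Saturday (5) and Sunday (6) map to 1.
    'is_weekend': (dts.dt.dayofweek >= 5).astype(int),
})
print(df.to_string())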
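The objective above also lists cyclical sine/cosine encodings. As a rough sketch of that idea (not necessarily the exact formulation used later in this lesson), the hour can be mapped onto the unit circle so that 23:00 and 00:00 end up close together. The column names hour_sin and hour_cos and the sample timestamps are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Three sample times on one day: midnight, 06:00, and 23:00.
dts = pd.to_datetime(pd.Series([
    '2024-01-15 00:00:00',
    '2024-01-15 06:00:00',
    '2024-01-15 23:00:00',
]))
hour = dts.dt.hour

# Map the hour (period 24) to an angle, then to a point on the unit circle.
angle = 2 * np.pi * hour / 24
cyc = pd.DataFrame({
    'hour_sin': np.sin(angle),
    'hour_cos': np.cos(angle),
})

# Euclidean distance between 23:00 and 00:00 in the encoded space.
wrap_dist = float(np.hypot(
    cyc['hour_sin'].iloc[2] - cyc['hour_sin'].iloc[0],
    cyc['hour_cos'].iloc[2] - cyc['hour_cos'].iloc[0],
))
print(cyc.round(3).to_string())
print(wrap_dist)
```

As raw integers, hours 23 and 0 are 23 units apart; in the encoded space they are neighbours. The same pattern applies to month (period 12) or day of week (period 7) by changing the divisor.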