Machine Learning Academy · Lesson

Date and Time Feature Extraction

Learners will decompose datetime columns into year, month, day-of-week, hour, and cyclical sine/cosine encodings that capture periodicity.

Why Raw Timestamps Are Useless

A raw Unix timestamp like 1718784000 or a datetime string like '2024-06-19 14:30:00' contains rich information (the hour of day, day of week, month of year), but most models cannot readily extract it from the raw value. A linear model sees the timestamp as a single large integer and can only learn a monotonic trend, where later timestamps predict steadily higher or lower targets, so it misses periodic patterns such as daily or weekly cycles. Feature extraction decomposes the timestamp into meaningful numeric components that models can use directly.
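
As a small illustration of the point above, the sketch below takes the example timestamp 1718784000 and a second one exactly one week later (the second value is an assumption added for illustration). The raw integers differ by 604800, yet both fall on the same hour and the same day of the week, which only becomes visible after decomposition.

```python
import pandas as pd

# Example timestamp from the text, plus a hypothetical one exactly 7 days later.
raw = pd.Series([1718784000, 1718784000 + 7 * 24 * 3600])

# Interpret the integers as seconds since the Unix epoch (UTC).
parsed = pd.to_datetime(raw, unit="s")

features = pd.DataFrame({
    "raw": raw,                          # differs by 604800 between rows
    "hour": parsed.dt.hour,              # identical for both rows
    "dayofweek": parsed.dt.dayofweek,    # identical for both rows (0=Mon)
})
print(features.to_string())
```

A model given only the raw column sees two distant numbers; a model given hour and dayofweek sees two rows that share the same position in the daily and weekly cycle.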

Parsing Datetime Columns with Pandas

Always parse date/time strings to the datetime64 dtype using pd.to_datetime() before extracting features. Once parsed, a Series exposes a .dt accessor with dozens of attributes: .dt.year, .dt.month, .dt.day, .dt.hour, .dt.minute, .dt.dayofweek (0=Monday), .dt.dayofyear, .dt.quarter, and more. Pandas has no built-in .dt.is_weekend attribute; derive a weekend flag by checking whether .dt.dayofweek is 5 or 6. Most of these attributes return integers (flags such as .dt.is_month_start return booleans), and each can become a new column in your feature matrix.

import pandas as pd

dates = pd.Series(['2024-01-15 08:30:00', '2024-06-21 17:45:00', '2024-12-25 12:00:00'])
dts = pd.to_datetime(dates)

df = pd.DataFrame({
    'year': dts.dt.year,
    'month': dts.dt.month,
    'day': dts.dt.day,
    'hour': dts.dt.hour,
    'dayofweek': dts.dt.dayofweek,  # 0=Mon, 6=Sun
    'quarter': dts.dt.quarter,
    'is_weekend': (dts.dt.dayofweek >= 5).astype(int)
})
print(df.to_string())
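
The lesson objective also mentions cyclical sine/cosine encodings. The sketch below shows one common way to build them, assuming the same three example dates; the helper name cyclical_encode and the output column names are hypothetical, chosen for illustration. The idea is to map each periodic value onto a point on the unit circle so that the end of a cycle (hour 23) sits next to its start (hour 0), which plain integers cannot express.

```python
import numpy as np
import pandas as pd

dates = pd.Series(['2024-01-15 08:30:00', '2024-06-21 17:45:00', '2024-12-25 12:00:00'])
dts = pd.to_datetime(dates)

def cyclical_encode(values, period):
    """Map values in [0, period) to (sin, cos) coordinates on the unit circle.

    Hypothetical helper for illustration; not part of pandas.
    """
    angle = 2 * np.pi * values / period
    return np.sin(angle), np.cos(angle)

hour_sin, hour_cos = cyclical_encode(dts.dt.hour, 24)
dow_sin, dow_cos = cyclical_encode(dts.dt.dayofweek, 7)
# Months run 1..12, so shift to 0..11 before encoding.
month_sin, month_cos = cyclical_encode(dts.dt.month - 1, 12)

cyc = pd.DataFrame({
    'hour_sin': hour_sin, 'hour_cos': hour_cos,
    'dow_sin': dow_sin, 'dow_cos': dow_cos,
    'month_sin': month_sin, 'month_cos': month_cos,
})
print(cyc.round(3).to_string())
```

Both the sine and cosine columns are needed: a single sine value is shared by two different points in the cycle (for example, hour 3 and hour 9), and the cosine resolves that ambiguity.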
