Machine Learning Academy · Lesson

Encoding Categorical Variables: OrdinalEncoder and OneHotEncoder

Learners will convert ordinal categories to integers and nominal categories to one-hot vectors, handling unknown categories at inference time.

Why Categorical Encoding Is Required

Most machine learning models operate on numbers, not strings. A column containing values like 'red', 'green', 'blue' generally cannot be passed directly to a scikit-learn estimator, so you must convert categories to numeric representations before training. The choice of encoding matters: a poor encoding can introduce spurious ordinal relationships or inflate the number of features. This lesson covers two widely used encoders: OrdinalEncoder for ordered categories and OneHotEncoder for unordered ones.

import pandas as pd

df = pd.DataFrame({
    'size': ['small', 'medium', 'large', 'medium'],
    'color': ['red', 'blue', 'green', 'red'],
    'price': [10.5, 20.0, 15.0, 18.5]
})

print(df.dtypes)
# size and color are string columns ('object' dtype in older pandas; newer
# versions may report a dedicated string dtype) and must be encoded before modelling
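
As a minimal sketch of the two encoders named above, the snippet below applies them to the same DataFrame. It assumes scikit-learn is installed; the variable names (size_encoder, color_encoder and so on) are illustrative, not part of the lesson's own code.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    'size': ['small', 'medium', 'large', 'medium'],
    'color': ['red', 'blue', 'green', 'red'],
    'price': [10.5, 20.0, 15.0, 18.5]
})

# Ordinal column: pass the order explicitly. Without it, categories are
# sorted alphabetically (large, medium, small), which is not the real order.
size_encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])
size_codes = size_encoder.fit_transform(df[['size']])
print(size_codes.ravel())  # [0. 1. 2. 1.]

# Nominal column: one binary column per category.
# The default output is a sparse matrix, so convert it for printing.
color_encoder = OneHotEncoder()
color_onehot = color_encoder.fit_transform(df[['color']]).toarray()
print(color_encoder.categories_[0])  # learned categories: blue, green, red
print(color_onehot)
```

Both encoders expect 2-D input, which is why the columns are selected with double brackets (df[['size']]) rather than df['size'].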

Ordinal vs Nominal Categories

Categorical variables come in two kinds. Ordinal categories have a meaningful order: small < medium < large, or bad < fair < good < excellent. Nominal categories have no intrinsic order: red, blue, green are just labels. Choosing the wrong encoding creates false relationships. For example, integer-encoding unordered colors as red = 0, blue = 1, green = 2 tells the model that blue lies between red and green, and that red and green are twice as far apart as red and blue, neither of which means anything. Always identify whether a variable is ordinal or nominal before encoding.

# Ordinal: clear ordering
ordinal_example = ['low', 'medium', 'high', 'very high']

# Nominal: no meaningful ordering
nominal_example = ['cat', 'dog', 'bird']

# Wrong approach for nominal: integer encoding implies order
# 0=cat, 1=dog, 2=bird -- model thinks dog is between cat and bird

# Correct approach for nominal: one-hot encoding
# cat -> [1, 0, 0]
# dog -> [0, 1, 0]
# bird -> [0, 0, 1]
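
The following sketch reproduces the one-hot vectors above with scikit-learn and shows one way to handle a category that appears only at inference time. It assumes scikit-learn 0.24 or later; the unseen values 'fish' and 'extreme' and the variable names are made up for illustration.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Nominal: fix the column order to cat, dog, bird so the output matches the
# vectors in this lesson. handle_unknown='ignore' encodes unseen categories
# as an all-zero row instead of raising an error.
animal_encoder = OneHotEncoder(
    categories=[['cat', 'dog', 'bird']],
    handle_unknown='ignore',
)
animal_encoder.fit([['cat'], ['dog'], ['bird']])
animals = animal_encoder.transform([['cat'], ['bird'], ['fish']]).toarray()
print(animals)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]   <- 'fish' was never seen during fit

# Ordinal: keep the real order and map unseen categories to a sentinel value.
level_encoder = OrdinalEncoder(
    categories=[['low', 'medium', 'high', 'very high']],
    handle_unknown='use_encoded_value',
    unknown_value=-1,
)
level_encoder.fit([['low'], ['medium'], ['high'], ['very high']])
levels = level_encoder.transform([['high'], ['low'], ['extreme']])
print(levels.ravel())  # [ 2.  0. -1.]
```

With their default settings, both encoders raise an error on an unseen category, so the unknown-handling options have to be chosen explicitly. The sentinel -1 is an arbitrary choice here; any value outside the range of valid codes would serve.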
