ColumnTransformer Inside a Pipeline
Learners will nest a ColumnTransformer inside a Pipeline so that numeric and categorical preprocessing happen in a single step and heterogeneous raw data can be passed in directly, without manual splitting.
The Mixed-Data Problem
Real-world tabular datasets almost always contain a mix of numeric columns (age, salary, temperature) and categorical columns (city, product category, gender). Each type needs different preprocessing: numeric columns need scaling or imputation, while categorical columns need encoding. ColumnTransformer lets you apply different transformers to different subsets of columns in parallel, producing a single clean feature matrix.
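To make the problem concrete, here is a minimal sketch of the manual workflow that ColumnTransformer replaces. The DataFrame, its values, and the variable names are made up for illustration, and pandas is assumed to be installed alongside scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Hypothetical toy data: two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "salary": [40000.0, 52000.0, 81000.0, 90000.0],
    "city": ["Paris", "Lima", "Paris", "Oslo"],
})

# Split the columns by dtype by hand.
num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# Preprocess each group separately, then stitch the pieces back together.
X_num = StandardScaler().fit_transform(df[num_cols])
X_cat = OneHotEncoder(handle_unknown="ignore").fit_transform(df[cat_cols]).toarray()
X_manual = np.hstack([X_num, X_cat])

print(num_cols, cat_cols)
print(X_manual.shape)  # 2 scaled columns + 3 one-hot city columns
```

Every step here (splitting, fitting two transformers, concatenating) has to be repeated identically on test data, which is where manual approaches tend to go wrong.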
ColumnTransformer: The Basic Structure
ColumnTransformer takes a list of (name, transformer, columns) triples. The columns entry can be a list of column names, a list of integer indices, a slice, a boolean mask, or a callable such as the one returned by make_column_selector. After transformation, the outputs of all transformers are concatenated horizontally, in the order the triples are listed. Columns that no triple mentions are dropped by default (remainder='drop').
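from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Each triple is (name, transformer, columns).
ct = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'salary']),  # scale numeric columns
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['city', 'education']),  # one-hot encode categoricals
])

# Before fitting, .transformers holds the triples exactly as passed in.
print('Transformers:', [name for name, _, _ in ct.transformers])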
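The lesson's goal is to nest this transformer inside a Pipeline, so the following sketch shows one way that might look. The toy DataFrame, the target values, the step names 'preprocess' and 'clf', and the choice of LogisticRegression are all assumptions made for illustration; only the column names and the two transformers come from the example above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data using the same column names as the example above.
X = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "salary": [40000.0, 52000.0, 81000.0, 90000.0, 61000.0, 45000.0],
    "city": ["Paris", "Lima", "Paris", "Oslo", "Lima", "Oslo"],
    "education": ["BSc", "MSc", "PhD", "MSc", "BSc", "BSc"],
})
y = [0, 0, 1, 1, 1, 0]  # made-up binary target

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "salary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city", "education"]),
])

# The ColumnTransformer is just another Pipeline step, followed by the estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

# The raw DataFrame goes in directly; no manual splitting is needed.
model.fit(X, y)

# Width of the matrix the classifier saw:
# 2 scaled columns + 3 city categories + 3 education categories.
n_features = model.named_steps["preprocess"].transform(X).shape[1]
print(n_features)

# A city not seen during fit is encoded as all zeros (handle_unknown="ignore").
new_row = pd.DataFrame(
    {"age": [40], "salary": [70000.0], "city": ["Rome"], "education": ["MSc"]}
)
print(model.predict(new_row))
```

Because the preprocessing lives inside the Pipeline, calling fit learns the scaling statistics and category lists from the training data only, and predict reuses them on new rows without any extra bookkeeping.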