Building Preprocessing Pipelines
Use scikit-learn's Pipeline and ColumnTransformer to combine transformers into clean preprocessing workflows.
Why Pipelines?
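A scikit-learn Pipeline chains preprocessing steps and a model into one object. Each transformer is fitted on the training data only, and its fitted parameters are then reused unchanged on test data. This prevents data leakage and removes the need to repeat every transformation by hand.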
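To make the leakage point concrete, here is a minimal sketch. It assumes a small synthetic dataset and a hand-made 80/20 split, both invented for illustration. It checks that the scaler inside the pipeline learned its means from the training rows alone, and that the same fitted pipeline can then be applied to the held-out rows.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, made up purely for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = (X[:, 0] > 5.0).astype(int)

# Simple hold-out split: first 80 rows for training, last 20 for testing.
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)  # the scaler sees only the training rows

# The scaler's learned means match the training set, not the full dataset.
scaler = pipe.named_steps["scale"]
means_from_train_only = np.allclose(scaler.mean_, X_train.mean(axis=0))

# predict() reuses the already-fitted scaler on the test rows.
test_predictions = pipe.predict(X_test)
```

Because the test rows never reach `fit`, their statistics cannot influence the scaling applied during training.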
A Basic Pipeline
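Pipeline takes a list of (name, step) tuples. Calling fit runs the steps in order: each intermediate step is fitted and then used to transform the data, and the final step, usually an estimator, is fitted on the result. Every step except the last must be a transformer.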
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Each step is a (name, step) tuple; steps run in the order listed.
pipe = Pipeline([
    ("scale", StandardScaler()),      # transformer: standardizes each feature
    ("model", LogisticRegression()),  # final estimator
])
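
The step order described above can be checked against the equivalent manual code. The sketch below is illustrative only: the dataset comes from `make_classification`, and the sample size and `random_state` values are arbitrary assumptions. It fits the pipeline, then repeats the same steps by hand and compares the predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy classification data; sizes and seeds are arbitrary.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline version: one fit call, one predict call.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual version of the same steps.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
model = LogisticRegression().fit(X_train_scaled, y_train)
manual_pred = model.predict(scaler.transform(X_test))  # transform only, no refit

same_predictions = np.array_equal(pipe_pred, manual_pred)
```

The manual version has to remember to call `transform` rather than `fit_transform` on the test data; the pipeline handles that bookkeeping itself.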