Recursive Feature Elimination with Cross-Validation
Learners will use RFECV to let the model itself vote out the least useful features while preserving the cross-validation loop to prevent leakage.
Limitations of Univariate Selection
Univariate feature selection methods like SelectKBest evaluate each feature independently. They miss important cases: a feature that is useless alone but highly valuable combined with another, or two features that are individually strong but highly redundant when combined. Recursive Feature Elimination (RFE) overcomes this by using the model itself to judge feature importance: it fits the model with all features, removes the least important one, refits, removes again, and repeats. This captures how features interact within the model.
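The first blind spot can be illustrated with a small synthetic sketch. The data, the coefficient 0.3, and the use of PolynomialFeatures to expose the interaction are all assumptions made for illustration, not part of the lesson's dataset. The target depends on the product of two columns, so each of those columns scores poorly on its own even though the pair explains most of the variance.

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): three independent standard-normal columns
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))

# The target depends on the product x0 * x1 plus a weak linear effect of x2
y = X[:, 0] * X[:, 1] + 0.3 * X[:, 2]

# Univariate F-scores evaluate one column at a time
f_scores, _ = f_regression(X, y)
print('Univariate F-scores:', f_scores)

# A model that sees x0 and x1 together, including their interaction term
pair = PolynomialFeatures(degree=2, interaction_only=True,
                          include_bias=False).fit_transform(X[:, :2])
r2_pair = LinearRegression().fit(pair, y).score(pair, y)

# A model that sees only x2, the feature a univariate filter would prefer
r2_x2 = LinearRegression().fit(X[:, [2]], y).score(X[:, [2]], y)

print('Training R^2 using x0 and x1 with interaction:', r2_pair)
print('Training R^2 using x2 alone:', r2_x2)
```

In this setup a univariate filter ranks x2 highest, yet x0 and x1 together carry far more signal.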
How RFE Works
RFE works as follows: (1) train the model on all features; (2) rank features by importance (coefficient magnitude for linear models, feature_importances_ for trees); (3) remove the lowest-ranked feature; (4) repeat until the desired number of features remains. At each step, the ranking is recomputed using the model fitted on the remaining features. This means features that appear weak in the presence of redundant competitors might become important once those competitors are removed.
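The four steps above can be written out by hand as a minimal sketch. It assumes LinearRegression as the estimator and absolute coefficient size as the importance measure; both are illustrative choices, and the variable names (remaining, eliminated, n_keep) are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
X, y = data.data, data.target

remaining = list(range(X.shape[1]))  # column indices still in play
eliminated = []                      # column indices in order of removal
n_keep = 5                           # desired number of features

while len(remaining) > n_keep:
    # Steps 1 and 4: (re)fit the model on the features that are left
    model = LinearRegression().fit(X[:, remaining], y)
    # Step 2: rank by importance, here the absolute coefficient
    importance = np.abs(model.coef_)
    # Step 3: drop the single lowest-ranked feature
    weakest = remaining[int(np.argmin(importance))]
    remaining.remove(weakest)
    eliminated.append(weakest)

print('Kept:', [data.feature_names[i] for i in remaining])
print('Eliminated (first to last):', [data.feature_names[i] for i in eliminated])
```

Because the model is refitted inside the loop, each ranking reflects only the features that are still present. scikit-learn's RFE class automates this loop.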
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
from sklearn.datasets import load_diabetes
# Load the dataset once so the features and their names come from the same object
data = load_diabetes()
X, y = data.data, data.target
feature_names = data.feature_names
# Select the top 5 features; RFE drops one feature per iteration by default (step=1)
rfe = RFE(estimator=SVR(kernel='linear'), n_features_to_select=5)
rfe.fit(X, y)
print('Selected features:', [name for name, keep in zip(feature_names, rfe.support_) if keep])
print('Feature rankings:', rfe.ranking_)  # 1 = selected; larger numbers were eliminated earlier
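RFE requires the number of features to keep to be chosen in advance. RFECV instead picks that number by cross-validation. The sketch below shows one way to use it. It swaps in LinearRegression for the SVR used earlier to keep the run fast, and the fold counts, random seeds, and R^2 scoring are assumptions rather than recommendations. The second half places the selector inside a Pipeline so that feature selection is repeated within each outer training fold, which keeps the held-out fold from influencing which features are chosen.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline

data = load_diabetes()
X, y = data.data, data.target

# Inner folds: used by RFECV to score each candidate number of features
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)

selector = RFECV(
    estimator=LinearRegression(),
    step=1,                      # remove one feature per iteration
    cv=inner_cv,
    scoring='r2',
    min_features_to_select=1,
)
selector.fit(X, y)

print('Number of features chosen:', selector.n_features_)
print('Selected features:',
      [name for name, keep in zip(data.feature_names, selector.support_) if keep])

# Leakage-safe evaluation: selection is refitted inside every outer training fold
pipe = Pipeline([
    ('select', RFECV(estimator=LinearRegression(), cv=inner_cv, scoring='r2')),
    ('model', LinearRegression()),
])
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = cross_val_score(pipe, X, y, cv=outer_cv, scoring='r2')
print('Mean outer-fold R^2:', outer_scores.mean())
```

The score from the outer loop is the one to report. The scores RFECV computes internally were used to pick the feature count, so they tend to be optimistic.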