Voting Ensembles: Hard Vote vs Soft Vote
Learners will combine a KNN, logistic regression, and decision tree into a VotingClassifier and compare hard-vote majority with soft-vote probability averaging.
What Is a Voting Ensemble?
A voting ensemble combines the predictions of several different model types, such as a logistic regression, a k-nearest neighbours classifier, and a decision tree, to produce a single final prediction. Unlike bagging, which trains many copies of the same model type on different bootstrap samples, a voting ensemble draws its strength from diversity in model architecture. Because different models tend to make different kinds of errors, their combination can outperform any individual member, especially on datasets where no single algorithm dominates.
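One way to check the claim that different models make different errors is to look at which samples each member misclassifies. The sketch below, assuming the iris data and the same three model types used later in this lesson, collects out-of-fold predictions with scikit-learn's `cross_val_predict` and compares the error sets. A sample that only one member gets wrong can be outvoted by the other two; a sample that all three get wrong cannot be rescued by any vote. The `random_state=0` on the tree is an added assumption for reproducibility.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    'lr': LogisticRegression(max_iter=1000),
    'knn': KNeighborsClassifier(),
    'dt': DecisionTreeClassifier(random_state=0),  # seed assumed for reproducibility
}

# Out-of-fold predictions: each sample is predicted by a model that never trained on it
errors = {}
for name, model in models.items():
    preds = cross_val_predict(model, X, y, cv=5)
    errors[name] = set(np.flatnonzero(preds != y).tolist())
    print(f'{name}: {len(errors[name])} misclassified samples')

# For each misclassified sample, count how many members got it wrong
wrong_counts = Counter(i for errs in errors.values() for i in errs)
outvotable = sum(1 for c in wrong_counts.values() if c == 1)  # the other 2 members are right
unfixable = sum(1 for c in wrong_counts.values() if c == 3)   # no member is right
print('Wrong in exactly one model:', outvotable)
print('Wrong in all three models:', unfixable)
```

If most errors fall in the "exactly one model" bucket, the members are diverse in the way a voting ensemble needs.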
Hard Voting: Majority Rules
In hard voting, each model votes for a class label and the class that receives the most votes wins. If three models predict [cat, cat, dog], the ensemble predicts cat. Hard voting is simple and interpretable, but it treats all models as equally reliable and ignores confidence levels. A model that is barely 51% confident votes the same as one that is 99% confident, which can lead to suboptimal decisions when model confidences differ greatly.
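To make the 51% versus 99% problem concrete, here is a small NumPy sketch with three hypothetical models and made-up probabilities for the classes cat and dog. Two models lean only slightly towards cat, while the third is almost certain of dog. Counting votes follows the hard-voting rule described above; averaging the probabilities instead (soft voting, the alternative this lesson compares against) reaches the opposite decision.

```python
import numpy as np

classes = np.array(['cat', 'dog'])

# Hypothetical predicted probabilities [P(cat), P(dog)] from three models
probas = np.array([
    [0.51, 0.49],  # model A: barely prefers cat
    [0.51, 0.49],  # model B: barely prefers cat
    [0.01, 0.99],  # model C: almost certain it is a dog
])

# Hard voting: each model casts one vote for its most likely class
votes = classes[probas.argmax(axis=1)]
labels, counts = np.unique(votes, return_counts=True)
hard_prediction = labels[counts.argmax()]

# Soft voting: average the probabilities, then pick the highest average
avg_proba = probas.mean(axis=0)
soft_prediction = classes[avg_proba.argmax()]

print('Votes:', votes.tolist())        # → ['cat', 'cat', 'dog']
print('Hard vote:', hard_prediction)   # → cat
print('Averaged probabilities:', avg_proba.round(4))
print('Soft vote:', soft_prediction)   # → dog
```

The averaged probabilities are about 0.34 for cat and 0.66 for dog, so model C's confidence outweighs the two lukewarm cat votes.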
```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

# Load the 150-sample, 3-class iris dataset
X, y = load_iris(return_X_y=True)

# Three structurally different models; each casts one vote per sample
voter = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('knn', KNeighborsClassifier()),
        ('dt', DecisionTreeClassifier(random_state=42)),  # fixed seed for reproducible trees
    ],
    voting='hard'  # predict the majority class label
)

# Mean accuracy over 5 stratified folds
scores = cross_val_score(voter, X, y, cv=5)
print('Hard Voting CV:', scores.mean().round(4))
```
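The lesson's goal is to compare this majority vote with soft voting, where `VotingClassifier(voting='soft')` averages each member's `predict_proba` output and picks the class with the highest mean probability. Below is a minimal sketch, assuming the same three members and the same `cv=5` folds as above; the helper `make_voter` is a hypothetical name introduced here, and `random_state=0` on the tree is an added assumption. Soft voting only works when every estimator implements `predict_proba`, which all three of these do.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def make_voter(voting):
    # Hypothetical helper: the same three members, differing only in voting mode
    return VotingClassifier(
        estimators=[
            ('lr', LogisticRegression(max_iter=1000)),
            ('knn', KNeighborsClassifier()),
            ('dt', DecisionTreeClassifier(random_state=0)),
        ],
        voting=voting,
    )

# An integer cv gives unshuffled stratified folds, so both modes see identical splits
results = {}
for voting in ('hard', 'soft'):
    scores = cross_val_score(make_voter(voting), X, y, cv=5)
    results[voting] = scores.mean()
    print(f'{voting.capitalize()} Voting CV: {scores.mean():.4f}')
```

Which mode scores higher depends on the data and on how well calibrated each member's probabilities are. A fully grown decision tree typically outputs probabilities of 0 or 1, and the default `KNeighborsClassifier` (five neighbours, uniform weights) outputs multiples of 0.2, so their soft votes are coarser than those of the logistic regression.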