Machine Learning Academy · Lesson

Soft Margin SVM and the C Parameter

Learners will see how a large C penalises margin violations heavily and therefore tends to narrow the margin, while a small C tolerates more violations in exchange for a wider margin that is usually more robust to noise.

The Problem with Hard Margins

The hard-margin SVM requires that every training example be correctly classified with a functional margin of at least 1, that is, yᵢ(w·xᵢ + b) ≥ 1 for every i, leaving no room for error. In practice, real datasets are almost never perfectly linearly separable. Noise, mislabelled examples, and genuine class overlap mean that a hard margin is either impossible to satisfy or yields a boundary so tightly fitted to a few awkward points that it overfits the training data. We need a principled way to allow some mistakes while still maximising the margin.

Slack Variables: Allowing Violations

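The soft-margin SVM introduces slack variables ξᵢ ≥ 0 (xi, pronounced 'ksi'), one per training example, that measure how far a point violates the margin. The hard constraint yᵢ(w·xᵢ + b) ≥ 1 is relaxed to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ. If ξᵢ = 0, the point is correctly classified and lies on or outside the margin. If 0 < ξᵢ < 1, the point is inside the margin but still on the correct side of the decision boundary. If ξᵢ = 1, the point lies exactly on the decision boundary, and if ξᵢ > 1, it is misclassified. The new objective minimises ||w||²/2 + C × Σξᵢ, trading margin width (2/||w||) against total violation, with C > 0 setting the price of each unit of slack.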
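The trade-off can be checked by hand on a toy problem. The sketch below is illustrative only: the five one-dimensional points, the two candidate boundaries named wide and narrow, and the helper functions slacks and objective are all assumptions made up for this example, not part of the lesson. It uses the fact that, for a fixed w and b, the smallest slack satisfying the constraints is ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)), and it only compares two hand-picked candidates rather than solving for the true optimum.

```python
# Hypothetical 1-D dataset: one positive point (x = -1) sits close to the negatives.
xs = [-3.0, -2.0, -1.0, 2.0, 3.0]
ys = [-1, -1, +1, +1, +1]


def slacks(w, b, xs, ys):
    """Smallest feasible slack per point: xi_i = max(0, 1 - y_i * (w * x_i + b))."""
    return [max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys)]


def objective(w, b, xs, ys, C):
    """Soft-margin objective ||w||^2 / 2 + C * sum(xi_i) for scalar w."""
    return 0.5 * w * w + C * sum(slacks(w, b, xs, ys))


# Two hand-picked candidates (assumptions for illustration, not solver output).
candidates = {
    "wide": (0.5, 0.0),    # margin width 2/|w| = 4, misclassifies x = -1
    "narrow": (2.0, 3.0),  # margin width 2/|w| = 1, no violations
}

for C in (0.1, 10.0):
    for name, (w, b) in candidates.items():
        total_slack = sum(slacks(w, b, xs, ys))
        value = objective(w, b, xs, ys, C)
        print(f"C={C:<5} {name:<6} width={2 / abs(w):.1f} "
              f"slack={total_slack:.2f} objective={value:.3f}")
```

With C = 0.1 the wide candidate has the lower objective (about 0.275 against 2.0) even though it misclassifies one point, because slack is cheap. With C = 10 the same violation costs 15 on its own, so the narrow, violation-free candidate wins. For these two candidates the preference flips at C = 1.25.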
