Soft Margin SVM and the C Parameter
Learners will observe how increasing C penalises margin violations more heavily and typically narrows the margin, while a small C tolerates more violations in exchange for a wider, more robust margin.
The Problem with Hard Margins
The hard-margin SVM requires that every training example be correctly classified with a margin of at least 1, leaving no room for error. In practice, real datasets are almost never perfectly linearly separable. Noise, mislabelled examples, and genuine class overlap mean that a hard margin is either impossible to satisfy or results in a boundary so dictated by a few outlying points, with so narrow a margin, that it overfits the training data. We need a principled way to allow some mistakes while still maximising the margin.
Slack Variables: Allowing Violations
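The soft-margin SVM introduces slack variables ξᵢ ≥ 0 (xi, pronounced 'ksi'), one per training example, that measure how far a point falls short of the margin. The hard constraint yᵢ(w·xᵢ + b) ≥ 1 is relaxed to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ. If ξᵢ = 0, the point is correctly classified and lies on or outside the margin. If 0 < ξᵢ < 1, the point is inside the margin but still correctly classified. If ξᵢ = 1, the point lies exactly on the decision boundary. If ξᵢ > 1, the point is misclassified. The new objective minimises ||w||²/2 + C × Σξᵢ subject to these constraints, where C > 0 sets the trade-off between margin width and total violation.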
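To make the slack cases concrete, here is a minimal Python sketch. It assumes a fixed, hand-picked hyperplane (w, b) and five made-up 2D points. None of these values come from training, and the function names are hypothetical. For a given hyperplane, the smallest slack that satisfies the relaxed constraint is ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)), which is what the sketch computes before evaluating the objective for a few values of C.

```python
# Illustrative sketch: w, b, and the data points are made-up values,
# not the result of training an SVM.

def decision_value(w, b, x):
    """Return w·x + b for a single point x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def slack(w, b, x, y):
    """Smallest xi >= 0 that satisfies y * (w·x + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def describe(xi):
    """Map a slack value to the case it represents.

    Exact float comparison is fine here only because the example values
    are exactly representable; real data would need a tolerance.
    """
    if xi == 0:
        return "on or outside the margin"
    if xi < 1:
        return "inside the margin, correctly classified"
    if xi == 1:
        return "on the decision boundary"
    return "misclassified"

def objective(w, b, X, Y, C):
    """Soft-margin objective ||w||^2 / 2 + C * sum of slacks."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    violation_term = sum(slack(w, b, x, y) for x, y in zip(X, Y))
    return margin_term + C * violation_term

# Hypothetical hyperplane and labelled points (labels are +1 or -1).
w, b = (1.0, 1.0), -3.0
X = [(3.0, 2.0), (2.0, 1.5), (1.0, 1.0), (0.5, 0.5), (2.0, 1.0)]
Y = [+1, +1, +1, -1, -1]

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
for x, y, xi in zip(X, Y, slacks):
    print(f"x={x}, y={y:+d}, slack={xi:.2f}: {describe(xi)}")

# The same hyperplane is scored differently as C changes.
for C in (0.5, 1.0, 10.0):
    print(f"C={C}: objective={objective(w, b, X, Y, C):.2f}")
```

With the hyperplane held fixed, raising C makes the violation term dominate the objective. An optimiser free to adjust w and b would therefore accept a larger ||w|| (a narrower margin) in order to shrink the slacks, whereas a small C leaves the margin term in control.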