Choosing K: Elbow Method and Silhouette Score
Learners will plot inertia vs k (elbow) and compute silhouette coefficients to pick the number of clusters that gives well-separated, compact groups.
Why Choosing k Matters
K-Means requires you to specify k — the number of clusters — before training. Too few clusters and you lump distinct groups together; too many and you split natural groups artificially. There is no universally correct k, but two diagnostic tools — the elbow method and the silhouette score — give principled guidance.
Inertia Decreases as k Grows
Inertia is the sum of squared distances from each point to its assigned centroid. As k increases, the best achievable inertia never goes up, because adding a centroid can only leave each point's nearest centroid as close or closer. At k=n (one cluster per point), inertia is zero. This means you cannot simply minimise inertia; instead, look for the k beyond which additional clusters stop producing meaningful reductions. That point of diminishing returns is the elbow.
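To make the quantity concrete, here is a minimal sketch that recomputes inertia by hand and checks the k=n case. The toy data `X_demo` and the helper `manual_inertia` are illustrative names, not part of scikit-learn; only `KMeans`, `cluster_centers_`, `labels_`, and `inertia_` come from the library.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: 30 distinct random points in 2-D.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((30, 2))


def manual_inertia(X, model):
    """Sum of squared distances from each point to its assigned centroid."""
    assigned = model.cluster_centers_[model.labels_]
    return float(((X - assigned) ** 2).sum())


# The hand-computed value should agree with scikit-learn's inertia_.
km3 = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X_demo)
print(manual_inertia(X_demo, km3), km3.inertia_)

# With one cluster per point, every point sits on its own centroid,
# so inertia should be zero up to floating-point error.
km_n = KMeans(n_clusters=len(X_demo), random_state=0, n_init=1).fit(X_demo)
print(km_n.inertia_)
```

The k=n fit is useless as a clustering, which is why the raw inertia value cannot be used on its own to select k.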
from sklearn.cluster import KMeans
import numpy as np

np.random.seed(42)  # fix the seed so the toy data is reproducible
X = np.random.randn(200, 2)  # 200 points drawn from a single 2-D Gaussian

# Fit K-Means for k = 1..10 and record the inertia of each fit
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertias.append(km.inertia_)

print('Inertia per k:')
for k, inr in enumerate(inertias, start=1):
    print(f' k={k}: {inr:.1f}')
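The silhouette score complements the inertia curve. For each point it compares a, the mean distance to the other points in its own cluster, with b, the mean distance to the points in the nearest other cluster, as (b - a) / max(a, b). Values lie between -1 and 1, and higher means more compact, better-separated clusters. The sketch below is one way to scan k with scikit-learn's `silhouette_score`. The four hand-picked blob centres and the names `X_blobs`, `scores`, and `best_k` are assumptions made for illustration, chosen so that the data has an obvious structure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical toy data: four tight blobs at hand-picked centres.
blob_centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X_blobs, _ = make_blobs(n_samples=400, centers=blob_centers,
                        cluster_std=0.5, random_state=42)

scores = {}
for k in range(2, 9):  # the silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels)  # mean over all points
    print(f"k={k}: silhouette={scores[k]:.3f}")

# Pick the k whose clustering has the highest mean silhouette.
best_k = max(scores, key=scores.get)
print("k with the highest mean silhouette:", best_k)
```

Unlike inertia, the mean silhouette does not improve automatically as k grows, so its maximum is a usable candidate for k. On real data the peak is often less clear-cut than on these synthetic blobs, so it is worth reading it alongside the elbow in the inertia curve.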