Machine Learning Academy · Lesson

PCA: Variance, Eigenvectors, and Principal Components

Learners will fit PCA to a high-dimensional dataset, inspect explained variance ratios, and choose the number of components that retain 95% of total variance.

The Problem with High-Dimensional Data

As the number of features grows, a fixed number of samples covers the feature space ever more sparsely, a problem known as the curse of dimensionality. Many features are also redundant or correlated and carry overlapping information. Principal Component Analysis (PCA) addresses this by finding a new, smaller set of orthogonal axes (principal components), each a linear combination of the original features, that retain as much of the data's variance as possible in as few dimensions as possible.

Variance: What PCA Maximises

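PCA seeks directions in feature space along which the data varies the most. It treats a high-variance direction as informative, while a direction with near-zero variance contributes little to the spread of the data and can usually be dropped with little loss. The first principal component (PC1) is the direction of maximum variance; PC2 is the direction of maximum remaining variance among those orthogonal to PC1, and so on. These directions are the eigenvectors of the data's covariance matrix, and each eigenvalue is the variance captured along its eigenvector.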
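
To make the idea concrete, here is a minimal NumPy sketch of how the variance along each principal component can be computed and how the 95% threshold from the lesson objective might be applied. The synthetic dataset (500 samples, 10 features driven by 3 hidden factors) and all variable names are assumptions chosen for illustration, not part of the lesson's own material.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 10 correlated features
# generated from 3 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# Centre each feature; PCA measures variance around the mean.
X_centred = X - X.mean(axis=0)

# Covariance matrix of the features (10 x 10).
cov = np.cov(X_centred, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, so reorder them from largest to smallest variance.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # column i is the direction of PC(i+1)

# Share of total variance captured by each component, and the running total.
explained_variance_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_variance_ratio)

# Smallest number of components whose cumulative share reaches 95%.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

print("Explained variance ratio:", explained_variance_ratio.round(3))
print("Components needed for 95% of variance:", n_components)
```

Because the features here come from only 3 hidden factors, a handful of components should account for nearly all the variance. In scikit-learn, the same ratios are exposed by a fitted `PCA` object as `explained_variance_ratio_`, and passing a fraction such as `n_components=0.95` asks the library to choose the component count for that share of variance.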
