PCA: Variance, Eigenvectors, and Principal Components
Learners will fit PCA to a high-dimensional dataset, inspect explained variance ratios, and choose the number of components that retain 95% of total variance.
The Problem with High-Dimensional Data
As the number of features grows, a fixed number of samples covers the feature space more and more sparsely, a problem known as the curse of dimensionality. Many features are also redundant or correlated, so they carry overlapping information. Principal Component Analysis (PCA) addresses this by finding a new, smaller set of orthogonal axes (principal components), each a linear combination of the original features, that retain as much of the data's variance as possible in as few dimensions as possible.
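To make the redundancy concrete, here is a minimal sketch using NumPy only. The dataset is an assumption made up for illustration: 10 observed features generated from 3 hidden factors plus a little noise. The names `latent`, `mixing`, and `n_components_95` are hypothetical. The sketch computes the explained variance ratio of each component and the smallest number of components that reaches 95% of total variance.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumed toy data: 500 samples, 10 observed features driven by 3 hidden factors.
n_samples, n_latent, n_features = 500, 3, 10
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

# Centre the features, then take the eigenvalues of the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # reversed so the largest comes first

# Explained variance ratio per component and its running total.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative ratio reaches 95%.
n_components_95 = int(np.searchsorted(cumulative, 0.95) + 1)

print(np.round(explained_ratio, 3))
print("components for 95% variance:", n_components_95)
```

Because only 3 factors drive the 10 features in this toy setup, at most 3 components are needed to pass the 95% threshold. Real datasets rarely separate this cleanly, so the threshold is a judgment call rather than a rule.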
Variance: What PCA Maximises
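PCA seeks directions in feature space along which the data varies the most. A direction with high variance is treated as informative, while a direction with near-zero variance adds little and is often dominated by noise. Because variance depends on each feature's scale, features are usually centred, and often standardised, before PCA is fitted. The first principal component (PC1) is the direction of maximum variance; PC2 is the direction of highest remaining variance among those orthogonal to PC1, and so on. These directions are the eigenvectors of the data's covariance matrix, and each eigenvalue equals the variance of the data along its eigenvector.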
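The relationship between variance and principal directions can be checked numerically. The sketch below assumes a small synthetic 2-feature dataset (not from the lesson) and uses the eigendecomposition of the covariance matrix as one way to obtain the components. It confirms that the variance of the data projected onto PC1 equals the largest eigenvalue, that PC1 and PC2 are orthogonal, and that no randomly chosen direction has higher variance than PC1.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

# Assumed toy data: two correlated features.
x = rng.normal(size=1000)
y = 0.8 * x + 0.3 * rng.normal(size=1000)
X = np.column_stack([x, y])
X_centered = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix (eigh returns ascending order).
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # reorder so the largest comes first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

pc1, pc2 = eigenvectors[:, 0], eigenvectors[:, 1]

# Variance of the data projected onto each principal direction.
var_pc1 = np.var(X_centered @ pc1, ddof=1)
var_pc2 = np.var(X_centered @ pc2, ddof=1)

# Variance along 200 random unit directions, for comparison with PC1.
angles = rng.uniform(0, np.pi, size=200)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
var_random = np.var(X_centered @ directions.T, axis=0, ddof=1)

print("variance along PC1:", var_pc1, "largest eigenvalue:", eigenvalues[0])
print("PC1 . PC2:", pc1 @ pc2)
print("max variance over random directions:", var_random.max())
```

The sign of each eigenvector is arbitrary, so `pc1` may point in either of two opposite directions; the projected variance is the same in both cases.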