Projecting Data and Reconstructing from Components
Learners will transform a dataset to the principal-component space, visualise the 2D projection, and reconstruct the original features to quantify information loss.
Projection: From High-D to Low-D
After PCA finds the principal components, projection transforms each data point into the new component space. The projected coordinates are called scores. If you keep only 2 components from 64 original features, each 64-dimensional point becomes a 2-dimensional score vector. This is done by multiplying the centred data matrix by the matrix whose columns are the retained eigenvectors (often called the loadings matrix).
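The matrix multiplication described above can be checked by hand. The following is a minimal sketch, assuming synthetic random data (not from the lesson) and scikit-learn's default settings (no whitening); it compares a manual projection against what pca.transform returns.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data, purely illustrative: 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Fit PCA and keep the first 2 components
pca = PCA(n_components=2).fit(X)

# Manual projection: centre the data, then multiply by the eigenvectors.
# pca.components_ stores one eigenvector per row, hence the transpose.
X_centred = X - pca.mean_
scores_manual = X_centred @ pca.components_.T

# Projection as computed by scikit-learn
scores_sklearn = pca.transform(X)

print('Scores shape:', scores_manual.shape)
print('Manual matches sklearn:', np.allclose(scores_manual, scores_sklearn))
```

Because components_ holds the eigenvectors as rows, the transpose is what turns it into the "columns are eigenvectors" matrix used in the multiplication.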
The transform Method in sklearn
In scikit-learn, pca.fit(X) learns the components and pca.transform(X) projects the data. The convenience method pca.fit_transform(X) does both in one call. The result is a matrix of shape (n_samples, n_components) — each row is a point in the reduced space.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits

# Load the digits dataset: 1797 samples x 64 pixel features
X, y = load_digits(return_X_y=True)
# Standardise the features before PCA
X_scaled = StandardScaler().fit_transform(X)
# Learn 10 principal components and project the data onto them
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X_scaled)

print('Original shape:', X_scaled.shape)
print('Reduced shape:', X_reduced.shape)
print('Variance retained:', pca.explained_variance_ratio_.sum().round(4))
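The lesson objective also mentions reconstructing the original features to quantify information loss. One way to sketch this, assuming the same digits data and 10 components as above, is scikit-learn's pca.inverse_transform followed by a mean squared error between the standardised data and its reconstruction. The choice of mean squared error as the loss measure is an assumption made here for illustration, not something the lesson prescribes.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same setup as the projection example: digits data, standardised
X, _ = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Project onto 10 components
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X_scaled)

# Map the scores back to the 64-feature space
X_restored = pca.inverse_transform(X_reduced)

# Without whitening, this is the scores times the eigenvectors,
# plus the mean that was subtracted during centring
X_manual = X_reduced @ pca.components_ + pca.mean_

# Mean squared reconstruction error (illustrative loss measure)
mse = np.mean((X_scaled - X_restored) ** 2)

print('Restored shape:', X_restored.shape)
print('Manual matches inverse_transform:', np.allclose(X_restored, X_manual))
print('Reconstruction MSE:', round(float(mse), 4))
```

Keeping more components should lower the error, since the reconstruction only loses the variance carried by the discarded components.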