Machine Learning Academy · Lesson

t-SNE: Neighbourhood Preservation for Visualisation

Learners will apply t-SNE with different perplexity settings to MNIST embeddings and understand that t-SNE distances are not meaningful for downstream modelling.

Beyond PCA: Non-Linear Visualisation

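PCA is a linear projection that preserves as much global variance as possible, so it can fail to show local cluster structure. t-SNE (t-distributed Stochastic Neighbour Embedding) is a non-linear dimensionality reduction technique designed specifically for 2D and 3D visualisation. It prioritises preserving local neighbourhoods: points that are close in high-dimensional space should also be close in the 2D plot. The trade-off is that distances between clusters, and the apparent sizes of clusters, are not reliable in the plot, which is why t-SNE coordinates should not be used as features for downstream modelling.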
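A minimal sketch of running t-SNE at several perplexity settings, assuming scikit-learn is installed. The small 8x8 digits dataset bundled with scikit-learn stands in for MNIST embeddings here, and the subsample size and perplexity values are illustrative choices, not values prescribed by the lesson.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Assumption: scikit-learn's 8x8 digits dataset is a stand-in for MNIST embeddings.
digits = load_digits()
X = digits.data[:300]    # subsample so the example runs quickly
y = digits.target[:300]  # labels, useful for colouring a scatter plot

embeddings = {}
for perplexity in (5, 30, 50):  # illustrative values; each must be below the sample count
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)
    # kl_divergence_ is the final value of the objective for this run
    print(perplexity, embeddings[perplexity].shape, tsne.kl_divergence_)
```

Scatter-plotting each entry of `embeddings`, coloured by `y`, shows how the layout changes with perplexity. The differences between runs are a reminder that the 2D coordinates are for looking at, not for feeding into a model.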

The Core Idea: Similarity Distributions

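t-SNE defines a probability distribution over pairs of points in high-dimensional space: each pair receives a similarity from a Gaussian kernel, so nearby points have high similarity and distant points have almost none. The perplexity setting controls the width of that kernel and can be read as the effective number of neighbours each point considers. t-SNE then defines a matching distribution over pairs in the low-dimensional embedding, using a heavy-tailed Student-t kernel (the "t" in the name). The algorithm minimises the KL divergence between the two distributions using gradient descent, nudging points in 2D until the neighbourhood structure matches that of the high-dimensional data.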
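The objective described above can be illustrated with a toy calculation. This is a simplified sketch, not a full t-SNE implementation: it uses one fixed Gaussian bandwidth (`sigma`) rather than tuning a bandwidth per point from the perplexity, and it only evaluates the KL divergence for two hand-made layouts instead of optimising one. All function names and data points are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_similarities(points, kernel):
    """Normalise kernel weights over all ordered pairs i != j so they sum to 1."""
    n = len(points)
    weights = {(i, j): kernel(sq_dist(points[i], points[j]))
               for i in range(n) for j in range(n) if i != j}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def gaussian(d2, sigma=1.0):
    # Simplification: a single fixed bandwidth instead of a perplexity-tuned one.
    return math.exp(-d2 / (2 * sigma ** 2))

def student_t(d2):
    # Heavy-tailed kernel used in the low-dimensional space.
    return 1.0 / (1.0 + d2)

def kl_divergence(p, q):
    """KL(P || Q) summed over all pairs."""
    return sum(pv * math.log(pv / q[pair]) for pair, pv in p.items() if pv > 0)

# Two tight pairs in 3D: points 0 and 1 are neighbours, as are points 2 and 3.
high_d = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (3.0, 3.0, 3.0), (3.1, 3.0, 2.9)]
good_2d = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]  # keeps the neighbours together
bad_2d = [(0.0, 0.0), (3.0, 3.0), (0.1, 0.0), (3.1, 3.0)]   # splits the neighbours apart

P = joint_similarities(high_d, gaussian)
kl_good = kl_divergence(P, joint_similarities(good_2d, student_t))
kl_bad = kl_divergence(P, joint_similarities(bad_2d, student_t))
print(kl_good < kl_bad)
```

The layout that keeps high-dimensional neighbours together has the lower KL divergence, which is the direction gradient descent pushes the 2D points.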
