t-SNE: Neighbourhood Preservation for Visualisation
Learners will apply t-SNE with different perplexity settings to MNIST embeddings and understand that t-SNE distances are not meaningful for downstream modelling.
Beyond PCA: Non-Linear Visualisation
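PCA is a linear projection that preserves the directions of greatest global variance, so it can fail to show local cluster structure when that structure is non-linear. t-SNE (t-distributed Stochastic Neighbour Embedding) is a non-linear dimensionality reduction technique designed specifically for 2D and 3D visualisation. It prioritises preserving local neighbourhoods: points that are close in high-dimensional space should also be close in the 2D plot. The trade-off is that distances between well-separated clusters, and the apparent sizes of clusters, are not reliable in the resulting plot.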
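As a rough illustration of running t-SNE at more than one perplexity, the sketch below uses scikit-learn's `TSNE` on the small 8x8 digits dataset from `load_digits`. That dataset is an assumption standing in for MNIST embeddings, and the helper name `embed_with_perplexities`, the 300-point subsample, and the perplexity values 5 and 30 are illustrative choices rather than anything prescribed by the lesson.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE


def embed_with_perplexities(X, perplexities=(5, 30), seed=0):
    """Hypothetical helper: return {perplexity: (n_samples, 2) embedding}."""
    embeddings = {}
    for p in perplexities:
        # perplexity must be smaller than the number of samples
        tsne = TSNE(n_components=2, perplexity=p, init="pca", random_state=seed)
        embeddings[p] = tsne.fit_transform(X)
    return embeddings


# Assumption: the 8x8 digits dataset is used as a small stand-in for MNIST.
digits = load_digits()
X = digits.data[:300]      # subsample to keep the run short
y = digits.target[:300]    # labels, useful only for colouring a scatter plot

embeddings = embed_with_perplexities(X)
for p, emb in embeddings.items():
    print(f"perplexity={p}: embedding shape {emb.shape}")
```

Plotting each embedding as a scatter coloured by `y` makes the effect of perplexity visible. The layouts are for inspection only: the 2D coordinates should not be passed to a downstream model as features.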
The Core Idea: Similarity Distributions
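t-SNE defines a probability distribution over pairs of points in high-dimensional space, using a Gaussian kernel so that nearby points have high similarity and distant points have similarity close to zero. The perplexity setting controls the width of that kernel, and can be read as the effective number of neighbours each point takes into account. t-SNE then defines an analogous distribution over pairs in the low-dimensional embedding, this time using a heavy-tailed Student-t kernel (the "t" in the name). The algorithm minimises the KL divergence between the two distributions using gradient descent, nudging points in 2D until the low-dimensional neighbourhood structure matches the high-dimensional one.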
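The objective described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the full algorithm: it assumes a single global Gaussian bandwidth `sigma` instead of the per-point bandwidths that t-SNE derives from the perplexity, it normalises similarities jointly over all pairs, and it only evaluates the KL divergence for two hand-made layouts rather than optimising by gradient descent. All function names and the toy points are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian(d2, sigma=1.0):
    """High-dimensional kernel (assumption: one global sigma)."""
    return math.exp(-d2 / (2 * sigma ** 2))


def student_t(d2):
    """Low-dimensional kernel: Student-t with one degree of freedom."""
    return 1.0 / (1.0 + d2)


def joint_similarities(points, kernel):
    """Pairwise similarities normalised to sum to 1 (diagonal excluded)."""
    n = len(points)
    w = [[0.0 if i == j else kernel(sq_dist(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs with non-zero P."""
    return sum(p * math.log((p + eps) / (q + eps))
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)


# Toy data: two tight pairs of points, far apart in 3D.
high = [(0, 0, 0), (0.5, 0, 0), (5, 5, 5), (5.5, 5, 5)]
good = [(0, 0), (0.5, 0), (5, 5), (5.5, 5)]   # neighbours kept together
bad = [(0, 0), (5, 5), (0.5, 0), (5.5, 5)]    # neighbours pulled apart

P = joint_similarities(high, gaussian)
kl_good = kl_divergence(P, joint_similarities(good, student_t))
kl_bad = kl_divergence(P, joint_similarities(bad, student_t))
print(f"KL for good layout: {kl_good:.3f}")
print(f"KL for bad layout:  {kl_bad:.3f}")
```

The layout that keeps high-dimensional neighbours together gives the smaller divergence, which is the quantity gradient descent drives down. Note that the objective only rewards matching neighbourhood similarities, which is why absolute distances in the final plot carry little meaning.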