
Quantization and Compression of Vectors

Shrink vector storage and speed up search with scalar and product quantization while controlling accuracy loss.

The Memory Problem

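One million 1536-dimensional vectors stored as 32-bit floats take about 6.1 GB of RAM, before any index overhead. Quantization stores each vector in fewer bits, so the collection fits in far less memory and each distance computation reads less data, at the cost of some accuracy.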

Float32 Baseline

By default, each dimension is stored as a 4-byte (32-bit) float, so raw storage is vectors × dims × 4 bytes. Reducing the bytes per dimension is the main lever for compression.

vectors = 1_000_000  # number of stored embeddings
dims = 1536          # dimensions per embedding
bytes_total = vectors * dims * 4  # 4 bytes per float32 value
print(bytes_total / 1e9, "GB")  # prints 6.144 GB
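
To make "fewer bytes per dimension" concrete, here is a minimal sketch of scalar quantization to 8-bit codes, using only the standard library. It assumes a separate min/max range per vector, which is just one possible calibration choice. Vector databases may compute ranges differently. The helper names quantize_int8 and dequantize_int8 are hypothetical, chosen for illustration.

```python
import random
from array import array

def quantize_int8(vec):
    """Map floats to signed 8-bit codes using this vector's own min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for a constant vector
    codes = array("b", (round((x - lo) / scale) - 128 for x in vec))
    return codes, lo, scale

def dequantize_int8(codes, lo, scale):
    """Approximately reconstruct the original floats from the 8-bit codes."""
    return [(c + 128) * scale + lo for c in codes]

random.seed(0)
dims = 1536
vec = [random.uniform(-1.0, 1.0) for _ in range(dims)]  # stand-in for an embedding

codes, lo, scale = quantize_int8(vec)
restored = dequantize_int8(codes, lo, scale)

float32_bytes = dims * 4                  # baseline: 4 bytes per dimension
int8_bytes = len(codes) * codes.itemsize  # quantized: 1 byte per dimension
max_error = max(abs(a - b) for a, b in zip(vec, restored))

print(float32_bytes, "bytes as float32")
print(int8_bytes, "bytes as int8 codes")
print("max reconstruction error:", max_error)
```

The codes take a quarter of the float32 space, plus a small per-vector cost for storing lo and scale. The reconstruction error per dimension is bounded by roughly half of scale, which is the accuracy traded away for the smaller footprint.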
