Quantization and Compression of Vectors
Shrink vector storage and speed up search with scalar and product quantization while controlling accuracy loss.
The Memory Problem
A million 1536-dimension vectors stored as 32-bit floats need about 6 GB of RAM. Quantization compresses vectors so they fit in far less memory and search faster.
Float32 Baseline
By default each dimension is a 4-byte float. Storage equals vectors x dims x 4 bytes. Reducing the bytes per dimension is the path to compression.
vectors = 1_000_000
dims = 1536
bytes_total = vectors * dims * 4
print(bytes_total / 1e9, "GB") # ~6.14 GBAll lessons in this course
- Vector DB Storage Architectures
- Proximity Search Algorithms (HNSW, IVFFlat)
- Vector DB Persistence and Scalability
- Quantization and Compression of Vectors