Vector Database Part 2: Index

Vector search at scale

Xin Cheng
9 min read · Apr 7, 2024

This is the 7th article in the series on building LLM-powered AI applications. In the previous article, we introduced vector databases and gave an overview of the different index types. Let's look at the various indexes in more detail to understand how they differ.

Vector database indexing types

Flat

Description: exhaustive kNN search is referred to as "flat" in Faiss. There is no approximation or clustering, so it produces the most accurate results: the query vector xq is compared against every other full-size vector to compute distances, and the nearest k vectors to xq are returned.

When to use: search quality is a very high priority, and search time does not matter or the index is small (<10K vectors).

faiss.IndexFlatL2: similarity is computed as Euclidean (L2) distance.

faiss.IndexFlatIP: similarity is computed as an inner product.
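
As a quick illustration (the dimensions and random data below are made up, not from the article), a flat index in Faiss can be built and queried like this:

    import faiss
    import numpy as np

    d = 128                                               # vector dimensionality (arbitrary)
    xb = np.random.random((10_000, d)).astype("float32")  # database vectors
    xq = np.random.random((5, d)).astype("float32")       # query vectors

    index = faiss.IndexFlatL2(d)    # exhaustive search with Euclidean (L2) distance
    # index = faiss.IndexFlatIP(d)  # or inner-product similarity instead
    index.add(xb)                   # a flat index needs no training step

    k = 4
    distances, ids = index.search(xq, k)  # exact k nearest neighbors for each query

Because every query is compared against every stored vector, latency grows linearly with the size of the index.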

Ideas to make vector search faster

  1. Reduce vector size — through dimensionality reduction or by reducing the number of bits representing our vectors' values.
  2. Reduce search scope — by hashing (Locality Sensitive Hashing), clustering (Inverted File Index), or organizing vectors into tree structures based on certain attributes, similarity, or distance — and restricting our search to the closest clusters or most similar branches.

Locality Sensitive Hashing: faiss.IndexLSH. Best for low-dimensional data or small datasets.

Inverted File Index: faiss.IndexIVFFlat. A good, scalable option: high quality at reasonable speed and memory usage.

IVF (an inverted index maps from content to document location, in contrast to a forward index, which maps from documents to content): assigns each database vector to the partition with the closest centroid, where the centroids are determined by unsupervised clustering (typically k-means). Generally a solid choice for small- to medium-size datasets.

Product quantization: high-dimensional vectors are mapped to low-dimensional quantized vectors by assigning each fixed-length chunk of the original vector to a single quantized value. The process involves splitting the vectors, applying k-means clustering to each split (across all vectors), and replacing each chunk with the index of its nearest centroid.

Hierarchical Navigable Small World Graphs: faiss.IndexHNSWFlat. Very good quality and high speed, but large memory usage.

Approximate Nearest Neighbors Oh Yeah (Annoy): generally not recommended. It partitions the vector space recursively to create a binary tree, where each node is split by a hyperplane equidistant from two randomly selected child vectors. The splitting process continues until leaf nodes have fewer than a predefined number of elements. Querying involves iteratively traversing the tree, determining at each node which side of the hyperplane the query vector falls on.

Decision on choosing which index:

  • 100% recall required or index_size < 10MB: use FLAT.
  • 10MB < index_size < 2GB: use IVF.
  • 2GB < index_size < 20GB: PQ allows you to use significantly less memory at the expense of lower recall, while HNSW often gives you 95%+ recall at the expense of high memory usage.
  • 20GB < index_size < 200GB: use composite indexes, IVF_PQ for memory-constrained applications and HNSW_SQ for applications that require high recall.
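
As a rough illustration of these options (the factory strings and parameters here are examples, not recommendations from the article), Faiss can build most of these index types from a short spec via index_factory:

    import faiss

    d = 128  # vector dimensionality (example value)

    flat_index = faiss.index_factory(d, "Flat")          # exact search, small indexes
    ivf_index = faiss.index_factory(d, "IVF100,Flat")    # IVF with 100 Voronoi cells
    ivfpq_index = faiss.index_factory(d, "IVF100,PQ8")   # IVF + PQ codes, memory-constrained
    hnsw_index = faiss.index_factory(d, "HNSW32")        # HNSW with 32 links per vertex, high recall

    # IVF- and PQ-based indexes must be trained on representative vectors before add()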

Locality Sensitive Hashing (LSH)

Shingling, MinHashing, and banded LSH (traditional approach)

3-step traditional approach

  1. Shingling: encoding original texts into vectors.
  2. MinHashing: transforming vectors into a special representation called signature which can be used to compare similarity between them.
  3. LSH function: hashing signature blocks into different buckets. If the signatures of a pair of vectors fall into the same bucket at least once, they are considered candidates.
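
A minimal, self-contained sketch of these three steps (the shingle size, signature length, and band layout are arbitrary choices for illustration):

    from collections import defaultdict

    def shingles(text, k=3):
        # Step 1: encode the text as a set of k-shingles
        return {text[i:i + k] for i in range(len(text) - k + 1)}

    def minhash_signature(shingle_set, seeds):
        # Step 2: one minimum per hash function forms the signature
        return [min(hash((seed, s)) for s in shingle_set) for seed in seeds]

    def candidate_pairs(signatures, bands=20, rows=5):
        # Step 3: hash each band of the signature; sharing a bucket once => candidates
        buckets = defaultdict(set)
        for doc_id, sig in signatures.items():
            for b in range(bands):
                buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
        return {frozenset(ids) for ids in buckets.values() if len(ids) > 1}

    docs = {
        "a": "flying fish flew by the space station",
        "b": "flying fish flew by the space shuttle",
        "c": "the quick brown fox jumps over the lazy dog",
    }
    seeds = range(100)  # 100 hash functions = 20 bands x 5 rows
    sigs = {i: minhash_signature(shingles(t), seeds) for i, t in docs.items()}
    print(candidate_pairs(sigs))  # "a" and "b" should show up as a candidate pair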

The interesting property of the hashing function is that the probability of collision (i.e., two items being hashed to the same value) should be higher for items that are more similar according to your chosen metric; in other words, it is a function that maps higher-dimensional data to a lower-dimensional space where similar items end up closer together. The key difference is that with dictionaries, our goal is to minimize the chances of multiple key-values being mapped to the same bucket — we minimize collisions. LSH is almost the opposite: in LSH, we want to maximize collisions, although ideally only for similar inputs.

LSH has a very similar process to that used in Python dictionaries. We have a key-value pair which we feed into the dictionary. The key is processed through the dictionary hash function and mapped to a specific bucket.

The Pinecone article is a good resource for people who want to go deep into vector databases, as it provides some native implementations for better understanding.

Random hyperplanes with dot-product and Hamming distance

Reduce high-dimensional vectors to low-dimensional binary vectors using random hyperplanes: the sign of the dot product between a vector and each hyperplane's normal gives one bit. Hamming distance between these binary codes is then used to compare vectors.
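
A small numpy sketch of this idea (the number of hyperplanes, the dimensionality, and the data are arbitrary illustration values):

    import numpy as np

    d, nbits = 128, 16                    # original dimensionality, number of hyperplanes/bits
    rng = np.random.default_rng(0)
    planes = rng.normal(size=(nbits, d))  # random hyperplanes through the origin

    xb = rng.normal(size=(10_000, d))
    codes = (xb @ planes.T > 0).astype(np.uint8)  # dot-product sign -> one bit per hyperplane

    xq = rng.normal(size=d)
    q_code = (planes @ xq > 0).astype(np.uint8)

    # Hamming distance between binary codes approximates closeness of the original vectors
    hamming = np.count_nonzero(codes != q_code, axis=1)
    nearest = np.argsort(hamming)[:10]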

Inverted File Index

One implementation uses Voronoi diagrams, creating several non-intersecting regions, each containing a subset of the dataset points. Each region has its own centroid, which points to the center of that region. During a query, the distances to all the centroids of the Voronoi partitions are calculated. Then the centroid with the lowest distance is chosen, and the vectors contained in its partition are taken as candidates. The final answer is obtained by computing the distances to the candidates and choosing the top k nearest of them.

Downside: when the queried object is located near the border of its region, its actual nearest neighbour may lie in a different region. The mitigation is to increase the search scope and choose several regions to search for candidates, based on the top m closest centroids to the object.

faiss.IndexIVFFlat

  • nlist: defines a number of regions (Voronoi cells) to create during training.
  • nprobe: determines how many regions to take for the search of candidates. Changing nprobe parameter does not require retraining.
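
A minimal sketch using these two parameters (nlist, nprobe, and the data sizes are illustrative values, not recommendations):

    import faiss
    import numpy as np

    d, nlist = 128, 100
    xb = np.random.random((10_000, d)).astype("float32")
    xq = np.random.random((5, d)).astype("float32")

    quantizer = faiss.IndexFlatL2(d)     # used to assign vectors to Voronoi cells
    index = faiss.IndexIVFFlat(quantizer, d, nlist)

    index.train(xb)      # learns the nlist centroids (Voronoi regions)
    index.add(xb)

    index.nprobe = 8     # search the 8 closest cells; changing this needs no retraining
    distances, ids = index.search(xq, 4)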

Product Quantization

Product quantization can dramatically compress high-dimensional vectors to use 97% less memory and make nearest-neighbor search 5.5x faster.

  • Splitting high-dimensional vector into equally sized chunks — our subvectors,
  • Assigning each of these subvectors to its nearest centroid (also called reproduction/reconstruction values),
  • Replacing these centroid values with unique IDs — each ID represents a centroid

Each dataset vector is converted into a short, memory-efficient representation (called a PQ code). During training, each vector is divided into several equal parts (subvectors), which form subspaces. Centroids are found in each subspace, and each subvector is encoded with the ID of its nearest centroid.

  • Original vector: 1024 dimensions * 32 bits = 4096 bytes.
  • Encoded vector: 8 subvectors * 8 bits = 8 bytes.
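
A minimal Faiss sketch matching these numbers (a 1024-dimensional vector encoded into 8 one-byte codes; the dataset here is random placeholder data):

    import faiss
    import numpy as np

    d, m, nbits = 1024, 8, 8    # 8 subvectors, 8 bits each -> 8 bytes per stored vector
    xb = np.random.random((20_000, d)).astype("float32")   # 4096 bytes per raw vector

    index = faiss.IndexPQ(d, m, nbits)
    index.train(xb)             # k-means per subspace learns the codebooks
    index.add(xb)               # only the 8-byte PQ codes are stored

    xq = np.random.random((1, d)).astype("float32")
    distances, ids = index.search(xq, 5)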

Inference

A query vector is divided into subvectors. For each of its subvectors, distances to all the centroids of the corresponding subspace are computed. Ultimately, this information is stored in table d.

Then we calculate approximate distances for all database rows and search for vectors with the smallest values. Approximate distance:

  1. For each subvector of a database vector, the closest centroid j is found (by using the mapping values from the PQ code), and the partial distance d[i][j] from that centroid to the query subvector i is taken (by using the precomputed table d).
  2. All the partial distances are squared and summed up. By taking the square root of this value, the approximate Euclidean distance is obtained; other distance metrics can be approximated in a similar way.
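
A numpy sketch of this asymmetric distance computation (the codebooks and PQ codes below are random placeholders standing in for trained values):

    import numpy as np

    m, ds, k, n = 8, 128, 256, 10_000           # subvectors, subvector dim, centroids per subspace, database size
    rng = np.random.default_rng(0)
    codebooks = rng.normal(size=(m, k, ds))     # placeholder: would come from k-means training
    pq_codes = rng.integers(0, k, size=(n, m))  # placeholder: centroid IDs per database vector

    query = rng.normal(size=m * ds)
    q_sub = query.reshape(m, ds)

    # table d: squared partial distance from query subvector i to centroid j of subspace i
    d_table = ((codebooks - q_sub[:, None, :]) ** 2).sum(axis=-1)   # shape (m, k)

    # look up one partial distance per subspace via the PQ code, sum, then take the square root
    approx_sq = d_table[np.arange(m), pq_codes].sum(axis=1)         # shape (n,)
    approx_dist = np.sqrt(approx_sq)                                # approximate Euclidean distance
    top10 = np.argsort(approx_dist)[:10]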

To combine PQ with IVF, an inverted file index is constructed that divides the set of vectors into n Voronoi partitions. Inside each Voronoi partition, the coordinates of the centroid are subtracted from each vector and the residuals are stored. After that, the product quantization algorithm is run on the vectors from all the partitions.

Inference

For a given query, the k nearest centroids of the Voronoi partitions are found, and all the points inside these regions are considered as candidates. The query residual is then split into subvectors, and the distances to the candidates are approximated with Product Quantization as described above.

faiss.IndexIVFPQ

In IVFPQ, an Inverted File index (IVF) is integrated with Product Quantization (PQ) to facilitate rapid and effective approximate nearest neighbor search: an initial broad-stroke IVF step narrows down the scope of vectors in our search.

After this, we continue our PQ search as we did before — but with a significantly reduced number of vectors. By minimizing our search scope, we can expect significantly improved search speeds.
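
A minimal sketch of a combined IVF+PQ index in Faiss (all sizes here are illustrative):

    import faiss
    import numpy as np

    d, nlist, m, nbits = 128, 100, 8, 8
    xb = np.random.random((50_000, d)).astype("float32")
    xq = np.random.random((5, d)).astype("float32")

    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

    index.train(xb)      # learns both the coarse centroids and the PQ codebooks
    index.add(xb)        # stores 8-byte PQ codes of the residuals, not full vectors

    index.nprobe = 8     # number of Voronoi cells to visit per query
    distances, ids = index.search(xq, 4)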

Hierarchical Navigable Small Worlds (HNSW)

HNSW is based on the same principles as the skip list and the navigable small world graph. Its structure is a multi-layered graph with fewer connections on the top layers and denser regions on the bottom layers. Search starts from the highest layer and proceeds one level down each time the local nearest neighbour has been greedily found among the layer's nodes. Instead of finding only one nearest neighbour on each layer, the efSearch (a hyperparameter) closest neighbours to the query vector are found, and each of these neighbours is used as an entry point on the next layer.

faiss.IndexHNSWFlat
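
A minimal sketch (the M, efConstruction, and efSearch values are illustrative, not tuned recommendations):

    import faiss
    import numpy as np

    d, M = 128, 32                      # M = number of links per vertex per layer
    xb = np.random.random((10_000, d)).astype("float32")
    xq = np.random.random((5, d)).astype("float32")

    index = faiss.IndexHNSWFlat(d, M)
    index.hnsw.efConstruction = 64      # candidate-list size while building the graph
    index.hnsw.efSearch = 32            # candidate-list size while searching

    index.add(xb)                       # no training step; the graph is built on add
    distances, ids = index.search(xq, 4)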

Navigable Small World Graphs

When searching an NSW graph, we begin at a pre-defined entry-point. This entry point connects to several nearby vertices. We identify which of these vertices is the closest to our query vector and move there. We repeat the greedy-routing search process of moving from vertex to vertex by identifying the nearest neighboring vertices in each friend list. Eventually, we will find no nearer vertices than our current vertex — this is a local minimum and acts as our stopping condition.

In HNSW, each layer is an NSW graph, and the layers are connected hierarchically using a probabilistic skip-list structure. During search, we traverse the edges in each layer just as we did for NSW, greedily moving to the nearest vertex until we find a local minimum. Unlike NSW, at this point we shift to the current vertex in a lower layer and begin searching again. We repeat this process until finding the local minimum of our bottom layer — layer 0.

Graph construction starts at the top layer. At each layer, the algorithm greedily traverses the edges, finding the ef nearest neighbors to the inserted vector q. After finding the local minimum, it moves down to the next layer (just as is done during search). This process is repeated until reaching the chosen insertion layer. There, the efConstruction nearest neighbors are found; these become candidates for the links of the newly inserted element q and serve as entry points to the next layer.

Vector database comparison

LanceDB is positioned as an embedded database, since sending a beefy vector over HTTP often takes longer than finding its nearest neighbor. LanceDB uses IVF_PQ, while Qdrant uses HNSW.

Appendix

To compute the nearest neighbors, the set of points is split in half recursively until each subset contains about k items; usually k should be around 100.

https://blog.qdrant.tech/batch-vector-search-with-qdrant-8c4d598179d5
