Semantic search with vector database

Published in

Dev Genius

3 min read · Jan 19, 2022

Recently I have been hearing a lot about “vector databases”. After some research, I found that they are closely tied to “semantic search”. Let’s look at these concepts and why we need them.

Semantic search: the term means different things to different people. Here I mean “search by query intent or query meaning”. The goal is to return more accurate results than simple keyword matching can. Suppose you search for “who is number 1 soccer player in the world”. You actually mean “who is the best soccer player in the world” or “top soccer players in the world”, not “which soccer player in the world wears the ‘number 1’ shirt”. This is useful in almost any search scenario (not only natural language, but also audio, images, etc.).

What Is Semantic Search? - Lucidworks

A colleague recently said to me, "I have the impression that different people mean different things when they talk…

lucidworks.com

Vector: a vector is just a list of numbers. In natural language processing, you can tokenize a sentence into words and map each word to its index in a vocabulary, which turns the sentence into a list of word indices.
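To make the idea concrete, here is a minimal sketch of turning a sentence into a list of word indices. The helper names (`build_vocab`, `to_indices`) and the tiny corpus are made up for illustration, and whitespace splitting stands in for a real tokenizer.

```python
def build_vocab(sentences):
    """Give each distinct lowercase token an integer index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_indices(sentence, vocab):
    """Map a sentence to a list of vocabulary indices (unknown words are skipped)."""
    return [vocab[t] for t in sentence.lower().split() if t in vocab]


# Hypothetical two-sentence corpus used only to build the vocabulary
corpus = ["who is the best soccer player", "top soccer players in the world"]
vocab = build_vocab(corpus)

vector = to_indices("the best soccer player in the world", vocab)
print(vector)  # [2, 3, 4, 5, 8, 2, 9]
```

Note that these indices are only labels: index 5 is not "closer" in meaning to index 6 than to index 0, which is one reason embeddings are needed.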

Embedding: if your text is long and your vocabulary is large, the vectors become very long. With one-hot encoding, a sentence turns into sparse vectors and runs into the curse of dimensionality. An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space.
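The contrast between a sparse one-hot vector and a dense embedding can be sketched as follows. Everything here is an assumption for illustration: the vocabulary size, the embedding size, and especially the table itself, which is filled with random numbers as a stand-in for values a trained model would learn. Mean pooling is just one simple way to get a sentence vector.

```python
import random

VOCAB_SIZE = 1000   # hypothetical vocabulary size
EMBEDDING_DIM = 8   # hypothetical embedding size, far smaller than VOCAB_SIZE


def one_hot(index, size=VOCAB_SIZE):
    """Sparse representation: a single 1.0 and size - 1 zeros."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# Random table standing in for a trained embedding matrix (carries no real semantics)
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed_word(index):
    """Dense representation: row `index` of the embedding table."""
    return embedding_table[index]


def embed_sentence(indices):
    """Mean-pool the word vectors into one fixed-length sentence vector."""
    vectors = [embed_word(i) for i in indices]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


sentence = [2, 3, 4, 5, 8, 2, 9]      # word indices for one sentence
print(len(one_hot(sentence[0])))      # 1000
print(len(embed_sentence(sentence)))  # 8
```

However long the sentence is, the pooled vector keeps the same small size, which is what makes it practical to store and compare.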

Neural Network Embeddings Explained

How deep learning can represent War and Peace as a vector

towardsdatascience.com

What are Vector Embeddings? | Pinecone

Vector embeddings are one of the most fascinating and useful concepts in machine learning. They are central to many…

www.pinecone.io

Vector similarity search: when you search with the embedding of a query, you are not looking for an exact match. You are looking for nearby embedding vectors, which should have similar semantic meaning.
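A minimal sketch of "nearby" is a brute-force search by cosine similarity. The three-dimensional vectors below are hand-made for illustration and do not come from any real embedding model, and cosine similarity is only one of several common distance measures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Brute force: score every stored vector and return the k most similar (id, score) pairs."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made toy "embeddings"; real ones would have hundreds of dimensions
vectors = {
    "best soccer player": [0.9, 0.1, 0.0],
    "top soccer players": [0.8, 0.2, 0.1],
    "number 1 shirt": [0.1, 0.9, 0.2],
}

query = [0.85, 0.15, 0.05]  # pretend embedding of "number 1 soccer player"
results = top_k(query, vectors, k=2)
for vector_id, score in results:
    print(vector_id, round(score, 3))
```

None of the stored vectors equals the query exactly, yet the two soccer-ranking entries score far higher than the shirt entry.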

Vector database: a vector database indexes and stores vector embeddings for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling. Traditional databases are not designed to store, index, and search vectors easily.
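To show what those capabilities look like from the caller's side, here is a toy in-memory store with upsert, fetch, delete, and a filtered similarity query. `ToyVectorStore` and its method names are hypothetical and are not the API of Pinecone or any other product. A real vector database would also typically use an approximate nearest-neighbor index instead of the linear scan used here.

```python
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (illustration only)."""

    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        """Create or overwrite a record."""
        self._items[item_id] = (list(vector), dict(metadata or {}))

    def fetch(self, item_id):
        """Read a record, or None if it does not exist."""
        return self._items.get(item_id)

    def delete(self, item_id):
        """Remove a record if present."""
        self._items.pop(item_id, None)

    def query(self, vector, top_k=3, where=None):
        """Return the top_k most similar ids among records whose metadata matches `where`."""
        where = where or {}
        scored = []
        for item_id, (vec, meta) in self._items.items():
            if all(meta.get(key) == value for key, value in where.items()):
                scored.append((item_id, _cosine(vector, vec)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.upsert("q1", [1.0, 0.0], {"lang": "en"})
store.upsert("q2", [0.9, 0.1], {"lang": "en"})
store.upsert("q3", [1.0, 0.0], {"lang": "de"})

# Metadata filtering: q3 matches the vector exactly but is excluded by the filter
print(store.query([1.0, 0.0], top_k=2, where={"lang": "en"}))
```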

What is a Vector Database? | Pinecone

Complex data is growing at break-neck speed. These are unstructured forms of data that include documents, images…

www.pinecone.io

Pinecone is a vector database. As always, the best way to learn something new is a working code sample repo.

examples/light_demo.ipynb at master · pinecone-io/examples

Contribute to pinecone-io/examples development by creating an account on GitHub.

github.com

This notebook shows how to do a semantic search with Pinecone. It is pretty clear, so I will just describe the main steps; you can follow along in the notebook:

Use the Quora question duplicates dataset, which contains pairs of questions that are worded differently but share the same meaning
Connect to Pinecone and create an index to store vectors
For each Quora question, generate an (id, vector, metadata) tuple and insert it into the Pinecone index
Use a sample Quora question to search for similar questions in the index
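The steps above can be sketched end to end without any external service. This is not the notebook's code and it does not call the Pinecone client: a plain dict stands in for the index, three made-up questions stand in for the Quora dataset, and `embed` is a placeholder that counts words, where the real pipeline would call a trained sentence-embedding model.

```python
import math
from collections import Counter

# Step 1: made-up questions standing in for the Quora dataset
questions = [
    "How can I learn to play soccer?",
    "What is the best way to learn soccer?",
    "How do I cook rice?",
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [w.strip("?.,!").lower() for w in text.split()]


vocab = sorted({token for q in questions for token in tokenize(q)})


def embed(text):
    """Placeholder embedding: word counts over `vocab` (a real model goes here)."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Step 2: "create an index" (a dict stands in for the remote index)
index = {}

# Step 3: build (id, vector, metadata) tuples and insert them
records = [(str(i), embed(q), {"text": q}) for i, q in enumerate(questions)]
for item_id, vector, metadata in records:
    index[item_id] = (vector, metadata)


# Step 4: embed a sample question and look for its nearest neighbors
def search(query_text, top_k=2):
    query_vector = embed(query_text)
    scored = [(cosine(query_vector, vec), meta["text"]) for vec, meta in index.values()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


print(search("How to learn soccer?"))
```

Because the placeholder only counts shared words, it cannot match questions that use different vocabulary for the same meaning. That gap is exactly what a trained embedding model is meant to close.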

Appendix

Semantic search with embeddings: index anything

Building scalable semantic retrieval from image, text, graph, and interaction data

rom1504.medium.com

How to Build a Semantic Search Engine With Transformers and Faiss

In this tutorial, you will learn how to build a vector-based search engine with sentence transformers and Faiss.

towardsdatascience.com

GitHub - rom1504/awesome-semantic-search: Semantic search with embeddings: index anything

In Semantic search with embeddings, I described how to build semantic search systems (also called neural search). These…

github.com

Not All Vector Databases Are Made Equal

A detailed comparison of Milvus, Pinecone, Vespa, Weaviate, Vald, GSI and Qdrant

towardsdatascience.com



Written by Xin Cheng
