Semantic Search, Vectors, and Embeddings

Machine learning and natural language processing (NLP) often use vectors and embeddings to represent and understand data.

Semantic search aims to understand the intent and contextual meaning of a search phrase, rather than focusing on individual keywords.

Traditional keyword search often depends on exact-match keywords or proximity-based algorithms that find similar words.

For example, if you input "apple" in a traditional search, you might predominantly get results about the fruit.

However, in a semantic search, the engine tries to gauge the context: Are you searching about the fruit, the tech company, or something else?

An apple in the middle, with tech icons on the left and food icons on the right

What are Vectors?

A vector is simply a list of numbers. For example, the vector `[1, 2, 3]` is a list of three numbers and could represent a point in three-dimensional space.

A diagram showing a 3D representation of the vector [1, 2, 3] on the x, y, and z axes
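As a minimal sketch in Python, a vector like this is just an ordered list of numbers:

```python
# A vector is just an ordered list of numbers.
# This one could represent a point in 3D space.
vector = [1, 2, 3]

# Each position (dimension) holds one coordinate.
x, y, z = vector
print(f"x={x}, y={y}, z={z}")  # x=1, y=2, z=3
print(len(vector))             # 3 dimensions
```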

You can use vectors to represent many different types of data, including text, images, and audio.

In machine learning and NLP, it is common to use vectors with hundreds or even thousands of dimensions.

What are Embeddings?

When referring to vectors in the context of machine learning and NLP, the term "embedding" is typically used. An embedding is a vector that represents the data in a useful way for a specific task.

Each dimension in a vector can represent a particular semantic aspect of the word or phrase. When multiple dimensions are combined, they can convey the overall meaning of the word or phrase.

For example, the word "apple" might be represented by an embedding with the following dimensions:

  • fruit

  • technology

  • color

  • taste

  • shape
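Purely for intuition, you could sketch such an embedding as a mapping from those aspects to numeric scores. Real embedding dimensions are learned by a model and carry no human-readable labels; the keys and values below are invented:

```python
# A purely hypothetical, hand-made "embedding" for "apple".
# Real embeddings have hundreds or thousands of unlabelled,
# learned dimensions; the named keys here are for intuition only.
apple = {
    "fruit":      0.9,   # strongly associated with the fruit
    "technology": 0.7,   # also associated with the tech company
    "color":      0.5,
    "taste":      0.6,
    "shape":      0.4,
}

# Taken together, the values convey an overall "meaning".
print(list(apple.values()))  # [0.9, 0.7, 0.5, 0.6, 0.4]
```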

You can create embeddings in various ways, but one of the most common methods is to use a large language model.

For example, the embedding for the word "apple" might be [0.0077788467, -0.02306925, -0.007360777, -0.027743412, -0.0045747845, 0.01289164, -0.021863015, -0.008587573, 0.01892967, -0.029854324, -0.0027962727, 0.020108491, -0.004530236, 0.009129008, …] and so on.
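One way to obtain an embedding like this is through an embeddings API. The following is a minimal sketch using the OpenAI Python client (v1+); the model name is an assumption, and other providers expose similar endpoints:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.embeddings.create(
    model="text-embedding-ada-002",  # assumed model; any embedding model works
    input="apple",
)

embedding = response.data[0].embedding
print(len(embedding))  # 1536 dimensions for this model
print(embedding[:5])   # the first few values of the embedding
```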

You can use the distance or angle between vectors to gauge the semantic similarity between words or phrases.

A 3-dimensional chart illustrating the distance between the vectors for the words "apple" and "fruit"

Words with similar meanings or contexts will have vectors that are close together, while unrelated words will be farther apart.

This principle is employed in semantic search to find contextually relevant results for a user’s query.
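A common measure of this is cosine similarity, which compares the angle between two vectors. Below is a minimal sketch with invented three-dimensional vectors; real embeddings would have far more dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny, made-up "embeddings" for illustration only.
apple = [0.9, 0.1, 0.2]
fruit = [0.8, 0.2, 0.1]
car   = [0.1, 0.9, 0.7]

print(cosine_similarity(apple, fruit))  # high (~0.99): similar meaning
print(cosine_similarity(apple, car))    # lower (~0.30): unrelated
```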


Summary

You learned about semantic search, vectors, and embeddings.

Next, you will use a Neo4j vector index to find similar data.