Query a Vector Index

In this lesson, you will learn how to query a vector index. You will use the Question and Answer embeddings to find similar responses.

Querying with Embeddings

When querying a vector index, you have to query with an embedding.

For example, you want to use the vector index to find questions similar to the text "What are examples of good open-source projects?". You would first get an embedding of the text. Then, you would use the embedding to query the vector index.

You are going to explore two scenarios:

  1. A user views an existing question and wants to see similar questions.

  2. A user submits a new question and receives answers to a similar question.

In the first scenario, you will use existing question embeddings to find similar questions. In the second scenario, you will generate a new embedding for the user’s question to find similar questions and answers.

These scenarios will help you understand how to query the vector index to find similar questions and answers.

Finding similar questions

You can use the questions and answers vector indexes to find questions that are similar to each other.

A user views an existing question and wants to see similar questions.

The following Cypher query finds similar questions to the question "What are the most touristic countries in the world?".

Review the query before running it and observing the results.

cypher
MATCH (q:Question {text: "What are the most touristic countries in the world?"})
CALL db.index.vector.queryNodes('questions', 6, q.embedding)
YIELD node, score
RETURN node.text, score

Breaking down the query, you can identify the following:

  1. The MATCH clause finds the specific Question node.

  2. The query uses the db.index.vector.queryNodes function to query the questions vector index with the Question node’s embedding - q.embedding. The function returns the top 6 similar nodes.

  3. YIELD obtain the node and similarity score returned by the function.

  4. The query returns the Question node’s text property and the similarity score.

You can extend this query to return the answers to the most similar questions:

cypher
MATCH (q:Question {text: "What are the most touristic countries in the world?"})
CALL db.index.vector.queryNodes('questions', 6, q.embedding)
YIELD node, score
MATCH (node)-[:ANSWERED_BY]->(a)
RETURN a.text, score

The query uses the node and the ANSWERED_BY relationship to find the answers.

Run the query and observe the results. You will notice that the top answers returned are similar to the question. As you get further down the list, the similarity score decreases and so does the relevance of the answers.

Finding answers to a similar question

To improve the user’s experience when asking a new question, you could use the vector index to find similar questions and answers.

To achieve this, you need to generate an embedding for the user’s new question and use it to query the vector index.

You can generate a new embedding in Cypher using the genai.vector.encode function:

genai.vector.encode syntax
genai.vector.encode(
    resource :: STRING,
    provider :: STRING,
    configuration :: MAP = {}
) :: LIST<FLOAT>

You pass the text you want to encode as the resource parameter.

You can use embedding models from different providers, such as OpenAI, Vertex AI, and Amazon Bedrock. Provider-specific details like, API keys, are passed in the configuration map.

For example, you can use the OpenAI provider to generate an embedding passing the API key as token in the configuration map:

cypher
WITH genai.vector.encode("Test", "OpenAI", { token: "sk-..." }) AS embedding
RETURN embedding

OpenAI API key

To run this query, you must replace the token value with your OpenAI API key.

You can incorporate the embedding into your query to find similar questions:

cypher
WITH genai.vector.encode(
    "What are good open source projects",
    "OpenAI",
    { token: "sk-..." }) AS userEmbedding
CALL db.index.vector.queryNodes('questions', 6, userEmbedding)
YIELD node, score
RETURN node.text, score

This query, creates an embedding using genai.vector.encode and then uses that embedding to query the questions vector index.

Try changing the text and observe the results.

Can you modify this query to work the same as the previous query and return the answers to the most similar questions?

Check Your Understanding

Query Vector Index

True or False - you can pass the text you wish to search for to db.index.vector.queryNodes.

  • ❏ True

  • ✓ False

Hint

A vector index can only search for text embeddings.

Solution

The statement is False. db.index.vector.queryNodes requires an embedding to be passed to it, not text.

Lesson Summary

In this lesson, you learned how to query a vector index and generate embeddings using Cypher.

In the next module, you will learn how to import unstructured data into Neo4j using Python.