Vector Indexes

In the last lesson, you learned about embeddings, vectors and their role in RAG.

In this lesson, you will learn how to use a vector index in Neo4j to compare embeddings and find similar data.

Movie Plots

GraphAcademy created a Neo4j sandbox of movie recommendations when you enrolled in this course. The recommendations database contains over 9000 movies, 15000 actors, and over 100000 user ratings.

Each movie has a .plot property.

cypher

Movie Plot Example

MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot

"A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."

Plot Embeddings

Embeddings have been created for 1000 movie plots. The embedding is stored in the .plotEmbedding property of the Movie nodes.

cypher

View the plot embedding

MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot, m.plotEmbedding

The following Cypher query will return the titles and plots for the movies that have embeddings:

cypher

MATCH (m:Movie)
WHERE m.plotEmbedding IS NOT NULL
RETURN m.title, m.plot

A vector index, moviePlots, has been created for the .plotEmbedding property of the Movie nodes.

You can use the moviePlots vector index to find the most similar movies by comparing the movie plot embeddings.

How the vector index was created

This Cypher script loads the Movie plot embeddings from an external file and creates the moviePlots vector index:

cypher

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/rec-embed/movie-plot-embeddings-1k.csv'
AS row
MATCH (m:Movie {movieId: row.movieId})
CALL db.create.setNodeVectorProperty(m, 'plotEmbedding', apoc.convert.fromJsonList(row.embedding));

CREATE VECTOR INDEX moviePlots IF NOT EXISTS
FOR (m:Movie)
ON m.plotEmbedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}};

You can learn more about creating embeddings and vector indexes in the GraphAcademy Introduction to Vector Indexes and Unstructured Data course.
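
As a rough illustration of what the load step does with each CSV row, the following Python sketch parses a JSON-encoded embedding and checks its length against the dimension configured on the index. The helper name parse_embedding and the 3-value sample are hypothetical; the real file is assumed to hold 1536-value embeddings, matching the index configuration above.

```python
import json


def parse_embedding(raw: str, dimensions: int = 1536) -> list[float]:
    """Parse a JSON list of numbers and check it has the expected length."""
    values = json.loads(raw)
    if not isinstance(values, list):
        raise ValueError("embedding must be a JSON list")
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} values, got {len(values)}")
    return [float(v) for v in values]


# Tiny 3-dimensional sample, for illustration only (hypothetical values)
sample = parse_embedding("[0.1, -0.2, 0.3]", dimensions=3)
print(sample)
```

Checking the length before writing the property is one way to catch embeddings that were produced by a different model than the one the index was configured for.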

Querying Vector Indexes

You can query the moviePlots index using the db.index.vector.queryNodes() procedure.

The procedure returns the requested number of approximate nearest neighbor nodes and their similarity score, ordered by the score.

cypher

db.index.vector.queryNodes Syntax

CALL db.index.vector.queryNodes(
    indexName :: STRING,
    numberOfNearestNeighbours :: INTEGER,
    query :: LIST<FLOAT>
) YIELD node, score

The procedure accepts three parameters:

indexName - The name of the vector index
numberOfNearestNeighbours - The number of results to return
query - A list of floats that represent an embedding

The procedure yields two values:

A node which matches the query
A similarity score ranging from 0.0 to 1.0.

You can use this procedure to find the closest embedding value to a given value.
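
To make the 0.0 to 1.0 range concrete, here is a minimal Python sketch of how a cosine-based score can be computed. It assumes the score is the cosine similarity rescaled as (1 + cosine) / 2, which is how a cosine value in the range -1 to 1 maps onto 0 to 1. The function names are illustrative and are not part of Neo4j.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_score(a, b):
    """Rescale cosine similarity from [-1, 1] onto [0, 1] (assumed mapping)."""
    return (1 + cosine_similarity(a, b)) / 2


print(normalized_score([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(normalized_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.5
print(normalized_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: 0.0
```

Under this assumption, a score near 0.5 means the two embeddings are unrelated rather than half-similar, which is worth keeping in mind when reading the results.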

Querying Similar Movie Plots

You can use the moviePlots vector index to find movies with similar plots.

Review this Cypher before running it.

cypher

Similar Plots

MATCH (m:Movie {title: 'Toy Story'})

CALL db.index.vector.queryNodes('moviePlots', 6, m.plotEmbedding)
YIELD node, score

RETURN node.title AS title, node.plot AS plot, score

The query finds the Toy Story Movie node and uses the .plotEmbedding property to find the most similar plots.

The db.index.vector.queryNodes() procedure uses the moviePlots vector index to find similar embeddings.

Run the query. The procedure returns the requested number of nodes and their similarity score, ordered by the score.

Similar Plots Results
title	plot	score
"Toy Story"	"A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy’s room."	1.0
"Little Rascals, The"	"Alfalfa is wooing Darla and his He-Man-Woman-Hating friends attempt to sabotage the relationship."	0.9214372634887695
"NeverEnding Story III, The"	"A young boy must restore order when a group of bullies steal the magical book that acts as a portal between Earth and the imaginary world of Fantasia."	0.9206198453903198
"Drop Dead Fred"	"A young woman finds her already unstable life rocked by the presence of a rambunctious imaginary friend from childhood."	0.9199690818786621
"E.T. the Extra-Terrestrial"	"A troubled child summons the courage to help a friendly alien escape Earth and return to his home-world."	0.919100284576416
"Gumby: The Movie"	"In this offshoot of the 1950s claymation cartoon series, the crazy Blockheads threaten to ruin Gumby’s benefit concert by replacing the entire city of Clokeytown with robots."	0.9180967211723328

The similarity score is between 0.0 and 1.0, with 1.0 being the most similar. Note how the most similar plot is that of the Toy Story movie itself!
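
The reason the query movie tops its own results can be seen in a small brute-force sketch. This is an exact search over hypothetical 2-dimensional vectors, not the approximate search the index performs, and the titles and values are made up for illustration.

```python
import math


def cosine_score(a, b):
    """Cosine similarity rescaled onto [0, 1] (assumed scoring)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return (1 + dot / norm) / 2


def nearest(embeddings, query_title, k):
    """Return the k (title, score) pairs closest to the query movie."""
    query = embeddings[query_title]
    scored = [(title, cosine_score(query, vec)) for title, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real plot embeddings have 1536 dimensions
toy_embeddings = {
    "Query Movie": [1.0, 0.0],
    "Movie A": [0.8, 0.6],
    "Movie B": [0.0, 1.0],
}

results = nearest(toy_embeddings, "Query Movie", 3)
# Drop the query movie itself to keep only the other similar movies
similar_only = [pair for pair in results if pair[0] != "Query Movie"]
print(results)
print(similar_only)
```

Because the query embedding is compared with every indexed embedding, including its own, asking for 6 neighbours yields the movie itself plus 5 others.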

Generate Embeddings

You can generate a new embedding in Cypher using the genai.vector.encode function:

cypher

genai.vector.encode Syntax

WITH genai.vector.encode(
    "Text to create embeddings for",
    "OpenAI",
    { token: "sk-..." }) AS embedding
RETURN embedding

You will need to replace the "sk-..." placeholder with your OpenAI API key.

Generate a Plot Embedding

You can use the embedding to query the vector index to find similar movies.

This query creates an embedding for the text "A mysterious spaceship lands on Earth" and uses it to query the moviePlots vector index for the 6 most similar movie plots.

cypher

WITH genai.vector.encode(
    "A mysterious spaceship lands on Earth",
    "OpenAI",
    { token: "sk-..." }) AS myMoviePlot
CALL db.index.vector.queryNodes('moviePlots', 6, myMoviePlot)
YIELD node, score
RETURN node.title, node.plot, score

Experiment with different movie plots and observe the results.
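
If you run this kind of query from application code, one option is to pass the plot text and API key as query parameters instead of writing them into the Cypher text. The sketch below only builds the query and its parameters. The helper name build_plot_search and the OPENAI_API_KEY environment variable are assumptions made for illustration.

```python
import os

# Same shape as the query above, with $text, $token and $k as parameters
QUERY = """
WITH genai.vector.encode($text, "OpenAI", { token: $token }) AS myMoviePlot
CALL db.index.vector.queryNodes('moviePlots', $k, myMoviePlot)
YIELD node, score
RETURN node.title AS title, node.plot AS plot, score
"""


def build_plot_search(text, k=6, token=None):
    """Return (query, parameters), keeping the API key out of the query text."""
    if token is None:
        # Assumed environment variable name; adjust to your own setup
        token = os.environ.get("OPENAI_API_KEY", "")
    if not token:
        raise ValueError("an OpenAI API key is required to create the embedding")
    return QUERY, {"text": text, "k": k, "token": token}


# "sk-..." is the same placeholder used above; replace it with a real key
query, params = build_plot_search("A robot learns to feel emotions", token="sk-...")
print(sorted(params))
```

With the official Neo4j Python driver, the pair could then be passed to something like driver.execute_query(query, parameters_=params), assuming a 5.x driver and a configured connection.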

Considerations

Using embeddings and vectors is relatively straightforward and can quickly yield results. The downside to this approach is that it relies heavily on the embeddings and similarity function to produce valid results.

This approach is also a black box. Each embedding has 1536 dimensions, so it is impractical to determine how the vectors are structured or how individual dimensions influenced the similarity score.

The movies returned look similar, but without reading and comparing them, you would have no way of verifying that the results are correct.
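
To illustrate how much the choice of similarity function matters, the following sketch compares cosine similarity with Euclidean distance on hypothetical 2-dimensional vectors. The moviePlots index uses cosine; the candidate names and values here are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean(a, b):
    """Euclidean distance: lower means more similar."""
    return math.dist(a, b)


query = [1.0, 0.0]
candidates = {
    "same direction, far away": [10.0, 0.0],
    "nearby, different direction": [1.0, 0.5],
}

best_by_cosine = max(candidates, key=lambda name: cosine(query, candidates[name]))
best_by_euclidean = min(candidates, key=lambda name: euclidean(query, candidates[name]))

print(best_by_cosine)     # same direction, far away
print(best_by_euclidean)  # nearby, different direction
```

The same data produces a different "most similar" result depending on the function, so results are only as meaningful as the embedding model and similarity function behind them.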

Vectors work well for:

Contextual or meaning-based questions
Fuzzy or vague queries
Broad or open-ended questions
Complex queries with multiple concepts

Vectors are ineffective for:

Highly specific or fact-based questions
Numerical or exact-match queries
Boolean or logical queries
Ambiguous or unclear queries without context
Specialised knowledge

In the next lesson you will look at how you can improve the results by using a combination of vector and graph queries.

Check your understanding

Querying vector index

What parameters does the db.index.vector.queryNodes() procedure require? (Select all that apply)

✓ indexName - The name of the vector index to query
✓ numberOfNearestNeighbours - The number of results to return
✓ query - The embedding to compare against
❏ token - The OpenAI token to use for the query

Hint

A token is only required to create an embedding, not to query the index.

Solution

The db.index.vector.queryNodes() procedure requires these parameters:

✓ indexName - The name of the vector index to query
✓ numberOfNearestNeighbours - The number of results to return
✓ query - The embedding to compare against

A token is only required to create an embedding, not to query the index.

Lesson Summary

In this lesson, you learned how to use a vector index in Neo4j and when vector indexes are useful for finding context for Generative AI applications.

In the next lesson, you will learn how GraphRAG can improve the results of your queries.
