In the last lesson, you learned about embeddings, vectors and their role in RAG.
In this lesson, you will learn how to use a vector index in Neo4j to compare embeddings to find similar data.
Movie Plots
GraphAcademy created a Neo4j sandbox of movie recommendations when you enrolled in this course. The recommendations database contains over 9000 movies, 15000 actors, and over 100000 user ratings.
Each movie has a .plot
property.
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot
"A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."
Embeddings have been created for 1000 movie plots.
The embedding is stored in the .plotEmbedding
property of the Movie
nodes.
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot, m.plotEmbedding
The following Cypher query will return the titles and plots for the movies that have embeddings:
MATCH (m:Movie)
WHERE m.plotEmbedding IS NOT NULL
RETURN m.title, m.plot
A vector index, moviePlots
, has been created for the .plotEmbedding
property of the Movie
nodes.
You can use the moviePlots
vector index to find the most similar movies by comparing the movie plot embeddings.
Querying Vector Indexes
You can query the moviePlots
index using the db.index.vector.queryNodes()
procedure.
The procedure returns the requested number of approximate nearest neighbor nodes and their similarity score, ordered by the score.
CALL db.index.vector.queryNodes(
indexName :: STRING,
numberOfNearestNeighbours :: INTEGER,
query :: LIST<FLOAT>
) YIELD node, score
The procedure accepts three parameters:
-
indexName
- The name of the vector index -
numberOfNearestNeighbours
- The number of results to return -
query
- A list of floats that represent an embedding
The procedure yields two arguments:
-
A
node
which matches the query -
A similarity
score
ranging from0.0
to1.0
.
You can use this procedure to find the closest embedding value to a given value.
For example, using the moviePlots
vector index to find movies with similar plots.
Review this Cypher before running it.
MATCH (m:Movie {title: 'Toy Story'})
CALL db.index.vector.queryNodes('moviePlots', 6, m.plotEmbedding)
YIELD node, score
RETURN node.title AS title, node.plot AS plot, score
The query finds the Toy Story Movie
node and uses the .plotEmbedding
property to find the most similar plots.
The db.index.vector.queryNodes()
procedure uses the moviePlots
vector index to find similar embeddings.
Run the query. The procedure returns the requested number of nodes and their similarity score, ordered by the score.
Click to reveal the results
title |
plot |
score |
"Toy Story" |
"A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy’s room." |
1.0 |
"Little Rascals, The" |
"Alfalfa is wooing Darla and his He-Man-Woman-Hating friends attempt to sabotage the relationship." |
0.9214372634887695 |
"NeverEnding Story III, The" |
"A young boy must restore order when a group of bullies steal the magical book that acts as a portal between Earth and the imaginary world of Fantasia." |
0.9206198453903198 |
"Drop Dead Fred" |
"A young woman finds her already unstable life rocked by the presence of a rambunctious imaginary friend from childhood." |
0.9199690818786621 |
"E.T. the Extra-Terrestrial" |
"A troubled child summons the courage to help a friendly alien escape Earth and return to his home-world." |
0.919100284576416 |
"Gumby: The Movie" |
"In this offshoot of the 1950s claymation cartoon series, the crazy Blockheads threaten to ruin Gumby’s benefit concert by replacing the entire city of Clokeytown with robots." |
0.9180967211723328 |
The similarity score is between 0.0
and 1.0
, with 1.0
being the most similar. Note how the most similar plot is that of the Toy Story movie itself!
Generate embeddings
You can generate a new embedding in Cypher using the genai.vector.encode
function:
WITH genai.vector.encode(
"Text to create embeddings for",
"OpenAI",
{ token: "sk-..." }) AS embedding
RETURN embedding
token: "sk-…"
with an OpenAI API key.You can use the embedding to query the vector index to find similar movies.
This query, creates and embedding for the text "A mysterious spaceship lands Earth" and uses it to query the moviePlots
vector index for the 6 most similar movie plots.
WITH genai.vector.encode(
"A mysterious spaceship lands Earth",
"OpenAI",
{ token: "sk-..." }) AS myMoviePlot
CALL db.index.vector.queryNodes('moviePlots', 6, myMoviePlot)
YIELD node, score
RETURN node.title, node.plot, score
Experiment with different movie plots and observe the results.
Considerations
Using embeddings and vectors is relatively straightforward and can quickly yield results. The downside to this approach is that it relies heavily on the embeddings and similarity function to produce valid results.
This approach is also a black box. There are 1536 dimensions; it would be impossible to determine how the vectors are structured and how they influenced the similarity score.
The movies returned look similar, but without reading and comparing them, you would have no way of verifying that the results are correct.
Vectors work well for:
-
Contextual or Meaning Based Questions
-
Fuzzy or Vague queries
-
Broad or Open-Ended questions
-
Complex queries with multiple concepts
Vectors are ineffective for:
-
Highly Specific or Fact-Based Questions
-
Numerical or Exact-Match Queries
-
Boolean or Logical Queries
-
Ambiguous or Unclear Queries without Context
-
Specialised Knowledge
In the next lesson you will look at how you can improve the results by using a combination of vector and graph queries.
Check your understanding
Querying vector index
What parameters does the db.index.vector.queryNodes()
procedure require? (Select all that apply)
Hint
A token is only required to create an embedding not to query one.
Solution
The db.index.vector.queryNodes()
procedure requires these parameters?
A token is only required to create an embedding, not to query the index.
Lesson Summary
In this lesson, you learned how to use a vector index in Neo4j and when they are useful for finding context for Generative AI applications.
In the next lesson, you will learn how to GraphRAG can improve the results of your queries.