In this lesson, you will learn how to use vector indexes with LangChain to perform vector search.
Movie Plots
Each Movie node in the database has a .plot property.
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot
"A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."
Embeddings have been created for 1000 movie plots.
The embedding is stored in the .plotEmbedding property of the Movie nodes.
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot, m.plotEmbedding
A vector index, moviePlots, has been created for the .plotEmbedding property of the Movie nodes.
You can use the moviePlots vector index to find the most similar movies by comparing embeddings of movie plots.
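To make "comparing embeddings" concrete, the following sketch ranks a few made-up vectors against a query vector. It assumes cosine similarity as the measure; the lesson does not state which similarity function the moviePlots index uses, and the movie names and 3-dimensional vectors are invented for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    query: list[float], embeddings: dict[str, list[float]], k: int = 1
) -> list[str]:
    """Return the k keys whose vectors score highest against the query."""
    ranked = sorted(
        embeddings,
        key=lambda title: cosine_similarity(query, embeddings[title]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional vectors; real plot embeddings have far more dimensions.
toy_embeddings = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.0, 1.0, 0.0],
    "Movie C": [0.7, 0.3, 0.1],
}

closest = most_similar([1.0, 0.0, 0.0], toy_embeddings, k=2)
print(closest)
```

A vector index answers the same kind of question, but without scoring every stored vector one by one in application code.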
Similarity Search
The Neo4jVector class provides an interface to use vector indexes in Neo4j.
You can use Neo4jVector to create a vector store that can modify data and perform similarity search.
Open the genai-integration-langchain/vector_search.py file.
To perform a similarity search, you need to:
- Connect to a Neo4j database
- Create an embedding model to convert a query into a vector
- Create a Neo4jVector instance and connect to the database
- Use the similarity_search method to find similar nodes based on the query
Embedding model
The movie plot embeddings were created using the OpenAI text-embedding-ada-002 model.
You need to use the same model to convert the query into a vector.
Use the OpenAIEmbeddings class to create the embedding model.
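A minimal sketch of this step might look as follows. It assumes the langchain-openai package is installed and that an OPENAI_API_KEY environment variable is set; `create_embedding_model` is a hypothetical helper name, not code from the course repository.

```python
# Model name taken from the lesson text. Queries must be embedded with the
# same model that produced the stored plot embeddings.
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"


def create_embedding_model(model_name: str = EMBEDDING_MODEL_NAME):
    """Return an OpenAIEmbeddings instance for the given model name."""
    # Imported here so the module loads even when langchain-openai is absent.
    from langchain_openai import OpenAIEmbeddings

    return OpenAIEmbeddings(model=model_name)
```

Calling `create_embedding_model().embed_query("some plot text")` would then return the query as a list of floats, at the cost of one request to the OpenAI API.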
Vector Store
Use the Neo4jVector class to create a vector store that connects to the Neo4j database and uses the embedding model and the moviePlots index.
When specifying the vector index, you must also state the properties that contain the text (text_node_property) and the embedding (embedding_node_property).
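One way to wire this up is sketched below. It assumes the langchain-neo4j package and connection details held in NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD environment variables; those variable names and both helper functions are assumptions made for illustration. The index and property names come from the lesson text.

```python
import os
from collections.abc import Mapping


def vector_store_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the keyword arguments for Neo4jVector.from_existing_index."""
    return {
        # Connection details; the environment variable names are assumptions.
        "url": env["NEO4J_URI"],
        "username": env["NEO4J_USERNAME"],
        "password": env["NEO4J_PASSWORD"],
        # Index and property names as described in the lesson.
        "index_name": "moviePlots",
        "text_node_property": "plot",
        "embedding_node_property": "plotEmbedding",
    }


def create_plot_vector(embedding_model, env: Mapping[str, str] = os.environ):
    """Create a vector store backed by the existing moviePlots index."""
    # Imported here so the module loads even when langchain-neo4j is absent.
    from langchain_neo4j import Neo4jVector

    return Neo4jVector.from_existing_index(
        embedding_model, **vector_store_settings(env)
    )
```

Keeping the settings in a separate function makes it easy to check the index and property names without opening a database connection.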
Search
The similarity_search method of the Neo4jVector class performs a similarity search based on a query.
The query is converted into a vector using the embedding model, and then the vector index is used to find the most similar nodes.
The k parameter specifies the number of similar nodes to return.
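As a rough sketch, the call can be wrapped in a small function. Here `search_plots` is a hypothetical helper, and `plot_vector` is assumed to be a Neo4jVector store (or any object with a compatible similarity_search method).

```python
def search_plots(plot_vector, plot: str, k: int = 3):
    """Return the k stored plots most similar to the given plot description."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # similarity_search embeds the query text with the store's embedding
    # model, then asks the vector index for the k nearest stored embeddings.
    return plot_vector.similarity_search(plot, k=k)
```

For example, `search_plots(plot_vector, "Toys come alive", k=5)` would ask for the five closest matches instead of three.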
Running the code returns the movies most similar to the query.
The method returns a list of LangChain Document objects, each containing the plot as the content (page_content) and the node properties as metadata.
You can parse the results to extract the movie titles and plots.
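Parsing the results could look something like this. The sketch assumes each Document carries the movie title under a "title" key in its metadata, which follows from the Movie nodes having a title property; `summarize_results` is a hypothetical helper name.

```python
def summarize_results(documents) -> list[dict]:
    """Extract the title and plot from each returned Document."""
    rows = []
    for doc in documents:
        rows.append(
            {
                # Node properties are exposed through the metadata dictionary.
                "title": doc.metadata.get("title"),
                # The text property (the plot) is the document content.
                "plot": doc.page_content,
            }
        )
    return rows
```

Using `.get("title")` returns None rather than raising an error if a node has no title property.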
Experiment with different plot descriptions and compare the movies that are returned.
Filtering results
You can filter the results of the similarity_search method by using the filter parameter.
The filter parameter allows you to specify a condition that results must meet, for example, to only return movies with a revenue of at least 200 million:
result = plot_vector.similarity_search(
    plot,
    k=3,
    filter={"revenue": {"$gte": 200000000}}
)
You can learn more about Neo4jVector metadata filtering in the LangChain documentation.
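A small helper can make the threshold and its inclusiveness explicit. This is only a sketch: `revenue_filter` and `search_with_minimum_revenue` are hypothetical names, `plot_vector` is assumed to be the Neo4jVector store, and the `$gt` operator is assumed to be supported as described in the LangChain filtering documentation.

```python
def revenue_filter(minimum: int, inclusive: bool = True) -> dict:
    """Build a metadata filter on the revenue property.

    "$gte" keeps movies with revenue >= minimum; "$gt" (assumed to be
    supported) keeps only movies with revenue strictly above it.
    """
    operator = "$gte" if inclusive else "$gt"
    return {"revenue": {operator: minimum}}


def search_with_minimum_revenue(plot_vector, plot: str, minimum: int, k: int = 3):
    """Run a similarity search restricted to movies meeting the revenue filter."""
    return plot_vector.similarity_search(
        plot, k=k, filter=revenue_filter(minimum)
    )
```

With the default arguments, `revenue_filter(200_000_000)` produces the same dictionary used in the example above.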
Check Your Understanding
Why is an Embedding Model Required?
Why does the Neo4jVector class require you to provide an embedding model?
- ❏ To connect to the Neo4j database
- ✓ To convert text into a vector representation
- ❏ To create the vector index in the database
- ❏ To extract metadata from the movie nodes
Hint
What must happen to a query before it can be compared to the stored vectors?
Solution
The answer is "To convert text into a vector representation".
The Neo4jVector class requires an embedding model to convert the input text (such as a search query) into a vector.
The query vector can then be compared to the vectors stored in the database to find similar items.
Lesson Summary
In this lesson, you learned how to use the Neo4jVector class to perform vector search.
In the next lesson, you will add a RAG vector retriever to the agent to retrieve relevant movie plots based on user queries.