To take advantage of the relationships in the graph, you can create a retriever that uses both vector search and graph traversal to find relevant data.
The VectorCypherRetriever allows you to perform a vector search and then traverse the graph to find related nodes or entities.
Open the genai_fundamentals/vector_cypher_rag.py file and review the code:
import os
from dotenv import load_dotenv
load_dotenv()
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(
        os.getenv("NEO4J_USERNAME"),
        os.getenv("NEO4J_PASSWORD")
    )
)
# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
# Define retrieval query
retrieval_query =
# Create retriever
retriever =
# Create the LLM
llm = OpenAILLM(model_name="gpt-4o")
# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)
# Search
query_text = "Find the highest rated action movie about travelling to other planets"
response = rag.search(
    query_text=query_text,
    retriever_config={"top_k": 5},
    return_context=True
)
print(response.answer)
print("CONTEXT:", response.retriever_result.items)
# Close the database connection
driver.close()
The program includes all the code to connect to Neo4j and create the embedder, llm, and GraphRAG pipeline.
Your task is to:
- Configure the Cypher retrieval query that will traverse the graph.
- Create the VectorCypherRetriever retriever.
Retrieval Query
The retrieval query is a Cypher query that runs after the vector search and fetches additional data from the graph for each node the search returns.
The query receives the node and score variables yielded by the vector search.
Add this retrieval query to the code:
# Define retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
RETURN
  node.title AS title, node.plot AS plot, score AS similarityScore,
  collect { MATCH (node)-[:IN_GENRE]->(g) RETURN g.name } AS genres,
  collect { MATCH (node)<-[:ACTED_IN]-(a) RETURN a.name } AS actors,
  avg(r.rating) AS userRating
ORDER BY userRating DESC
"""
The query traverses the graph to collect each movie's genres and actors, calculates the average user rating, and sorts the results by that rating.
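As a rough mental model, the vector index is queried first and the retrieval query is appended to that search, which is why node and score are already in scope when the retrieval query starts. The sketch below only builds and prints the combined Cypher text. The vector_search prefix and its parameter names are assumptions based on the db.index.vector.queryNodes procedure; the query that neo4j_graphrag actually generates may differ between versions.

```python
# Conceptual sketch only: approximates how the two stages combine.
# The real query is built inside neo4j_graphrag and may look different.
vector_search = (
    "CALL db.index.vector.queryNodes($index_name, $top_k, $query_vector) "
    "YIELD node, score"
)

# A shortened retrieval query that uses the yielded node and score variables.
retrieval_query = """
MATCH (node)<-[r:RATED]-()
RETURN node.title AS title, score AS similarityScore,
       avg(r.rating) AS userRating
ORDER BY userRating DESC
"""

# The retrieval query continues the same Cypher statement as the vector search.
combined_query = f"{vector_search}\n{retrieval_query.strip()}"
print(combined_query)
```

Because both parts form a single statement, the retrieval query must end with a RETURN clause and can only refer to node, score, and anything it matches itself.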
Retriever
You can now use the VectorCypherRetriever class to create a retriever that will perform the vector search and then traverse the graph:
from neo4j_graphrag.retrievers import VectorCypherRetriever
# Create retriever
retriever = VectorCypherRetriever(
    driver,
    index_name="moviePlots",
    embedder=embedder,
    retrieval_query=retrieval_query,
)
The retriever requires the vector index name (moviePlots), the retrieval query, and the embedder to encode the query.
The complete code:
import os
from dotenv import load_dotenv
load_dotenv()
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(
        os.getenv("NEO4J_USERNAME"),
        os.getenv("NEO4J_PASSWORD")
    )
)
# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
# Define retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
RETURN
  node.title AS title, node.plot AS plot, score AS similarityScore,
  collect { MATCH (node)-[:IN_GENRE]->(g) RETURN g.name } AS genres,
  collect { MATCH (node)<-[:ACTED_IN]-(a) RETURN a.name } AS actors,
  avg(r.rating) AS userRating
ORDER BY userRating DESC
"""
# Create retriever
retriever = VectorCypherRetriever(
    driver,
    index_name="moviePlots",
    embedder=embedder,
    retrieval_query=retrieval_query,
)
# Create the LLM
llm = OpenAILLM(model_name="gpt-4o")
# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)
# Search
query_text = "Find the highest rated action movie about travelling to other planets"
response = rag.search(
    query_text=query_text,
    retriever_config={"top_k": 5},
    return_context=True
)
print(response.answer)
print("CONTEXT:", response.retriever_result.items)
# Close the database connection
driver.close()
Context
When you run the code, it performs a vector search for the provided query and then traverses the graph to find related nodes.
This additional context from the graph allows the LLM to generate more accurate responses.
Transparency
The context is returned alongside the response, allowing you to see what data was used to generate the response.
This transparency is important for understanding how the LLM arrived at its response and for debugging purposes.
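To make the context easier to read than the raw list printed by the program, you could format each retrieved item on its own line. This is a minimal sketch: format_context is a hypothetical helper, not part of neo4j_graphrag, and it assumes each entry in response.retriever_result.items exposes its data through a content attribute.

```python
def format_context(response, max_chars=200):
    """Return one numbered, truncated line per retrieved context item.

    Hypothetical helper: assumes `response` is the object returned by
    rag.search(..., return_context=True) and that each item has `content`.
    """
    lines = []
    for number, item in enumerate(response.retriever_result.items, start=1):
        text = str(item.content)
        if len(text) > max_chars:
            # Shorten long records so the output stays scannable.
            text = text[:max_chars] + "..."
        lines.append(f"[{number}] {text}")
    return lines
```

After calling rag.search, printing "\n".join(format_context(response)) shows which movies, genres, actors, and ratings the LLM was given.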
When sent the query "Find the highest rated action movie about travelling to other planets", the GraphRAG pipeline will follow these steps:
- Perform a vector search for movie plots related to travelling to other planets.
- Run the retrieval query to find related actors, genres, and user ratings.
- Pass the retrieved data to the LLM to generate a response.
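The steps above can be sketched by hand to show what the pipeline does on your behalf. This is an illustration under assumptions, not the GraphRAG implementation: manual_rag and its prompt wording are hypothetical, and the sketch relies on the retriever's search method accepting query_text and top_k and on the LLM's invoke method returning an object with a content attribute.

```python
def manual_rag(retriever, llm, query_text, top_k=5):
    """Hypothetical, simplified version of the retrieve-then-generate flow."""
    # The retriever runs the vector search and then the retrieval query.
    result = retriever.search(query_text=query_text, top_k=top_k)

    # Join the retrieved records into one block of context text.
    context = "\n".join(str(item.content) for item in result.items)

    # Pass the retrieved data and the question to the LLM.
    # The prompt wording here is an assumption, not the library's template.
    prompt = (
        "Answer the question using only the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query_text}"
    )
    return llm.invoke(prompt).content
```

Calling manual_rag(retriever, llm, query_text) with the objects created in the program should give an answer comparable to rag.search, although the wording may differ because the prompt is not the same.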
You can expect the response to be based on:
- Movie plots about travelling to other planets.
- The action genre.
- The highest user rating (not the vector similarity score).
A typical response might be: "The highest rated action movie about traveling to other planets is 'Aliens', with a user rating of 3.92."
Test the code with different queries relating to movies, actors, and genres, such as:
- Find a comedy movie about vampires
- Who acts in drama movies about romance and love?
- What genres are represented about movies where the hero fails his mission?
If the responses are unpredictable, try adjusting the temperature of the LLM:
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={"temperature": 0.5}
)
Optional challenge
Modify the retrieval query to include the directors of the movies in the context.
Directors can be found using the pattern (node)<-[:DIRECTED]-(director).
Try queries relating to directors, such as "Who has directed movies about weddings?"
Continue
When you are ready, continue to the next lesson.
Lesson Summary
In this lesson, you learned how to create a retriever that uses both vector search and graph traversal to find relevant data.
In the next lesson, you will create a text-to-Cypher retriever to find relevant data based on natural language queries.