You can enhance a vector retriever using GraphRAG to include additional context.
In this lesson, you will update the vector retriever to retrieve additional metadata from the graph after the similarity search.
GraphRAG
You can add an additional Cypher retrieval query to the Neo4jVector class.
The retrieval query runs after the similarity search, and the data it returns is added to the Document metadata.
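Conceptually, each row the retrieval query returns (with `text`, `score`, and `metadata` columns) becomes one document: the text becomes the content and the metadata map is attached alongside it. The sketch below is a simplified illustration, not the library's internal code. It uses a hypothetical `SimpleDocument` dataclass in place of LangChain's `Document` and hypothetical sample rows.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def rows_to_documents(rows: list[dict[str, Any]]) -> list[tuple[SimpleDocument, float]]:
    """Turn retrieval-query rows (text, score, metadata) into (document, score) pairs."""
    results = []
    for row in rows:
        # The text column becomes the content, the metadata map is copied as-is
        doc = SimpleDocument(page_content=row["text"], metadata=dict(row["metadata"]))
        results.append((doc, row["score"]))
    return results


# Hypothetical rows, shaped like the output of a retrieval query
rows = [
    {
        "text": "Title: Movie A, Plot: Two strangers fall in love.",
        "score": 0.91,
        "metadata": {"title": "Movie A", "genres": ["Romance"], "userRating": 4.2},
    },
]

for doc, score in rows_to_documents(rows):
    print(doc.metadata["title"], score)
```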
You can use this retrieval query to retrieve useful context from the graph.
In the movie plot example, you could retrieve additional information about the movies, such as the actors or user ratings.
The additional context can be used to improve and expand the agent’s responses, for example:
Who acts in movies about Love and Romance?
The vector retriever will return movies about Love and Romance, the Cypher retrieval query will return the actors in those movies, and the agent can use this information to answer the question.
This method of vector + graph retrieval is a common approach to GraphRAG (Graph Retrieval Augmented Generation).
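To make the two steps concrete, here is a toy, in-memory sketch of vector + graph retrieval: a cosine-similarity search over plot embeddings, followed by a lookup of related actors for each hit. The embeddings, movie titles, actor names, and the `vector_graph_retrieve` function are all made up for illustration; in the lesson, the embeddings come from the embedding model and the graph lives in Neo4j.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up plot embeddings (step 1 searches these)
plot_embeddings = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.8, 0.2, 0.1],
    "Movie C": [0.0, 0.1, 0.9],
}

# Made-up ACTED_IN relationships: movie -> actors (step 2 traverses these)
acted_in = {
    "Movie A": ["Actor 1", "Actor 2"],
    "Movie B": ["Actor 3"],
    "Movie C": ["Actor 4"],
}


def vector_graph_retrieve(query_vec: list[float], k: int = 2) -> list[dict]:
    # Step 1: similarity search, keep the k most similar plots
    ranked = sorted(
        plot_embeddings,
        key=lambda title: cosine(query_vec, plot_embeddings[title]),
        reverse=True,
    )[:k]
    # Step 2: expand each hit with related data from the "graph"
    return [{"title": title, "actors": acted_in.get(title, [])} for title in ranked]


print(vector_graph_retrieve([1.0, 0.1, 0.0]))
```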
Retrieval Query
Open the genai-integration-langchain/vector_graph_retriever.py file:
import os
from dotenv import load_dotenv
load_dotenv()
from langchain_core.documents import Document
from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector
# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")
# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Answer:"""
prompt = PromptTemplate.from_template(template)
# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)
# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")
# Define the retrieval query
# retrieval_query =
# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)
# Define functions for each step in the application
# Retrieve context
def retrieve(state: State):
    # Use the vector to find relevant documents
    context = plot_vector.similarity_search(
        state["question"],
        k=6,
    )
    return {"context": context}
# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}
# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()
# Run the application
question = "Who acts in movies about Love and Romance?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])
print("Context:", response["context"])
This is the same code as the vector retriever agent you created.
You need to define a retrieval_query that will supplement the results of the similarity search.
# Define the retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
WITH node, score, avg(r.rating) AS userRating
RETURN
    "Title: " + node.title + ", Plot: " + node.plot AS text,
    score,
    {
        title: node.title,
        genres: [ (node)-[:IN_GENRE]->(g) | g.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        userRating: userRating
    } AS metadata
ORDER BY userRating DESC
"""
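The query receives the node and score variables yielded by the vector search.
It matches the RATED relationships to calculate each movie's average user rating, traverses the graph to collect the movie's genres and actors, and sorts the results by user rating. Because the query uses MATCH rather than OPTIONAL MATCH, movies without any ratings are excluded from the results.
The query must return three columns: text, which becomes the document content, score, and metadata, which is added to the document metadata.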
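The per-movie logic of the Cypher query can be mirrored in plain Python to check your understanding of what it returns. This is a sketch over hypothetical in-memory data (the titles, ratings, genres, actors, and the `run_retrieval` function are all made up), not how Neo4j executes the query. It reproduces three behaviours of the query: movies with no ratings are dropped, `avg(r.rating)` becomes `userRating`, and rows are sorted by `userRating` in descending order rather than by similarity score.

```python
from statistics import mean

# Hypothetical vector-search hits: (movie title, similarity score)
hits = [("Movie A", 0.93), ("Movie B", 0.90), ("Movie C", 0.88)]

# Hypothetical graph data keyed by movie title
ratings = {"Movie A": [3.0, 4.0], "Movie B": [5.0, 4.0, 4.5]}  # Movie C has no ratings
genres = {"Movie A": ["Romance"], "Movie B": ["Comedy", "Romance"], "Movie C": ["Drama"]}
actors = {"Movie A": [("Actor 1", "Role 1")], "Movie B": [("Actor 2", "Role 2")]}
plots = {"Movie A": "Plot A", "Movie B": "Plot B", "Movie C": "Plot C"}


def run_retrieval(hits):
    rows = []
    for title, score in hits:
        # MATCH (node)<-[r:RATED]-() drops movies with no ratings
        if not ratings.get(title):
            continue
        user_rating = mean(ratings[title])  # avg(r.rating) AS userRating
        rows.append({
            "text": f"Title: {title}, Plot: {plots[title]}",
            "score": score,
            "metadata": {
                "title": title,
                "genres": genres.get(title, []),
                "actors": [list(pair) for pair in actors.get(title, [])],
                "userRating": user_rating,
            },
        })
    # ORDER BY userRating DESC
    return sorted(rows, key=lambda row: row["metadata"]["userRating"], reverse=True)


for row in run_retrieval(hits):
    print(row["metadata"]["title"], row["metadata"]["userRating"])
```

Note that Movie A has the higher similarity score but is listed second, because the final ordering is by average rating.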
Vector Store
You can now update the Neo4jVector to use the retrieval_query:
# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
    retrieval_query=retrieval_query,
)
The retrieve function does not need to change. When it calls the similarity_search method, each returned document now includes the additional context from the retrieval query in its metadata.
The complete code:
import os
from dotenv import load_dotenv
load_dotenv()
from langchain_core.documents import Document
from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector
# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")
# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Answer:"""
prompt = PromptTemplate.from_template(template)
# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)
# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")
# Define the retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
WITH node, score, avg(r.rating) AS userRating
RETURN
    "Title: " + node.title + ", Plot: " + node.plot AS text,
    score,
    {
        title: node.title,
        genres: [ (node)-[:IN_GENRE]->(g) | g.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        userRating: userRating
    } AS metadata
ORDER BY userRating DESC
"""
# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
    retrieval_query=retrieval_query,
)
# Define functions for each step in the application
# Retrieve context
def retrieve(state: State):
    # Use the vector to find relevant documents
    context = plot_vector.similarity_search(
        state["question"],
        k=6,
    )
    return {"context": context}
# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}
# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()
# Run the application
question = "Who acts in movies about Love and Romance?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])
print("Context:", response["context"])
For each question, the retrieved context now includes the movie plots, genres, actors, and user ratings, which the agent uses to generate a more accurate response.
[question] Who acts in movies about Love and Romance?
[answer] Audrey Hepburn, Gregory Peck, Christian Slater, Mary Stuart Masterson, Robert Redford, Michelle Pfeiffer, and Cary Grant act in movies about love and romance.
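The generate function passes the list of documents straight into the prompt, so the template receives the list's default string representation. One possible refinement, sketched here as an assumption rather than part of the lesson, is a hypothetical `format_docs` helper that renders each document's content and graph metadata as readable lines. To stay self-contained, the sketch uses `SimpleNamespace` objects with `page_content` and `metadata` attributes in place of LangChain documents, and the sample data is made up.

```python
from types import SimpleNamespace


def format_docs(docs) -> str:
    """Hypothetical helper: render documents and their graph metadata as plain text."""
    blocks = []
    for doc in docs:
        meta = doc.metadata
        # Each actor entry is a [name, role] pair, as returned by the retrieval query
        actors = ", ".join(f"{name} ({role})" for name, role in meta.get("actors", []))
        lines = [
            doc.page_content,
            f"Genres: {', '.join(meta.get('genres', []))}",
            f"Actors: {actors}",
            f"User rating: {meta.get('userRating')}",
        ]
        blocks.append("\n".join(lines))
    # Separate documents with a blank line
    return "\n\n".join(blocks)


# Hypothetical document shaped like the retriever output
doc = SimpleNamespace(
    page_content="Title: Movie A, Plot: Two strangers fall in love.",
    metadata={
        "title": "Movie A",
        "genres": ["Romance"],
        "actors": [["Actor 1", "Role 1"]],
        "userRating": 4.2,
    },
)
print(format_docs([doc]))
```

In generate, you could then pass `"context": format_docs(state["context"])` to the prompt instead of the raw list.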
Run the application, review the additional context, and experiment with different questions, for example:
- Who acts in movies about Love and Romance?
- What are the top user-rated movies about a house haunted by ghosts?
- What movie genres relate to movies about betrayal?
A GraphRAG retriever allows you to combine the power of vector search with the rich context of a graph database, enabling more accurate and context-aware responses.
Continue
When you are ready, continue to the next lesson.
Lesson Summary
In this lesson, you learned how to enhance a vector retriever with a Cypher retrieval query to create a GraphRAG retriever.
In the next challenge, you will explore adding additional data to the retrieval query and test the agent's ability to answer more complex questions.