Graph Retrieval

You can use GraphRAG to enhance a vector retriever with additional context from the graph.

In this lesson, you will update the vector retriever to retrieve additional metadata from the graph after the similarity search.

GraphRAG

You can add a Cypher retrieval query to the Neo4jVector class. The retrieval query runs after the similarity search, and the data it returns is used to build each Document, including its page content and metadata.
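The sketch below is a simplified illustration, not the langchain_neo4j source. It assumes each row returned by the retrieval query has text, score, and metadata columns, and it shows how such a row could map onto a document. SimpleDocument and row_to_document are hypothetical stand-ins for LangChain's Document and the library's internal conversion.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def row_to_document(row: dict[str, Any]) -> tuple[SimpleDocument, float]:
    """Map one retrieval-query row onto a (document, score) pair.

    Assumes the row has the three columns a retrieval query returns:
    text, score, and metadata.
    """
    missing = {"text", "score", "metadata"} - set(row)
    if missing:
        raise ValueError(f"retrieval query row is missing columns: {sorted(missing)}")
    doc = SimpleDocument(page_content=row["text"], metadata=dict(row["metadata"]))
    return doc, row["score"]


# Hypothetical row, shaped like the output of a retrieval query
row = {
    "text": "Title: Movie A, Plot: Two strangers fall in love.",
    "score": 0.93,
    "metadata": {"title": "Movie A", "userRating": 4.1},
}
doc, score = row_to_document(row)
print(doc.page_content)  # Title: Movie A, Plot: Two strangers fall in love.
print(doc.metadata["userRating"], score)  # 4.1 0.93
```

In this sketch, the text column becomes the document's content and the metadata map is copied onto the document alongside the similarity score.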

You can use this retrieval query to retrieve useful context from the graph.

In the movie plot example, you could retrieve additional information about the movies, such as the actors or user ratings.

The additional context can be used to improve and expand the agent’s responses, for example:

Who acts in movies about Love and Romance?

The vector retriever will return movies about Love and Romance, the Cypher retrieval query will return the actors in those movies, and the agent can use this information to answer the question.
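In the application the LLM reads this combined context directly. The minimal sketch below only illustrates what the graph adds: once each retrieved movie carries its actors, answering "who acts in these movies" becomes a simple lookup. It assumes each result's metadata holds an actors list of [name, role] pairs, the shape returned by the retrieval query later in this lesson. The function name and sample data are hypothetical.

```python
from typing import Any


def actors_in_results(metadata_list: list[dict[str, Any]]) -> list[str]:
    """Collect unique actor names, in first-seen order, from result metadata.

    Assumes each metadata dict may contain an "actors" list of [name, role] pairs.
    """
    seen: dict[str, None] = {}  # dict keys keep insertion order and drop duplicates
    for metadata in metadata_list:
        for name, _role in metadata.get("actors", []):
            seen.setdefault(name, None)
    return list(seen)


# Hypothetical metadata for two movies returned by the vector search
results = [
    {"title": "Movie A", "actors": [["Actor 1", "Anna"], ["Actor 2", "Ben"]]},
    {"title": "Movie B", "actors": [["Actor 2", "Carl"], ["Actor 3", "Dora"]]},
]
print(actors_in_results(results))  # ['Actor 1', 'Actor 2', 'Actor 3']
```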

This method of vector + graph retrieval is a common approach to GraphRAG (Graph Retrieval Augmented Generation).

Retrieval Query

Open the genai-integration-langchain/vector_graph_retriever.py file:

vector_graph_retriever.py

```python
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_core.documents import Document
from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")

# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Define the retrieval query
# retrieval_query =

# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

# Define functions for each step in the application

# Retrieve context
def retrieve(state: State):
    # Use the vector to find relevant documents
    context = plot_vector.similarity_search(
        state["question"],
        k=6,
    )
    return {"context": context}

# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}

# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()

# Run the application
question = "Who acts in movies about Love and Romance?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])
print("Context:", response["context"])
```

This is the same code you used for the vector retriever agent you created earlier, with a placeholder for the retrieval query.

You need to define a retrieval_query that will be used to supplement the results of the similarity search.

```python
# Define the retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
WITH node, score, avg(r.rating) AS userRating
RETURN
    "Title: " + node.title + ", Plot: " + node.plot AS text,
    score,
    {
        title: node.title,
        genres: [ (node)-[:IN_GENRE]->(g) | g.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        userRating: userRating
    } AS metadata
ORDER BY userRating DESC
"""
```

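The query receives the node and score variables yielded by the vector search. It must return three columns:

- text, which becomes the Document's page content
- score
- metadata, which becomes the Document's metadata

The query does four things:

- It calculates each movie's average user rating from its RATED relationships.
- It traverses the graph to collect the movie's genres and actors, including the roles they played.
- It sorts the results by user rating, highest first.
- Because it uses MATCH, it excludes movies without any ratings from the results.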
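To see what the retrieval query computes for each movie, here is a plain-Python sketch that mirrors its logic over small in-memory data. For each movie, it:

1. averages the ratings
2. collects genres and [name, role] actor pairs
3. drops the movie if it has no ratings, as MATCH does

It then sorts the results by userRating in descending order. The data structures, sample values, and the function name enrich_results are hypothetical; in the application, Neo4j does this work in Cypher.

```python
from statistics import mean
from typing import Any


def enrich_results(
    hits: list[tuple[dict[str, Any], float]],
    ratings: dict[str, list[float]],
    genres: dict[str, list[str]],
    actors: dict[str, list[list[str]]],
) -> list[dict[str, Any]]:
    """Mimic the retrieval query for (movie, score) pairs from a vector search."""
    rows = []
    for movie, score in hits:
        title = movie["title"]
        movie_ratings = ratings.get(title, [])
        if not movie_ratings:
            # MATCH (node)<-[r:RATED]-() keeps only movies with at least one rating
            continue
        rows.append({
            "text": f"Title: {title}, Plot: {movie['plot']}",
            "score": score,
            "metadata": {
                "title": title,
                "genres": genres.get(title, []),  # pattern comprehension yields [] if none
                "actors": actors.get(title, []),
                "userRating": mean(movie_ratings),  # avg(r.rating)
            },
        })
    # ORDER BY userRating DESC
    rows.sort(key=lambda row: row["metadata"]["userRating"], reverse=True)
    return rows


# Hypothetical vector search hits and graph data
hits = [
    ({"title": "Movie A", "plot": "A love story."}, 0.95),
    ({"title": "Movie B", "plot": "A summer romance."}, 0.91),
    ({"title": "Movie C", "plot": "An unrated romance."}, 0.90),
]
ratings = {"Movie A": [3.0, 4.0], "Movie B": [5.0, 4.0]}
genres = {"Movie A": ["Romance"], "Movie B": ["Romance", "Comedy"]}
actors = {"Movie A": [["Actor 1", "Anna"]], "Movie B": [["Actor 2", "Ben"]]}

rows = enrich_results(hits, ratings, genres, actors)
for row in rows:
    print(row["metadata"]["title"], row["metadata"]["userRating"], row["score"])
# Movie B 4.5 0.91
# Movie A 3.5 0.95
```

In this sketch, the ordering by rating replaces the similarity order, and Movie C disappears because it has no ratings.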

Vector Store

You can now update the Neo4jVector to use the retrieval_query:

```python
# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
    retrieval_query=retrieval_query,
)
```
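Conceptually, the retrieval query is appended to a vector index search, which is why it can refer to node and score. The sketch below composes such a query by hand using Neo4j's db.index.vector.queryNodes procedure. The helper build_graphrag_query and the parameter names $index_name, $k, and $embedding are hypothetical. The query langchain_neo4j actually generates differs in its details, so treat this as an illustration only.

```python
def build_graphrag_query(retrieval_query: str) -> str:
    """Prepend a vector index search to a retrieval query (illustrative only).

    db.index.vector.queryNodes yields the node and score variables
    that the retrieval query then uses.
    """
    vector_search = (
        "CALL db.index.vector.queryNodes($index_name, $k, $embedding) "
        "YIELD node, score\n"
    )
    return vector_search + retrieval_query.strip()


# A shortened, hypothetical retrieval query for illustration
example_retrieval_query = """
MATCH (node)<-[r:RATED]-()
WITH node, score, avg(r.rating) AS userRating
RETURN node.title AS text, score, {userRating: userRating} AS metadata
"""
query = build_graphrag_query(example_retrieval_query)
print(query)
```

With the official neo4j Python driver (5.x), you could run a query like this with driver.execute_query(query, index_name="moviePlots", k=6, embedding=query_vector), where query_vector is the embedding of the question. In this lesson, Neo4jVector handles all of this for you.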

The retrieve function does not need to change. When it calls the similarity_search method, each Document returned now includes the additional context from the graph.

Here is the complete code:

```python
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_core.documents import Document
from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")

# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Define the retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
WITH node, score, avg(r.rating) AS userRating
RETURN
    "Title: " + node.title + ", Plot: " + node.plot AS text,
    score,
    {
        title: node.title,
        genres: [ (node)-[:IN_GENRE]->(g) | g.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        userRating: userRating
    } AS metadata
ORDER BY userRating DESC
"""

# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
    retrieval_query=retrieval_query,
)

# Define functions for each step in the application

# Retrieve context
def retrieve(state: State):
    # Use the vector to find relevant documents
    context = plot_vector.similarity_search(
        state["question"],
        k=6,
    )
    return {"context": context}

# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}

# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()

# Run the application
question = "Who acts in movies about Love and Romance?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])
print("Context:", response["context"])
```

The context for each question now includes the movie titles, plots, genres, actors, and user ratings. The agent uses this context to generate a more accurate response.

[question]
Who acts in movies about Love and Romance?

[answer]
Audrey Hepburn, Gregory Peck, Christian Slater, Mary Stuart Masterson,
Robert Redford, Michelle Pfeiffer, and Cary Grant act in movies about
love and romance.

Run the application, review the additional context, and experiment with different questions, for example:

Who acts in movies about Love and Romance?
What are the top user-rated movies about a house haunted by ghosts?
What movie genres relate to movies about betrayal?
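When you review the additional context, printing every Document in full can be noisy. The hypothetical helper summarize_context below prints one line per retrieved movie instead. It relies only on the metadata attribute that LangChain Document objects provide and on the metadata keys returned by this lesson's retrieval query (title, genres, actors, userRating). A SimpleNamespace stands in for a real Document here.

```python
from types import SimpleNamespace
from typing import Any


def summarize_context(docs: list[Any]) -> list[str]:
    """Return one summary line per document: title, rating, genres, actor count.

    Works with any object exposing a .metadata dict, such as LangChain's Document.
    """
    lines = []
    for doc in docs:
        meta = doc.metadata
        rating = meta.get("userRating")
        rating_text = f"{rating:.2f}" if rating is not None else "n/a"
        genres = ", ".join(meta.get("genres", []))
        lines.append(
            f"{meta.get('title', '?')} | rating {rating_text} | "
            f"genres: {genres} | actors: {len(meta.get('actors', []))}"
        )
    return lines


# Stand-in document for illustration (hypothetical values)
doc = SimpleNamespace(
    page_content="Title: Movie A, Plot: A love story.",
    metadata={"title": "Movie A", "genres": ["Romance", "Drama"],
              "actors": [["Actor 1", "Anna"]], "userRating": 3.5},
)
for line in summarize_context([doc]):
    print(line)  # Movie A | rating 3.50 | genres: Romance, Drama | actors: 1
```

In the application, you could call print("\n".join(summarize_context(response["context"]))) after app.invoke to scan the retrieved movies at a glance.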

A GraphRAG retriever allows you to combine the power of vector search with the rich context of a graph database, enabling more accurate and context-aware responses.

Continue

When you are ready, continue to the next lesson.

Lesson Summary

In this lesson, you learned how to enhance a vector retriever with a Cypher retrieval query to create a GraphRAG retriever.

In the next challenge, you will add more data to the retrieval query and test the agent's ability to answer more complex questions.
