Graph-Enhanced Vector Retriever

To take advantage of the relationships in the graph, you can create a retriever that uses both vector search and graph traversal to find relevant data.

The VectorCypherRetriever allows you to perform vector searches and then traverse the graph to find related nodes or entities.

Open the genai_fundamentals/vector_cypher_rag.py file and review the code:

python
vector_cypher_rag.py
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")

# Define retrieval query
retrieval_query =

# Create retriever
retriever = 

#  Create the LLM
llm = OpenAILLM(model_name="gpt-4o")

# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Search
query_text = "Find the highest rated action movie about travelling to other planets"

response = rag.search(
    query_text=query_text, 
    retriever_config={"top_k": 5},
    return_context=True
)

print(response.answer)
print("CONTEXT:", response.retriever_result.items)

# Close the database connection
driver.close()

The program already includes the code to connect to Neo4j and to create the embedder, the LLM, and the GraphRAG pipeline.

Your task is to:

  1. Configure the Cypher retrieval query that will traverse the graph.

  2. Create the VectorCypherRetriever.

Retrieval Query

The retrieval query is a Cypher query that runs after the vector search and fetches additional data from the graph for each node the search returns.

The query receives the node and score variables yielded by the vector search.
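To make this hand-off concrete, the sketch below shows roughly how a vector index lookup and a retrieval query can fit together as a single Cypher statement. It is an illustration under stated assumptions, not the library's actual implementation: `build_search_query` and the `$index_name`, `$top_k`, and `$query_vector` parameter names are hypothetical. Only the `db.index.vector.queryNodes` procedure, which yields `node` and `score`, is taken from Neo4j itself.

```python
# Illustrative only: roughly how a vector index lookup and a retrieval query
# can be combined into one Cypher statement. build_search_query is a
# hypothetical helper, not part of neo4j_graphrag.

VECTOR_SEARCH = (
    "CALL db.index.vector.queryNodes($index_name, $top_k, $query_vector) "
    "YIELD node, score"
)


def build_search_query(retrieval_query: str) -> str:
    """Append the retrieval query to the vector search clause."""
    return f"{VECTOR_SEARCH}\n{retrieval_query.strip()}"


example_query = build_search_query(
    """
    RETURN node.title AS title, score AS similarityScore
    """
)
print(example_query)
```

Because the retrieval query is appended after the `YIELD` clause, it can refer to `node` and `score` without declaring them.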

Add this retrieval query to the code:

python
# Define retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
RETURN 
  node.title AS title, node.plot AS plot, score AS similarityScore, 
  collect { MATCH (node)-[:IN_GENRE]->(g) RETURN g.name } as genres, 
  collect { MATCH (node)<-[:ACTED_IN]-(a) RETURN a.name } as actors, 
  avg(r.rating) as userRating
ORDER BY userRating DESC
"""

The query traverses the graph to collect the genres and actors of each movie, calculates the average user rating from the RATED relationships, and sorts the results by that rating.
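The aggregation and sorting can be pictured in plain Python. The sketch below mimics what `avg(r.rating)` and `ORDER BY userRating DESC` do to the rows matched for each movie. The titles, scores, and ratings are made-up placeholders, not data from the movie graph.

```python
from statistics import mean

# Made-up rows: (movie title, similarity score, one rating per RATED relationship)
rows = [
    ("Movie A", 0.93, 3.0),
    ("Movie A", 0.93, 4.0),
    ("Movie B", 0.91, 4.5),
    ("Movie B", 0.91, 3.5),
    ("Movie B", 0.91, 4.0),
]

# Group the ratings by the non-aggregated columns, as avg(r.rating) does
grouped = {}
for title, score, rating in rows:
    grouped.setdefault((title, score), []).append(rating)

results = [
    {"title": title, "similarityScore": score, "userRating": mean(ratings)}
    for (title, score), ratings in grouped.items()
]

# Equivalent of ORDER BY userRating DESC
results.sort(key=lambda r: r["userRating"], reverse=True)
print(results)
```

In this toy data, "Movie B" ends up first because of its higher average rating, even though "Movie A" has the higher similarity score.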

Retriever

You can now use the VectorCypherRetriever class to create a retriever that will perform the vector search and then traverse the graph:

python
from neo4j_graphrag.retrievers import VectorCypherRetriever

# Create retriever
retriever = VectorCypherRetriever(
    driver,
    index_name="moviePlots",
    embedder=embedder,
    retrieval_query=retrieval_query,
)

The retriever requires the vector index name (moviePlots), the retrieval query, and the embedder to encode the query.

Complete code:
python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")

# Define retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
RETURN 
  node.title AS title, node.plot AS plot, score AS similarityScore, 
  collect { MATCH (node)-[:IN_GENRE]->(g) RETURN g.name } as genres, 
  collect { MATCH (node)<-[:ACTED_IN]-(a) RETURN a.name } as actors, 
  avg(r.rating) as userRating
ORDER BY userRating DESC
"""

# Create retriever
retriever = VectorCypherRetriever(
    driver,
    index_name="moviePlots",
    embedder=embedder,
    retrieval_query=retrieval_query,
)

#  Create the LLM
llm = OpenAILLM(model_name="gpt-4o")

# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Search
query_text = "Find the highest rated action movie about travelling to other planets"

response = rag.search(
    query_text=query_text, 
    retriever_config={"top_k": 5},
    return_context=True
)

print(response.answer)
print("CONTEXT:", response.retriever_result.items)

# Close the database connection
driver.close()

Context

When you run the code, it will perform a vector search for the provided query and then run the retrieval query to traverse the graph from each matching movie to its related nodes.

The additional context allows the LLM to generate more accurate responses based on the additional data in the graph.

Transparency

The context is printed after the answer, allowing you to see what data was used to generate the response.

This transparency is important for understanding how the LLM arrived at its response and for debugging purposes.
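When debugging, it can help to print the context one item per line instead of as a single list. The helper below is a minimal sketch, assuming each item exposes a `content` attribute. `format_context` and `FakeItem` are hypothetical names introduced only for this example, and `FakeItem` exists so the sketch runs without a database or API key.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class FakeItem:
    """Stand-in for a retriever result item so this sketch runs offline."""
    content: Any


def format_context(items: Iterable[Any]) -> str:
    """Number each context item so it is easy to match against the answer."""
    lines = [f"[{i}] {item.content}" for i, item in enumerate(items, start=1)]
    return "\n".join(lines)


# Made-up placeholder content, not real query output
sample_items = [FakeItem("first record"), FakeItem("second record")]
print(format_context(sample_items))
```

In the lesson script, you would pass `response.retriever_result.items` to the helper in place of `sample_items`.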

When sent the query "Find the highest rated action movie about travelling to other planets", the GraphRAG pipeline will follow these steps:

  1. Perform a vector search for movie plots related to travelling to other planets.

  2. Run the retrieval query to find related actors, genres, and user ratings.

  3. Pass the retrieved data to the LLM to generate a response.

You can expect the response to be based on:

  • Travelling to other planets.

  • The action genre.

  • The highest user rating (not the vector similarity score).

A typical response might be: "The highest rated action movie about traveling to other planets is 'Aliens', with a user rating of 3.92."

Test the code with different queries relating to movies, actors, and genres, such as:

  • Find a comedy movie about vampires

  • Who acts in drama movies about romance and love?

  • What genres are represented in movies where the hero fails his mission?

If the responses are unpredictable, try adjusting the temperature of the LLM (lower values make the output more deterministic):

python
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={"temperature": 0.5}
)

Optional challenge

Modify the retrieval query to include the directors of the movies in the context.

Directors can be found using the pattern (node)<-[:DIRECTED]-(director).

Try queries relating to directors, such as "Who has directed movies about weddings?"
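If you get stuck, here is one possible approach, not the only valid answer. It adds a `collect` subquery for directors alongside the existing ones for genres and actors. The `director.name` property is an assumption, based on the `name` property the lesson's query uses for actors.

```python
# One possible answer to the challenge. The director.name property is an
# assumption, mirroring the a.name property used for actors.
retrieval_query = """
MATCH (node)<-[r:RATED]-()
RETURN
  node.title AS title, node.plot AS plot, score AS similarityScore,
  collect { MATCH (node)-[:IN_GENRE]->(g) RETURN g.name } AS genres,
  collect { MATCH (node)<-[:ACTED_IN]-(a) RETURN a.name } AS actors,
  collect { MATCH (node)<-[:DIRECTED]-(director) RETURN director.name } AS directors,
  avg(r.rating) AS userRating
ORDER BY userRating DESC
"""
print(retrieval_query)
```

Replace the retrieval query in your program with this one and rerun it. The directors should then appear in the printed context.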

Continue

When you are ready, continue to the next lesson.

Lesson Summary

In this lesson, you learned how to create a retriever that uses both vector search and graph traversal to find relevant data.

In the next lesson, you will create a text-to-Cypher retriever to find relevant data based on natural language queries.