Understanding HybridCypherRetriever
The HybridCypherRetriever enhances the retrieval process by combining hybrid search (vector and full-text) with graph traversal. This lets you retrieve not only semantically similar nodes but also related information reached through graph relationships, enabling more comprehensive and accurate responses in your GraphRAG applications.
Try this Cypher command in the sandbox to see a movie together with its actors:
MATCH (m:Movie {title: "Musa the Warrior (Musa)"})
MATCH (actor:Actor)-[:ACTED_IN]->(m)
RETURN m, collect(actor) AS actors;
How It Works
- Hybrid Search: Combines vector similarity and full-text search to find relevant nodes.
- Graph Traversal: Uses Cypher queries to fetch additional related nodes based on the initial retrieval.
- Aggregation: Merges results from both search methods and traversal to provide enriched data for the language model.
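The three steps above can be sketched with a toy, in-memory example. This is not how the library is implemented: the data, the 2-dimensional vectors, the keyword scoring, and the choice to merge by taking the higher of the two normalized scores are all assumptions made for illustration.

```python
import math

# Toy in-memory "graph". All titles, plots, vectors, and actor names are made up.
MOVIES = {
    "m1": {"title": "Movie A", "plot": "a warrior escorts a princess across the desert", "vec": [0.9, 0.1]},
    "m2": {"title": "Movie B", "plot": "two friends open a bakery in paris", "vec": [0.1, 0.9]},
    "m3": {"title": "Movie C", "plot": "a princess runs a bakery", "vec": [0.6, 0.4]},
}
# Stand-in for (:Actor)-[:ACTED_IN]->(:Movie) relationships.
ACTED_IN = {"m1": ["Actor One", "Actor Two"], "m2": ["Actor Three"], "m3": ["Actor Four"]}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_scores(query_vec):
    """Step 1a: semantic similarity between the query vector and each plot vector."""
    return {mid: cosine(query_vec, m["vec"]) for mid, m in MOVIES.items()}


def keyword_scores(query_text):
    """Step 1b: crude full-text score, the number of query terms found in the plot."""
    terms = set(query_text.lower().split())
    return {mid: len(terms & set(m["plot"].split())) for mid, m in MOVIES.items()}


def normalize(scores):
    """Scale scores so the best one is 1.0 (leave them unchanged if all are zero)."""
    top = max(scores.values()) or 1.0
    return {k: v / top for k, v in scores.items()}


def hybrid_search(query_text, query_vec, top_k=2):
    """Step 1: merge both rankings, keeping the higher normalized score per node."""
    v = normalize(vector_scores(query_vec))
    k = normalize(keyword_scores(query_text))
    merged = {mid: max(v[mid], k[mid]) for mid in MOVIES}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def traverse(movie_id):
    """Step 2: follow relationships from a matched node to collect related data."""
    m = MOVIES[movie_id]
    return {"movie_title": m["title"], "movie_plot": m["plot"], "actors": ACTED_IN.get(movie_id, [])}


def retrieve(query_text, query_vec, top_k=2):
    """Step 3: aggregate the enriched records that would be handed to the LLM."""
    return [traverse(mid) for mid, _ in hybrid_search(query_text, query_vec, top_k)]


context = retrieve("warrior princess desert", [1.0, 0.0])
for record in context:
    print(record)
```

In the real retriever, the two searches run against Neo4j indexes and the traversal step is the Cypher retrieval query you supply.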
When to Use HybridCypherRetriever
- Complex Queries: User queries require both semantic understanding and specific relationship-based information.
- Rich Data Relationships: Your graph contains interconnected data where related nodes hold valuable context.
- Enhanced Accuracy: You need precise answers that draw on both search methods and the graph structure.
Setting Up HybridCypherRetriever
Follow these steps to set up and use the HybridCypherRetriever.
Open the 2-neo4j-graphrag\hybrid_cypher_retriever.py file in your code editor.
1. Initialize the Embedder
Create the embedding function using OpenAI’s model:
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
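As a quick sanity check, you may want to confirm that the embedder produces vectors of the size your vector index expects. The sketch below assumes the index was built from text-embedding-ada-002 embeddings, which have 1536 dimensions. The helper name check_embedding_dimensions is hypothetical; it relies only on the embedder's embed_query method.

```python
def check_embedding_dimensions(embedder, expected_dimensions=1536):
    """Embed a short probe string and verify the length of the returned vector.

    `expected_dimensions` defaults to 1536, the output size of
    text-embedding-ada-002. Adjust it if your index uses another model.
    """
    vector = embedder.embed_query("dimension probe")
    if len(vector) != expected_dimensions:
        raise ValueError(
            f"Embedder returned {len(vector)} dimensions, expected {expected_dimensions}"
        )
    return len(vector)
```

Calling check_embedding_dimensions(embedder) with the OpenAI embedder sends one request to the OpenAI API, so it needs a valid API key.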
2. Initialize the HybridCypherRetriever
Set up the HybridCypherRetriever with your Neo4j database and embedding model:
from neo4j_graphrag.retrievers import HybridCypherRetriever
from neo4j_graphrag.llm import OpenAILLM
# The retrieval query runs for each `node` matched by the hybrid search
# and collects the names of the actors connected to that movie.
retrieval_query = """
MATCH (actor:Actor)-[:ACTED_IN]->(node:Movie)
RETURN node.title AS movie_title,
       node.plot AS movie_plot,
       collect(actor.name) AS actors;
"""
# `driver` is assumed to be an existing Neo4j driver instance.
retriever = HybridCypherRetriever(
    driver=driver,
    vector_index_name="moviePlots",
    fulltext_index_name="plotFulltext",
    retrieval_query=retrieval_query,
    embedder=embedder,
)
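The retriever above is constructed with a driver object that the snippet does not define. A minimal sketch of one way to create it follows. The environment variable names (NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD), the default values, and the helper names are assumptions for illustration; use whatever connection details your sandbox provides.

```python
import os


def load_neo4j_settings(env=os.environ):
    """Read connection settings from environment variables (names are assumed)."""
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }


def make_driver(settings=None):
    """Create a Neo4j driver from the settings above."""
    # Imported here so the settings helper works without the neo4j package installed.
    from neo4j import GraphDatabase

    settings = settings or load_neo4j_settings()
    return GraphDatabase.driver(
        settings["uri"],
        auth=(settings["username"], settings["password"]),
    )
```

You would call driver = make_driver() before constructing the retriever, and driver.close() once you are finished with it.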
3. Using the Retriever
Use the HybridCypherRetriever as part of a GraphRAG pipeline to perform hybrid searches within your Neo4j database:
from neo4j_graphrag.generation import GraphRAG
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)
query_text = "What are the names of the actors in the movie set in 1375 in Imperial China?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 5})
print(response.answer)
Expected Output
The names of the actors in the movie set in 1375 in Imperial China, "Musa the Warrior (Musa)," are Irrfan Khan, Ziyi Zhang, Sung-kee Ahn, and Jin-mo Ju.
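When an answer looks wrong, it can help to inspect what the retriever returns before the language model sees it. The sketch below calls the retriever's search method directly and prints the content of each returned item. The helper name preview_context is hypothetical.

```python
def preview_context(retriever, query_text, top_k=5):
    """Run the retriever on its own and print the records it would pass to the LLM."""
    result = retriever.search(query_text=query_text, top_k=top_k)
    contents = [item.content for item in result.items]
    for position, content in enumerate(contents, start=1):
        print(f"{position}. {content}")
    return contents
```

For example, preview_context(retriever, query_text) should list the movie titles, plots, and actors produced by the retrieval query for the top matches.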
Tips for Effective Use
- Consistent Embeddings: Use the same model for both query and node embeddings to ensure compatibility.
- Build Effective Fulltext Indexes: Create full-text indexes on relevant properties to enhance keyword search capabilities.
- Leverage Fulltext Indexes: The better your full-text indexes match the terms users search for, the more the HybridCypherRetriever gains from combining semantic and keyword-based results.
- Leverage Cypher Proficiency: The retrieval query receives each matched node in the node variable, so you can use your Cypher skills to write more precise and efficient traversals from it.
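The tip on building full-text indexes can be made concrete with a short sketch. It assumes the plotFulltext index used earlier covers the plot property of Movie nodes, which this lesson does not state explicitly. The helper name ensure_fulltext_index is hypothetical, and it relies on the execute_query method of the Neo4j Python driver.

```python
# Cypher statement that creates the full-text index only if it is missing.
# Assumption: the index covers the `plot` property of `Movie` nodes.
FULLTEXT_INDEX_QUERY = """
CREATE FULLTEXT INDEX plotFulltext IF NOT EXISTS
FOR (m:Movie) ON EACH [m.plot]
"""


def ensure_fulltext_index(driver):
    """Create the full-text index used by the retriever if it does not exist yet."""
    driver.execute_query(FULLTEXT_INDEX_QUERY)
```

Adding more properties to the ON EACH list, such as the title, would let keyword matches on those fields contribute to the hybrid search as well.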
Summary
You’ve learned how to use HybridCypherRetriever to combine hybrid (vector and full-text) search with Cypher-based graph traversal in Neo4j. This lets your RAG pipeline handle more complex queries and retrieve more comprehensive context.