Hybrid Retrieval with Graph Traversal

Understanding HybridCypherRetriever

The HybridCypherRetriever enhances the retrieval process by combining hybrid search (vector and full-text) with graph traversal techniques. This allows you to retrieve not only semantically similar nodes but also related information through graph relationships, enabling more comprehensive and accurate responses in your GraphRAG applications.

Run this Cypher query in the sandbox to see a movie together with its actors:

cypher
MATCH (m:Movie {title: "Musa the Warrior (Musa)"})
MATCH (actor:Actor)-[:ACTED_IN]->(m)
RETURN m, collect(actor) AS actors;

How It Works

HybridCypherRetriever works in three stages:
  • Hybrid Search:

    • Combines vector similarity and full-text search to find relevant nodes.

  • Graph Traversal:

    • Uses Cypher queries to fetch additional related nodes based on the initial retrieval.

  • Aggregation:

    • Merges results from both search methods and traversal to provide enriched data for the language model.
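
The three stages above can be sketched in plain Python. This is a toy illustration, not the library's implementation: the scores, the "Movie B" and "Movie C" entries, and the helper names `normalize`, `hybrid_search`, and `traverse` are all assumptions made for the example, and the merge rule (normalize each result set, keep the best score per node) is just one plausible scheme.

```python
# Toy search results keyed by movie title. All scores are made up.
vector_hits = {"Musa the Warrior (Musa)": 0.8, "Movie B": 0.4}
fulltext_hits = {"Musa the Warrior (Musa)": 4.0, "Movie C": 1.0}

# Toy graph: movie title -> names of actors linked by ACTED_IN.
acted_in = {
    "Musa the Warrior (Musa)": ["Irrfan Khan", "Ziyi Zhang", "Sung-kee Ahn", "Jin-mo Ju"],
    "Movie B": ["Actor X"],
    "Movie C": [],
}


def normalize(hits):
    """Scale scores into [0, 1] by dividing by the largest score."""
    top = max(hits.values(), default=0.0)
    return {title: (score / top if top else 0.0) for title, score in hits.items()}


def hybrid_search(vector_hits, fulltext_hits, top_k):
    """Stage 1: merge both result sets, keeping the best score per node."""
    merged = {}
    for hits in (normalize(vector_hits), normalize(fulltext_hits)):
        for title, score in hits.items():
            merged[title] = max(score, merged.get(title, 0.0))
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def traverse(ranked, acted_in):
    """Stages 2 and 3: follow relationships and aggregate one record per node."""
    return [
        {"movie_title": title, "actors": acted_in.get(title, []), "score": score}
        for title, score in ranked
    ]


results = traverse(hybrid_search(vector_hits, fulltext_hits, top_k=2), acted_in)
for record in results:
    print(record)
```

In the real retriever the first stage runs against the Neo4j vector and full-text indexes, and the second and third stages are expressed in the Cypher retrieval query rather than in Python.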

When to Use HybridCypherRetriever

  • Complex Queries:

    • User queries require both semantic understanding and specific relationship-based information.

  • Rich Data Relationships:

    • Your graph contains interconnected data where related nodes hold valuable context.

  • Enhanced Accuracy:

    • You need precise answers that draw on both search methods and the graph structure.

Setting Up HybridCypherRetriever

Follow these steps to set up and use the HybridCypherRetriever.

1. Initialize the Embedder

Create the embedder using OpenAI’s text-embedding-ada-002 model:

python
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
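
To make the embedder's role concrete: it turns the query text into a vector, and the vector half of the hybrid search ranks nodes by how close their stored embeddings are to that vector, for example by cosine similarity when the vector index is configured for it. The sketch below uses toy 3-dimensional vectors (text-embedding-ada-002 produces 1536-dimensional ones), and `cosine_similarity` is an illustrative helper, not part of neo4j_graphrag.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        # Vectors produced by different embedding models often differ in size.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a query embedding and two stored plot embeddings.
query_vec = [1.0, 0.0, 0.0]
similar_vec = [2.0, 0.0, 0.0]    # same direction as the query
unrelated_vec = [0.0, 3.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query_vec, similar_vec))    # 1.0
print(cosine_similarity(query_vec, unrelated_vec))  # 0.0
```

The length check shows why the query and the stored nodes need to be embedded with the same model: vectors from different models may not even have the same number of dimensions, and when they do, they still live in unrelated vector spaces.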

2. Initialize the HybridCypherRetriever

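Set up the HybridCypherRetriever with your Neo4j driver, the names of the vector and full-text indexes, the embedder, and a retrieval query. The retrieval query runs for each node found by the hybrid search, which is bound to the variable node: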

python
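from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import HybridCypherRetriever
from neo4j_graphrag.llm import OpenAILLM

# Placeholder connection details (assumptions): replace with your own
# sandbox URI, username, and password.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Runs after the hybrid search; `node` is the movie matched by the search.
retrieval_query = """
MATCH (actor:Actor)-[:ACTED_IN]->(node:Movie)
RETURN node.title AS movie_title,
       node.plot AS movie_plot,
       collect(actor.name) AS actors;
"""

retriever = HybridCypherRetriever(
    driver=driver,
    vector_index_name="moviePlots",       # vector index over plot embeddings
    fulltext_index_name="plotFulltext",   # full-text index for keyword search
    retrieval_query=retrieval_query,
    embedder=embedder,
)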

3. Using the Retriever

Use the HybridCypherRetriever as part of a GraphRAG pipeline to perform hybrid searches within your Neo4j database:

python
from neo4j_graphrag.generation import GraphRAG

# Temperature 0 keeps the generated answer as deterministic as possible.
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)
query_text = "What are the names of the actors in the movie set in 1375 in Imperial China?"
# top_k limits how many nodes the hybrid search passes to the retrieval query.
response = rag.search(query_text=query_text, retriever_config={"top_k": 5})
print(response.answer)

Expected Output

The names of the actors in the movie set in 1375 in Imperial China, "Musa the Warrior (Musa)," are Irrfan Khan, Ziyi Zhang, Sung-kee Ahn, and Jin-mo Ju.

Tips for Effective Use

  • Consistent Embeddings:

    • Use the same model for both query and node embeddings to ensure compatibility.

  • Build Effective Full-Text Indexes:

    • Create full-text indexes on relevant properties to enhance keyword search capabilities.

  • Leverage Full-Text Indexes:

    • The better your full-text indexes match the terms users search for, the more the HybridCypherRetriever gains from combining keyword-based and semantic search results.

  • Leverage Cypher Proficiency:

    • Each node found by the hybrid search is available in the retrieval query as the variable node, so the more precisely and efficiently you write the Cypher around it, the more useful the retrieved context becomes.
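
As an illustration of the last tip, here is a hypothetical variant of the retrieval query. It assumes, as described in the neo4j-graphrag documentation, that the hybrid search also binds the similarity score to a variable named `score` alongside `node`. The name `retrieval_query_with_score` is made up for this example, and the labels and properties are the ones used earlier in this lesson.

```python
# Hypothetical variant of the retrieval query.
# Assumption: the hybrid search binds `score` in addition to `node`.
retrieval_query_with_score = """
OPTIONAL MATCH (actor:Actor)-[:ACTED_IN]->(node)
RETURN node.title AS movie_title,
       node.plot AS movie_plot,
       collect(actor.name) AS actors,
       score
"""
```

OPTIONAL MATCH keeps movies that have no Actor nodes attached (collect then returns an empty list for them), and returning score lets you inspect how strongly each movie matched. To try it, pass the string as the retrieval_query argument when creating the retriever.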

Summary

You’ve learned how to use HybridCypherRetriever to combine hybrid search (vector and full-text) with graph traversal in Neo4j. Adding it to your RAG pipeline lets your applications handle more complex queries and retrieve related information along with the matching nodes.