Introduction
In this lesson, you will learn about the three main types of retrievers available in the Neo4j GraphRAG library.
What is a Retriever?
A retriever is a component that searches and returns relevant information from your knowledge graph to answer questions or provide context to language models.
The Three Types:
- Vector Retriever: Semantic search across text chunks
- Vector + Cypher Retriever: Semantic search + graph traversal
- Text2Cypher Retriever: Natural language to Cypher queries
Vector Retriever
How it works:
- Converts your question into a vector embedding using the embedder
- Searches the chunkEmbeddings vector index for similar content
- Returns semantically related text chunks based on cosine similarity
- Pure semantic search - no graph traversal
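To make the similarity step concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the three-dimensional vectors and chunk texts are made up for illustration, and in practice the embedder produces the query vector while the chunkEmbeddings index performs the search inside Neo4j.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunks, k=2):
    """Rank (text, vector) pairs by similarity to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
toy_chunks = [
    ("Supply chain disruption could delay product launches.", [0.9, 0.1, 0.0]),
    ("The company opened a new retail store.", [0.1, 0.9, 0.1]),
    ("Regulatory changes may increase compliance costs.", [0.7, 0.2, 0.3]),
]
toy_query_vector = [1.0, 0.0, 0.1]  # stands in for an embedded question about risks

ranked = top_k_chunks(toy_query_vector, toy_chunks, k=2)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

The two risk-related chunks rank above the retail-store chunk because their vectors point in nearly the same direction as the query vector, even though no keywords are compared.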
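from neo4j_graphrag.retrievers import VectorRetriever

# Assumes `driver` and `embedder` have already been created.
vector_retriever = VectorRetriever(
    driver=driver,
    index_name='chunkEmbeddings',  # vector index over the text chunks
    embedder=embedder,             # embeds the question at query time
    return_properties=['text']     # node properties to include in each result
)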
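Once constructed, the retriever is queried through its search method. The helper below is a sketch under a few assumptions: each returned item exposes content and metadata attributes, and the similarity score, when present, sits under a "score" key in the metadata. The name show_top_chunks is hypothetical, and the exact result shape may differ between library versions.

```python
def show_top_chunks(retriever, question, top_k=5):
    """Run a search and return (score, content) pairs for the retrieved items."""
    result = retriever.search(query_text=question, top_k=top_k)
    rows = []
    for item in result.items:
        # Assumption: metadata is a dict (or None) that may carry the score.
        score = (item.metadata or {}).get("score")
        rows.append((score, item.content))
    return rows


# Example call, assuming `vector_retriever` was built as shown above:
# rows = show_top_chunks(vector_retriever, "What are the risks that Apple faces?")
```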
Best for:
- Finding conceptually similar information across all documents
- Semantic search when exact keywords don’t match
- Broad exploratory questions about topics
- When you don’t know specific entity names
Example Query: "What are the risks that Apple faces?"
Limitations:
- Returns only text chunks, no entity relationships
- May miss entity-specific context
- Cannot aggregate information across multiple entities
Vector + Cypher Retriever
How it works:
- Step 1: Vector search finds semantically relevant text chunks
- Step 2: Custom Cypher query traverses from each chunk to related entities
- Step 3: Returns enriched context including entities, relationships, and metadata
- Combines semantic relevance with graph intelligence
from neo4j_graphrag.retrievers import VectorCypherRetriever

# `node` is each chunk matched by the vector search; the query traverses outward from it.
detailed_context_query = """
WITH node
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)-[:FACES_RISK]->(risk:RiskFactor)
RETURN company.name AS company, node.text AS context, collect(DISTINCT risk.name) AS risks
"""

vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    retrieval_query=detailed_context_query  # runs once the vector search has found chunks
)
Best for:
- Getting both content and rich contextual information
- Understanding relationships between entities mentioned in chunks
- Questions requiring entity-specific aggregations
- Comprehensive answers that need multiple connected data points
Example Query: "Which asset managers are most affected by cryptocurrency policies?"
Why "Apple" Queries Can Fail in Vector + Cypher
The Challenge:
When you ask "What are the risks that Apple faces?" using Vector + Cypher, you may not get Apple-specific results.
Why this happens:
- Vector search finds chunks semantically similar to your query
- If those chunks aren’t about Apple, the Cypher query won’t reach Apple entities
- The chunk is the anchor - you can only traverse from what you retrieve
Key Insight:
Vector + Cypher works best when your question naturally surfaces relevant chunks about the entities you’re interested in.
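The anchoring effect can be illustrated with a toy in-memory stand-in for the graph. Everything here is hypothetical: the chunk IDs, the second company, the risk names, and the traverse_from_chunks helper only mimic what the Cypher step does, under the assumption that each chunk links to exactly one company.

```python
# Hypothetical toy data: the company that filed each chunk's document.
chunk_to_company = {
    "chunk-1": "Apple",
    "chunk-2": "Microsoft",
    "chunk-3": "Microsoft",
}
# Hypothetical risk factors per company.
company_risks = {
    "Apple": ["Supply chain concentration"],
    "Microsoft": ["Cybersecurity incidents", "Cloud competition"],
}


def traverse_from_chunks(retrieved_chunk_ids):
    """Mimic the Cypher step: follow each retrieved chunk to its company and risks."""
    context = {}
    for chunk_id in retrieved_chunk_ids:
        company = chunk_to_company[chunk_id]
        context[company] = company_risks[company]
    return context


# Suppose the vector search ranked only chunk-2 and chunk-3 as most similar.
reachable = traverse_from_chunks(["chunk-2", "chunk-3"])
print("Apple" in reachable)
```

Because no Apple chunk was retrieved, the traversal has no starting point from which to reach Apple, no matter how the question was worded.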
Good Vector + Cypher Query Example
Query: "Which asset managers are most affected by banking regulations?"
Why this works well:
- Vector search finds chunks about "banking regulations"
- Cypher query traverses from those chunks to the companies that filed the documents, then to the asset managers that own those companies
- Returns both the regulatory context AND the asset manager entities
Cypher pattern:
WITH node
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)-[:OWNS]-(manager:AssetManager)
RETURN company.name AS company, manager.managerName AS AssetManager, node.text AS context
Text2Cypher Retriever
How it works:
- Uses an LLM to convert natural language questions into Cypher queries
- Leverages the graph schema to understand available entities and relationships
- Executes the generated Cypher query directly against Neo4j
- Returns structured, precise results from the graph
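The role of the schema is easiest to see in the prompt that the LLM receives. The sketch below is a hypothetical template, not the prompt the library actually uses: build_text2cypher_prompt and its wording are assumptions, and the one-line schema (including the STRING property types) is an assumed fragment based on the labels and properties that appear in this lesson's queries.

```python
def build_text2cypher_prompt(schema, question):
    """Assemble a prompt that asks an LLM to translate a question into Cypher."""
    return (
        "Task: Generate a Cypher statement to query a Neo4j graph.\n"
        "Use only the node labels, relationship types, and properties in the schema.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question:\n{question}\n\n"
        "Cypher:"
    )


# Assumed schema fragment, written in a simple pattern notation.
toy_schema = "(:AssetManager {managerName: STRING})-[:OWNS]->(:Company {name: STRING})"

prompt = build_text2cypher_prompt(
    toy_schema,
    "What are the company names of companies owned by Berkshire Hathaway Inc?",
)
print(prompt)
```

Without the schema in the prompt, the model would have to guess label and property names such as AssetManager or managerName, which is a common source of invalid queries.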
from neo4j_graphrag.retrievers import Text2CypherRetriever

text2cypher_retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    neo4j_schema=schema
)
Example Query: "What are the company names of companies owned by Berkshire Hathaway Inc?"
Generated Cypher:
MATCH (am:AssetManager {managerName: 'Berkshire Hathaway Inc'})-[:OWNS]->(c:Company)
RETURN c.name AS company_name
Best for:
- Precise, entity-centric questions
- When you need exact data (numbers, dates, counts, names)
- Aggregations and analytical questions
- Direct graph queries without semantic search
Limitations:
- Requires good graph schema understanding
- May struggle with ambiguous natural language
- Less effective for open-ended or exploratory questions
Choosing the Right Retriever
Use Vector Retriever when:
- You want semantic similarity search
- Question is conceptual or broad
- You need to find related topics
Use Vector + Cypher when:
- You want both content and relationships
- Need comprehensive context
- Question involves multiple entities
Use Text2Cypher when:
- You need precise, structured data
- Question asks for specific facts or numbers
- You want to leverage graph relationships directly
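The guidance above can be condensed into a small decision helper. This is only a sketch of the rules of thumb in this lesson: suggest_retriever and its two boolean flags are hypothetical, and real questions often need judgment that two flags cannot capture.

```python
def suggest_retriever(needs_exact_facts, needs_relationships):
    """Map two properties of a question to the retriever this lesson recommends."""
    if needs_exact_facts:
        # Specific facts, numbers, or counts: query the graph directly.
        return "Text2Cypher"
    if needs_relationships:
        # Content plus connected entities: vector search, then traversal.
        return "Vector + Cypher"
    # Broad or conceptual questions: semantic similarity alone.
    return "Vector"


print(suggest_retriever(needs_exact_facts=False, needs_relationships=True))
```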
Try it yourself
In the next lessons, we will work through the notebooks and see how to use the retrievers in practice.
Summary
In this lesson, you learned about the three main types of retrievers:
- Vector Retriever for semantic similarity search
- Vector + Cypher Retriever for hybrid content and relationship search
- Text2Cypher Retriever for structured graph queries
Each retriever has specific strengths and use cases, and understanding when to use each one is key to building effective RAG applications.
In the next lesson, you will work with these retrievers hands-on in a Jupyter notebook.