Introduction
You are going to explore how to retrieve information from the knowledge graph using retrievers.
This lesson covers the three main retriever types you’ll work with in the hands-on notebook.
What is a Retriever?
A retriever is a component that searches and returns relevant information from your knowledge graph to answer questions or provide context to language models.
The Three Types:
-
Vector Retriever: Semantic search across text chunks
-
Vector + Cypher Retriever: Semantic search + graph traversal
-
Text2Cypher Retriever: Natural language to Cypher queries
Vector Retriever
How it works:
-
Converts your question into a vector embedding using OpenAI
-
Searches the
chunkEmbeddings
vector index for similar content -
Returns semantically related text chunks based on cosine similarity
-
Pure semantic search - no graph traversal
Example from the notebook:
vector_retriever = VectorRetriever(
driver=driver,
index_name='chunkEmbeddings',
embedder=embedder,
return_properties=['text']
)
Vector Retriever
Best for:
-
Finding conceptually similar information across all documents
-
Semantic search when exact keywords don’t match
-
Broad exploratory questions about topics
-
When you don’t know specific entity names
Example Query: "What are the risks that Apple faces?"
Limitations:
-
Returns only text chunks, no entity relationships
-
May miss entity-specific context
-
Cannot aggregate information across multiple entities
Vector + Cypher Retriever
How it works:
-
Step 1: Vector search finds semantically relevant text chunks
-
Step 2: Custom Cypher query traverses from each chunk to related entities
-
Step 3: Returns enriched context including entities, relationships, and metadata
-
Combines semantic relevance with graph intelligence
Example from the notebook:
vector_cypher_retriever = VectorCypherRetriever(
driver=driver,
index_name='chunkEmbeddings',
embedder=embedder,
retrieval_query=custom_cypher_query
)
Vector + Cypher Retriever
Best for:
-
Getting both content and rich contextual information
-
Understanding relationships between entities mentioned in chunks
-
Questions requiring entity-specific aggregations
-
Comprehensive answers that need multiple connected data points
Example Query: "Which asset managers are most affected by cryptocurrency policies?"
Why "Apple" Queries Can Fail in Vector-Cypher
The Challenge: When you ask "What are the risks that Apple faces?" using Vector + Cypher, you may not get Apple-specific results.
Why this happens:
-
Vector search finds chunks semantically similar to your query
-
If those chunks aren’t about Apple, the Cypher query won’t reach Apple entities
-
The chunk is the anchor - you can only traverse from what you retrieve
Key Insight: Vector-Cypher works best when your question naturally surfaces relevant chunks about the entities you’re interested in.
Good Vector-Cypher Query Example
Query: "Which asset managers are most affected by banking regulations?"
Why this works well:
-
Vector search finds chunks about "banking regulations"
-
Cypher query traverses to asset managers connected to those companies
-
Returns both the regulatory context AND the asset manager entities
Cypher pattern:
WITH node
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)-[:OWNS]-(manager:AssetManager)
RETURN company.name AS company, manager.managerName AS AssetManager, node.text AS context
Text2Cypher Retriever
How it works:
-
Uses an LLM to convert natural language questions into Cypher queries
-
Leverages the graph schema to understand available entities and relationships
-
Executes the generated Cypher query directly against Neo4j
-
Returns structured, precise results from the graph
Example from the notebook:
text2cypher_retriever = Text2CypherRetriever(
driver=driver,
llm=llm,
neo4j_schema=schema
)
Text2Cypher Retriever
Example Query: "What are the company names of companies owned by BlackRock Inc?"
Generated Cypher:
MATCH (am:AssetManager {managerName: 'BlackRock Inc.'})-[:OWNS]->(c:Company)
RETURN c.name AS company_name
Text2Cypher Retriever
Best for:
-
Precise, entity-centric questions
-
When you need exact data (numbers, dates, counts, names)
-
Aggregations and analytical questions
-
Direct graph queries without semantic search
Limitations:
-
Requires good graph schema understanding
-
May struggle with ambiguous natural language
-
Less effective for open-ended or exploratory questions
Choosing the Right Retriever
Use Vector Retriever when:
-
You want semantic similarity search
-
Question is conceptual or broad
-
You need to find related topics
Use Vector + Cypher when:
-
You want both content and relationships
-
Need comprehensive context
-
Question involves multiple entities
Use Text2Cypher when:
-
You need precise, structured data
-
Question asks for specific facts or numbers
-
You want to leverage graph relationships directly
Summary
In this lesson, you learned about the three main types of retrievers:
-
Vector Retriever for semantic similarity search
-
Vector + Cypher Retriever for hybrid content and relationship search
-
Text2Cypher Retriever for structured graph queries
Each retriever has specific strengths and use cases, and understanding when to use each one is key to building effective RAG applications.
In the next lesson, you will work with these retrievers hands-on in a Jupyter notebook.