What is a Retriever

Introduction

You are going to explore how to retrieve information from the knowledge graph using retrievers included in the Neo4j GraphRAG Python package.

This lesson covers the three main retriever types you’ll work with in the hands-on notebook.

What is a Retriever?

A retriever is a component that searches and returns relevant information from your knowledge graph to answer questions or provide context to language models.

The Three Types:

Vector Retriever: Semantic search across text chunks
Vector + Cypher Retriever: Semantic search + graph traversal
Text2Cypher Retriever: Natural language to Cypher queries

Vector Retriever

How it works:

Converts your question into a vector embedding using an embedding model (eg. OpenAI)
Searches the chunkEmbeddings vector index for similar content
Returns semantically related text chunks based on cosine similarity
Pure semantic search - no graph traversal

Example from the notebook:

python

vector_retriever = VectorRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    return_properties=['text']
)

Vector Retriever

Best for:

Finding conceptually similar information across all documents
Semantic search when exact keywords don’t match
Broad exploratory questions about topics
When you don’t know specific entity names

Example Query: "What are the risks that Apple faces?"

Limitations:

Returns only text chunks, no entity relationships
May miss entity-specific context
Cannot aggregate information across multiple entities

Vector + Cypher Retriever

How it works:

Step 1: Vector search finds semantically relevant text chunks
Step 2: Custom Cypher query traverses from each chunk to related entities
Step 3: Returns enriched context including entities, relationships, and metadata
Combines semantic relevance with graph intelligence

Example from the notebook:

python

vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    retrieval_query=custom_cypher_query
)

Vector + Cypher Retriever

Best for:

Getting both content and rich contextual information
Understanding relationships between entities mentioned in chunks
Questions requiring entity-specific aggregations
Comprehensive answers that need multiple connected data points

Example Query: "Which asset managers are most affected by cryptocurrency policies?"

Why "Apple" Queries Can Fail in Vector + Cypher

The Challenge: When you ask "What are the risks that Apple faces?" using Vector + Cypher, you may not get Apple-specific results.

Why this happens:

Vector search finds chunks semantically similar to your query
If those chunks aren’t about Apple, the Cypher query won’t reach Apple entities
The chunk is the anchor - you can only traverse from what you retrieve

Key Insight: Vector + Cypher works best when your question naturally surfaces relevant chunks about the entities you’re interested in.

Good Vector + Cypher Query Example

Query: "Which asset managers are most affected by banking regulations?"

Why this works well:

Vector search finds chunks about "banking regulations"
Cypher query traverses to asset managers connected to those companies
Returns both the regulatory context AND the asset manager entities

Cypher pattern:

cypher

WITH node
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)-[:OWNS]-(manager:AssetManager)
RETURN company.name AS company, manager.managerName AS AssetManager, node.text AS context

Text2Cypher Retriever

How it works:

Uses an LLM to convert natural language questions into Cypher queries
Leverages the graph schema to understand available entities and relationships
Executes the generated Cypher query directly against Neo4j
Returns structured, precise results from the graph

Example from the notebook:

python

text2cypher_retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    neo4j_schema=schema
)

Text2Cypher Retriever

Example Query: "What are the company names of companies owned by BlackRock Inc?"

Generated Cypher:

cypher

MATCH (am:AssetManager {managerName: 'BlackRock Inc.'})-[:OWNS]->(c:Company)
RETURN c.name AS company_name

Text2Cypher Retriever

Best for:

Precise, entity-centric questions
When you need exact data (numbers, dates, counts, names)
Aggregations and analytical questions
Direct graph queries without semantic search

Limitations:

Requires good graph schema understanding
May struggle with ambiguous natural language
Less effective for open-ended or exploratory questions

Choosing the Right Retriever

Use Vector Retriever when:

You want semantic similarity search
Question is conceptual or broad
You need to find related topics

Use Vector + Cypher when:

You want both content and relationships
Need comprehensive context
Question involves multiple entities

Use Text2Cypher when:

You need precise, structured data
Question asks for specific facts or numbers
You want to leverage graph relationships directly

Summary

In this lesson, you learned about the three main types of retrievers:

Vector Retriever for semantic similarity search
Vector + Cypher Retriever for hybrid content and relationship search
Text2Cypher Retriever for structured graph queries

Each retriever has specific strengths and use cases, and understanding when to use each one is key to building effective RAG applications.

In the next lesson, you will work with these retrievers hands-on in a Jupyter notebook.

Neo4j & Generative AI Hands-On Workshop

Generative AI

Retrievers

Agents

What is a Retriever

Introduction

What is a Retriever?

Vector Retriever

Vector Retriever

Vector + Cypher Retriever

Vector + Cypher Retriever

Why "Apple" Queries Can Fail in Vector + Cypher

Good Vector + Cypher Query Example

Text2Cypher Retriever

Text2Cypher Retriever

Text2Cypher Retriever

Choosing the Right Retriever

Summary

Chatbot