What is a Retriever

Introduction

You are going to explore how to retrieve information from the knowledge graph using retrievers.

This lesson covers the three main retriever types you’ll work with in the hands-on notebook.

What is a Retriever?

A retriever is a component that searches and returns relevant information from your knowledge graph to answer questions or provide context to language models.

The Three Types:

  • Vector Retriever: Semantic search across text chunks

  • Vector + Cypher Retriever: Semantic search + graph traversal

  • Text2Cypher Retriever: Natural language to Cypher queries

Vector Retriever

How it works:

  • Converts your question into a vector embedding using OpenAI

  • Searches the chunkEmbeddings vector index for similar content

  • Returns semantically related text chunks based on cosine similarity

  • Pure semantic search - no graph traversal

Example from the notebook:

python
vector_retriever = VectorRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    return_properties=['text']
)

Vector Retriever

Best for:

  • Finding conceptually similar information across all documents

  • Semantic search when exact keywords don’t match

  • Broad exploratory questions about topics

  • When you don’t know specific entity names

Example Query: "What are the risks that Apple faces?"

Limitations:

  • Returns only text chunks, no entity relationships

  • May miss entity-specific context

  • Cannot aggregate information across multiple entities

Vector + Cypher Retriever

How it works:

  • Step 1: Vector search finds semantically relevant text chunks

  • Step 2: Custom Cypher query traverses from each chunk to related entities

  • Step 3: Returns enriched context including entities, relationships, and metadata

  • Combines semantic relevance with graph intelligence

Example from the notebook:

python
vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    retrieval_query=custom_cypher_query
)

Vector + Cypher Retriever

Best for:

  • Getting both content and rich contextual information

  • Understanding relationships between entities mentioned in chunks

  • Questions requiring entity-specific aggregations

  • Comprehensive answers that need multiple connected data points

Example Query: "Which asset managers are most affected by cryptocurrency policies?"

Why "Apple" Queries Can Fail in Vector-Cypher

The Challenge: When you ask "What are the risks that Apple faces?" using Vector + Cypher, you may not get Apple-specific results.

Why this happens:

  • Vector search finds chunks semantically similar to your query

  • If those chunks aren’t about Apple, the Cypher query won’t reach Apple entities

  • The chunk is the anchor - you can only traverse from what you retrieve

Key Insight: Vector-Cypher works best when your question naturally surfaces relevant chunks about the entities you’re interested in.

Good Vector-Cypher Query Example

Query: "Which asset managers are most affected by banking regulations?"

Why this works well:

  • Vector search finds chunks about "banking regulations"

  • Cypher query traverses to asset managers connected to those companies

  • Returns both the regulatory context AND the asset manager entities

Cypher pattern:

cypher
WITH node
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)-[:OWNS]-(manager:AssetManager)
RETURN company.name AS company, manager.managerName AS AssetManager, node.text AS context

Text2Cypher Retriever

How it works:

  • Uses an LLM to convert natural language questions into Cypher queries

  • Leverages the graph schema to understand available entities and relationships

  • Executes the generated Cypher query directly against Neo4j

  • Returns structured, precise results from the graph

Example from the notebook:

python
text2cypher_retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    neo4j_schema=schema
)

Text2Cypher Retriever

Example Query: "What are the company names of companies owned by BlackRock Inc?"

Generated Cypher:

cypher
MATCH (am:AssetManager {managerName: 'BlackRock Inc.'})-[:OWNS]->(c:Company)
RETURN c.name AS company_name

Text2Cypher Retriever

Best for:

  • Precise, entity-centric questions

  • When you need exact data (numbers, dates, counts, names)

  • Aggregations and analytical questions

  • Direct graph queries without semantic search

Limitations:

  • Requires good graph schema understanding

  • May struggle with ambiguous natural language

  • Less effective for open-ended or exploratory questions

Choosing the Right Retriever

Use Vector Retriever when:

  • You want semantic similarity search

  • Question is conceptual or broad

  • You need to find related topics

Use Vector + Cypher when:

  • You want both content and relationships

  • Need comprehensive context

  • Question involves multiple entities

Use Text2Cypher when:

  • You need precise, structured data

  • Question asks for specific facts or numbers

  • You want to leverage graph relationships directly

Summary

In this lesson, you learned about the three main types of retrievers:

  • Vector Retriever for semantic similarity search

  • Vector + Cypher Retriever for hybrid content and relationship search

  • Text2Cypher Retriever for structured graph queries

Each retriever has specific strengths and use cases, and understanding when to use each one is key to building effective RAG applications.

In the next lesson, you will work with these retrievers hands-on in a Jupyter notebook.

Chatbot

How can I help you today?