Text to Cypher Retriever

Vector and full text retrievers are great for finding relevant data based on semantic similarity or keyword matching.

To answer more specific questions, you may need to perform more complex queries to find data relating to specific nodes, relationships, or properties.

For example, you want to find:

  • The age of an actor.

  • Who acted in a movie.

  • Movie recommendations based on rating.

Text to Cypher retrievers allow you to convert natural language queries into Cypher queries that can be executed against the graph.

[User Query]
"What year was the movie Babe released?"
[Generated Cypher Query]
MATCH (m:Movie)
WHERE m.title = 'Babe'
RETURN m.released
[Cypher Result]
1995
[LLM Response]
"The movie Babe was released in 1995."

In this lesson, you will create a text to Cypher retriever that will generate context based on the user’s query and the graph schema.

Retriever

Open the genai_fundamentals/text2cypher_rag.py file and review the code:

python
text2cypher_rag.py
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create LLM 
t2c_llm =


# Build the retriever
retriever = 

llm = OpenAILLM(model_name="gpt-4o")
rag = GraphRAG(retriever=retriever, llm=llm)

query_text = "Which movies did Hugo Weaving star in?"

response = rag.search(
    query_text=query_text,
    return_context=True
    )

print(response.answer)
print("CYPHER :", response.retriever_result.metadata["cypher"])
print("CONTEXT:", response.retriever_result.items)

driver.close()

The programs includes all the code to connect to Neo4j, create the embedder, llm, and GraphRAG pipeline.

You will need to create the TextToCypherRetriever retriever that will generate Cypher queries and return results.

The TextToCypherRetriever requires an LLM to generate the Cypher queries:

python
text2cypher_rag.py
# Create Cypher LLM 
t2c_llm = OpenAILLM(
    model_name="gpt-4o", 
    model_params={"temperature": 0}
)

Temperature

The temperature is set to 0. When generating Cypher queries, you want the output to be deterministic and precise.

Create the TextToCypherRetriever retriever:

python
from neo4j_graphrag.retrievers import Text2CypherRetriever

# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=t2c_llm,
)

The retriever will automatically read the graph schema from the database when it is used.

Click to view the complete code
python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.retrievers import Text2CypherRetriever

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create Cypher LLM 
t2c_llm = OpenAILLM(
    model_name="gpt-4o", 
    model_params={"temperature": 0}
)

# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=t2c_llm,
)

llm = OpenAILLM(model_name="gpt-4o")
rag = GraphRAG(retriever=retriever, llm=llm)

query_text = "Which movies did Hugo Weaving acted in?"
query_text = "What are examples of Action movies?"

response = rag.search(
    query_text=query_text,
    return_context=True
    )

print(response.answer)
print("CYPHER :", response.retriever_result.metadata["cypher"])
print("CONTEXT:", response.retriever_result.items)

driver.close()

Run the program, the retriever results and the generated Cypher will be printed when you submit a query.

Experiment with some queries and review the results, for example:

  • Who directed the movie Superman?

  • How many movies are in the Sci-Fi genre?

  • What are examples of Action movies?

There is no guarantee that the generated Cypher query will return results, or be syntactically correct.

In production use cases, you should ensure exception are handled and the generated Cypher queries are validated before execution.

Example Queries

To improve the accuracy of the generated Cypher queries, you can provide examples of queries and an appropriate Cypher query.

If you wanted to find data related to movie ratings, you will could provide an example query that demonstrates how to retrieve ratings from the graph:

USER INPUT: 'Get user ratings for a movie?'
QUERY: MATCH ()-[r:RATED]->(m:Movie)
       WHERE m.title = 'Movie Title'
       RETURN r.rating

Create a list of the example queries:

python
# Cypher examples as input/query pairs
examples = [
    "USER INPUT: 'Get user ratings for a movie?' QUERY: MATCH (u:User)-[r:RATED]->(m:Movie) WHERE m.title = 'Movie Title' RETURN r.rating"
]

Update the TextToCypherRetriever to include the examples:

python
# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=t2c_llm,
    examples=examples,
)
Click to view the complete code
python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.retrievers import Text2CypherRetriever

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create Cypher LLM 
t2c_llm = OpenAILLM(
    model_name="gpt-4o", 
    model_params={"temperature": 0}
)

# Cypher examples as input/query pairs
examples = [
    "USER INPUT: 'Get user ratings for a movie?' QUERY: MATCH (u:User)-[r:RATED]->(m:Movie) WHERE m.title = 'Movie Title' RETURN r.rating"
]

# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=t2c_llm,
    examples=examples,
)

llm = OpenAILLM(model_name="gpt-4o")
rag = GraphRAG(retriever=retriever, llm=llm)

query_text = "What is the highest rating for Goodfellas?"
query_text = "What is the averaging user rating for the movie Toy Story?"
query_text = "What user gives the lowest ratings?"

response = rag.search(
    query_text=query_text,
    return_context=True
    )

print(response.answer)
print("CYPHER :", response.retriever_result.metadata["cypher"])
print("CONTEXT:", response.retriever_result.items)

driver.close()

Run the program and experiment with queries that relate to movie ratings, such as:

  • What is the highest rating for Goodfellas?

  • What is the average user rating for the movie Toy Story?

  • What user gives the lowest ratings?

You can specific multiple examples to help the LLM understand the context and generate more accurate Cypher queries.

Schema

The TextToCypherRetriever will automatically read the whole graph schema from the database.

You can provide a custom schema to the retriever if you want to limit the nodes, relationships and properties that are used to generate Cypher queries. Limiting the scope of the schema can help improve the accuracy of the generated Cypher queries, particularly if the graph contains a lot of nodes and relationships.

Create a textual representation of the graph schema:

python
# Specify your own Neo4j schema
neo4j_schema = """
Node properties:
Person {name: STRING, born: INTEGER}
Movie {tagline: STRING, title: STRING, released: INTEGER}
Genre {name: STRING}
User {name: STRING}

Relationship properties:
ACTED_IN {role: STRING}
RATED {rating: INTEGER}

The relationships:
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:User)-[:RATED]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
"""

Add the schema to the TextToCypherRetriever:

python
# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=t2c_llm,
    neo4j_schema=neo4j_schema,
    examples=examples,
)
Click to view the complete code
python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.retrievers import Text2CypherRetriever

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create Cypher LLM 
t2c_llm = OpenAILLM(
    model_name="gpt-4o", 
    model_params={"temperature": 0}
)

# Specify your own Neo4j schema
neo4j_schema = """
Node properties:
Person {name: STRING, born: INTEGER}
Movie {tagline: STRING, title: STRING, released: INTEGER}
Genre {name: STRING}
User {name: STRING}

Relationship properties:
ACTED_IN {role: STRING}
RATED {rating: INTEGER}

The relationships:
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:User)-[:RATED]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
"""

# Cypher examples as input/query pairs
examples = [
    "USER INPUT: 'Get user ratings for a movie?' QUERY: MATCH (u:User)-[r:RATED]->(m:Movie) WHERE m.title = 'Movie Title' RETURN r.rating"
]

# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=t2c_llm,
    neo4j_schema=neo4j_schema,
    examples=examples,
)

llm = OpenAILLM(model_name="gpt-4o")
rag = GraphRAG(retriever=retriever, llm=llm)

query_text = "Which movies did Hugo Weaving star in?"
query_text = "How many movies are in the Sci-Fi genre?"
query_text = "What is the highest rating for Goodfellas?"
query_text = "What is the averaging user rating for the movie Toy Story?"
query_text = "What year was the movie Babe released?"

response = rag.search(
    query_text=query_text,
    return_context=True
    )

print(response.answer)
print("CYPHER :", response.retriever_result.metadata["cypher"])
print("CONTEXT:", response.retriever_result.items)

driver.close()

Experiment

Experiment by submitting complex queries, observing the generated Cypher queries, and adapting the examples and schema to improve the accuracy of the generated queries.

Continue

When you are ready, continue to the next lesson.

Lesson Summary

In this lesson, you learned how to create a text to Cypher retriever that generates Cypher queries based on natural language input.

In the next lesson, you will explore some of the GenAI frameworks available and their integration with Neo4j.