Cypher Generation

LLMs are good at writing Cypher queries when given good information, such as:

  • The schema of the graph

  • Context about the question

  • Examples of questions and appropriate Cypher queries

You will learn how to use a language model to generate Cypher queries to query a Neo4j graph database.

Generating Cypher

LangChain includes the GraphCypherQAChainchain that can interact with a Neo4j graph database. It uses a language model to generate Cypher queries and then uses the graph to answer the question.

Open the 2-llm-rag-python-langchain/cypher_chain.py file and review the code.

python
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(openai_api_key=os.getenv('OPENAI_API_KEY'))

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD'),
)

CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Schema: {schema}
Question: {question}
"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True
)

cypher_chain.invoke({"query": "What is the plot of the movie Toy Story?"})

The program will generate a Cypher query based on the question and the graph database schema.

When you run the program, you should see the Cypher generated from the question and the data it returned. Something similar to:

Generated Cypher:
MATCH (m:Movie {title: "Toy Story"})
RETURN m.plot
Full Context:
[{'m.plot': "A cowboy doll is profoundly threatened and jealous when a new spaceman
figure supplants him as top toy in a boy's room."}]

The LLM used the database schema to generate an appropriate Cypher query. Langchain then executed the query against the graph database, and the result returned.

Inconsistent Results

Investigate what happens when you ask the same question multiple times. Observe the generated Cypher query and the response.

"What role did Tom Hanks play in Toy Story?"

You will likely see different results each time you run the program.

MATCH (actor:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie {title: 'Toy Story'})
RETURN actor.name, movie.title, movie.year, movie.runtime, movie.plot
MATCH (a:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie {title: 'Toy Story'})-[:ACTED_IN]->(p:Person)
RETURN p.name AS role

The LLM doesn’t return consistent results - its objective is to produce an answer, not the same response, and they may not be correct. The response may not be correct or even generate an error due to invalid Cypher.

You can improve the responses by providing additional context and instructions to the LLM

Provide specific instructions

The LLM’s training data included many Cypher statements, but these statements were not specific to the structure of your graph database.

You can provide specific instructions to the LLM to state that the generated Cypher statements should follow the schema.

python
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.

Schema: {schema}
Question: {question}
"""

The LLM may also need additional instructions about the data. For movie titles that begin with "The", move "the" to the end.

python
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
For movie titles that begin with "The", move "the" to the end, For example "The 39 Steps" becomes "39 Steps, The" or "The Matrix" becomes "Matrix, The".

Schema: {schema}
Question: {question}
"""

You can also instruct the LLM on how to respond. For example, only when it requires a Cypher statement and when it returns data.

python
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
For movie titles that begin with "The", move "the" to the end, For example "The 39 Steps" becomes "39 Steps, The" or "The Matrix" becomes "Matrix, The".

If no data is returned, do not attempt to answer the question.
Only respond to questions that require you to construct a Cypher statement.
Do not include any explanations or apologies in your responses.

Schema: {schema}
Question: {question}
"""

Examples

Even with specific instructions, the LLM can still make mistakes. You can provide examples of questions and appropriate Cypher. This technique is known as Few-Shot Prompting.

For example, you could provide an example of how to find movies and genres:

python
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
For movie titles that begin with "The", move "the" to the end, For example "The 39 Steps" becomes "39 Steps, The" or "The Matrix" becomes "Matrix, The".

If no data is returned, do not attempt to answer the question.
Only respond to questions that require you to construct a Cypher statement.
Do not include any explanations or apologies in your responses.

Examples:

Find movies and genres:
MATCH (m:Movie)-[:IN_GENRE]->(g)
RETURN m.title, g.name

Schema: {schema}
Question: {question}
"""

Experiment with different instructions and examples to see how you can improve the response of the Cypher generation.

Continue

When you are ready, you can move on to the next task.

Lesson Summary

You learned how to use a language model to generate Cypher queries.

Next, you can apply your knowledge to create an agent.