Language models and vector indexes are good at querying unstructured data. However, as you have seen, their responses are not always correct, and when data is structured, it is often easier to query it directly.
LLMs are good at writing Cypher queries when given good information, such as:
- The schema of the graph
- Context about the question to be answered
- Examples of questions and appropriate Cypher queries
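As a rough illustration of how these three ingredients can be combined, the sketch below assembles a prompt from a schema string, some context, and example question/query pairs. The function name build_cypher_prompt and the sample schema, context, and example are hypothetical, chosen only for illustration; they are not part of LangChain.

```python
def build_cypher_prompt(schema, context, examples, question):
    """Assemble a plain-text prompt from a schema, context, and few-shot examples."""
    example_lines = []
    for example_question, example_cypher in examples:
        example_lines.append(f"Question: {example_question}")
        example_lines.append(f"Cypher: {example_cypher}")
    return "\n".join(
        [
            context,
            f"Schema: {schema}",
            "Examples:",
            *example_lines,
            f"Question: {question}",
        ]
    )


# Hypothetical inputs, for illustration only.
prompt = build_cypher_prompt(
    schema="(:Person)-[:ACTED_IN]->(:Movie {title, plot})",
    context="You translate questions about movies into Cypher.",
    examples=[
        (
            "What is the plot of Toy Story?",
            'MATCH (m:Movie {title: "Toy Story"}) RETURN m.plot',
        ),
    ],
    question="What movies did Meg Ryan act in?",
)
print(prompt)
```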
In this lesson, you will learn how to use a language model to generate Cypher queries to query a Neo4j graph database.
Generating Cypher
LangChain includes the GraphCypherQAChain chain, which can interact with a Neo4j graph database. It uses a language model to generate Cypher queries and then uses the graph to answer the question.
The GraphCypherQAChain chain requires the following:
- An LLM (llm) for generating Cypher queries
- A graph database connection (graph) for answering the queries
- A prompt template (cypher_prompt) to give the LLM the schema and question
- An appropriate question which relates to the schema and data in the graph
The program below will generate a Cypher query based on the schema in the graph database and the question.
Review the code and predict what will happen when you run it.
import os

from langchain_openai import ChatOpenAI
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain.prompts import PromptTemplate

# LLM used to generate the Cypher query
llm = ChatOpenAI(
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Connection to the Neo4j graph database
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

# Prompt that gives the LLM the schema and the user's question
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.
Schema: {schema}
Question: {question}
"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

# Chain that generates Cypher, runs it against the graph, and answers the question
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)

result = cypher_chain.invoke({"query": "What is the plot of the movie Toy Story?"})
print(result)
Before running the program, set the openai_api_key and the graph connection details.
Your Sandbox connection details:
- Connection URL: bolt://{sandbox-ip}:{sandbox-boltPort}
- Username: {sandbox-username}
- Password: {sandbox-password}
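Because the program reads its credentials with os.getenv, a missing variable silently becomes None. A minimal sketch of a pre-flight check follows; the variable names come from the program above, while the helper missing_settings is a hypothetical name introduced here.

```python
import os

# Environment variables the program above reads with os.getenv.
REQUIRED_VARS = ["OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]


def missing_settings(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_settings()
if missing:
    print("Set these environment variables before running:", ", ".join(missing))
```

Running a check like this before creating the llm and graph objects can make configuration problems easier to spot than a connection error later on.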
When you run the program, you should see the Cypher generated from the question and the data it returned. Something similar to:
Generated Cypher:
MATCH (m:Movie {title: "Toy Story"}) RETURN m.plot
Full Context:
[{'m.plot': "A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."}]
The LLM used the database schema to generate an appropriate Cypher query. LangChain then executed the query against the graph database and returned the result.
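Conceptually, the flow described here has two steps: ask the LLM for a Cypher query, then run that query against the graph. The simplified sketch below mirrors that flow with stand-in functions. The names answer_with_graph, fake_generate_cypher, and fake_run_query are hypothetical, and this is not how LangChain implements the chain internally.

```python
def answer_with_graph(question, schema, generate_cypher, run_query):
    """Generate a Cypher query for the question, run it, and return both."""
    cypher = generate_cypher(schema=schema, question=question)
    context = run_query(cypher)
    return {"cypher": cypher, "context": context}


# Stand-ins for the LLM and the database, for illustration only.
def fake_generate_cypher(schema, question):
    return 'MATCH (m:Movie {title: "Toy Story"}) RETURN m.plot'


def fake_run_query(cypher):
    return [{"m.plot": "(plot text returned by the database)"}]


result = answer_with_graph(
    question="What is the plot of the movie Toy Story?",
    schema="(:Movie {title, plot})",  # hypothetical schema string
    generate_cypher=fake_generate_cypher,
    run_query=fake_run_query,
)
print(result)
```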
Breaking Down the Program
Reviewing the program, you should identify the following key points:
- The program instantiates the required llm and graph objects using the appropriate API and connection details.

    llm = ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY")
    )

    graph = Neo4jGraph(
        url=os.getenv("NEO4J_URI"),
        username=os.getenv("NEO4J_USERNAME"),
        password=os.getenv("NEO4J_PASSWORD")
    )
- The CYPHER_GENERATION_TEMPLATE gives the LLM context. The schema and question are passed to the LLM as input variables.

    CYPHER_GENERATION_TEMPLATE = """
    You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
    Convert the user's question based on the schema.
    Schema: {schema}
    Question: {question}
    """

    cypher_generation_prompt = PromptTemplate(
        template=CYPHER_GENERATION_TEMPLATE,
        input_variables=["schema", "question"],
    )

  The schema will be automatically generated from the graph database and passed to the LLM. The question will be the user’s question.
The program instantiates the
GraphCypherQAChain
chain with thellm
,graph
, and prompt template (cypher_prompt
).pythoncypher_chain = GraphCypherQAChain.from_llm( llm, graph=graph, cypher_prompt=cypher_generation_prompt, verbose=True, allow_dangerous_requests=True )
The program sets the
verbose
flag toTrue
so you can see the generated Cypher query and response.Allow Dangerous Requests
You are trusting the generation of Cypher to the LLM. It may generate invalid Cypher queries that could corrupt data in the graph or provide access to sensitive information.
You have to opt in to this risk by setting the allow_dangerous_requests flag to True.
In a production environment, you should ensure that access to data is limited and that sufficient security is in place to prevent malicious queries. This could include the use of a read-only user or role-based access control.
- The chain runs, passing an appropriate question.

    result = cypher_chain.invoke({"query": "What is the plot of the movie Toy Story?"})
    print(result)
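As one extra safeguard alongside the database-level protections mentioned in the Allow Dangerous Requests note, an application could inspect generated Cypher before executing it. The sketch below is a deliberately naive keyword check. The helper looks_read_only is a hypothetical name, the clause list is illustrative rather than exhaustive, and the check can reject harmless queries whose string literals happen to contain one of the keywords. It does not replace a read-only user or role-based access control.

```python
import re

# Cypher keywords that modify data; illustrative, not an exhaustive list.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def looks_read_only(cypher):
    """Return True if no write-clause keyword appears anywhere in the query text."""
    return WRITE_CLAUSES.search(cypher) is None


print(looks_read_only('MATCH (m:Movie {title: "Toy Story"}) RETURN m.plot'))
print(looks_read_only("MATCH (n) DETACH DELETE n"))
```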
Experiment with different questions and observe the results.
For example, try:
- A different context: "What movies did Meg Ryan act in?"
- An aggregate query: "How many movies has Tom Hanks directed?"
Inconsistent Results
Investigate what happens when you ask the same question multiple times. Observe the generated Cypher query and the response.
"What role did Tom Hanks play in Toy Story?"
You will likely see different results each time you run the program.
MATCH (actor:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie {title: 'Toy Story'}) RETURN actor.name, movie.title, movie.year, movie.runtime, movie.plot
MATCH (a:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie {title: 'Toy Story'})-[:ACTED_IN]->(p:Person) RETURN p.name AS role
The LLM doesn’t return consistent results: its objective is to produce a plausible answer, not to reproduce the same response each time. The response may be incorrect, or the program may raise an error because the generated Cypher is invalid.
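To make this variation visible, you could ask the same question several times and tally the distinct outcomes. The sketch below assumes a callable that takes a question and returns the generated Cypher as a string; count_distinct_queries and the alternating stand-in generator are hypothetical, and in practice the callable would wrap your own chain.

```python
import itertools
from collections import Counter


def count_distinct_queries(generate_cypher, question, runs=5):
    """Call generate_cypher repeatedly and count each distinct result or error type."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            outcomes[generate_cypher(question)] += 1
        except Exception as error:  # for example, a failure caused by invalid Cypher
            outcomes[f"error: {type(error).__name__}"] += 1
    return outcomes


# Stand-in generator that alternates between two queries, for illustration only.
fake_queries = itertools.cycle(["MATCH (a) RETURN a", "MATCH (b) RETURN b"])
counts = count_distinct_queries(
    lambda question: next(fake_queries),
    "What role did Tom Hanks play in Toy Story?",
    runs=4,
)
print(counts)
```

More than one key in the resulting counter indicates that the same question produced different Cypher or different failures across runs.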
In the following two lessons, you will learn how to provide additional context and instructions to the LLM to generate better and more consistent results.
Check Your Understanding
GraphCypherQAChain
What four things does the GraphCypherQAChain chain require to generate a Cypher query?
- ✓ An LLM
- ✓ A graph database
- ❏ An example query
- ✓ A prompt
- ❏ A retriever
- ❏ A tool
- ❏ A vector store
- ✓ A question
Hint
For the LLM to create a Cypher query it needs a graph database schema and a question from the user.
Solution
The GraphCypherQAChain chain requires the following:
- An LLM for generating Cypher queries
- A graph database connection for answering the queries
- A prompt template to give the LLM the schema and question
- An appropriate question which relates to the schema and data in the graph
Summary
In this lesson, you learned how to use a language model to generate Cypher queries.
In the next lesson, you will experiment with different prompts to improve the results.