Using an LLM to generate Cypher can help you query the knowledge graph, particularly when the schema is created from unstructured data. The LLM uses the knowledge graph schema to dynamically create Cypher queries.
The Using LLMs for Query Generation module in the Neo4j & LLM Fundamentals course covers how to generate Cypher using Python and LangChain.
In this lesson, you will explore options for improving Cypher generation for knowledge graphs.
Cypher Generation
Open the llm-knowledge-graph/query_kg.py file:
import os
from langchain_openai import ChatOpenAI
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv
load_dotenv()

llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    temperature=0
)

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)

CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.
Always use case insensitive search when matching strings.
Schema:
{schema}
The question is:
{question}"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

# Build a chain that generates Cypher, runs it, and answers from the results
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)

def run_cypher(q):
    return cypher_chain.invoke({"query": q})

# Keep asking for questions until the user types "exit"
while (q := input("> ")) != "exit":
    print(run_cypher(q))
This program uses GraphCypherQAChain and a custom prompt to generate Cypher queries.
Allow Dangerous Requests
You are trusting the generation of Cypher to the LLM. It may generate invalid Cypher queries that could corrupt data in the graph or provide access to sensitive information.
You have to opt in to this risk by setting the allow_dangerous_requests flag to True.
In a production environment, you should ensure that access to data is limited and that sufficient security is in place to prevent malicious queries. This could include the use of a read-only user or role-based access control.
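As an extra application-side check, you could inspect generated Cypher before it is executed. The sketch below is a minimal illustration using only the Python standard library; the `looks_read_only` function and its keyword list are assumptions made for this example and are not part of LangChain or Neo4j. A keyword scan like this is easy to bypass and should complement, not replace, a read-only database user.

```python
import re

# Clauses that can modify the graph or call arbitrary procedures.
# This list is an illustrative assumption, not an exhaustive one.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL|FOREACH)\b",
    re.IGNORECASE,
)

def looks_read_only(cypher: str) -> bool:
    """Return True if no write-style keyword appears in the statement."""
    return WRITE_CLAUSES.search(cypher) is None

print(looks_read_only("MATCH (c:Chunk) RETURN count(c) AS numberOfChunks"))
print(looks_read_only("MATCH (n) DETACH DELETE n"))
```

The check is deliberately conservative: a harmless query that mentions one of these words inside a string literal or property name would also be rejected.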
Run the program and ask a simple question, for example, "How many chunks are in the graph?".
> How many chunks are in the graph?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (c:Chunk) RETURN COUNT(c) as numberOfChunks;
Full Context:
[{'numberOfChunks': 70}]
> Finished chain.
Ask more complex questions to see how the LLM generates Cypher queries.
You will probably find that the queries generated do not return the correct results.
You can improve the results by tuning the prompt and configuring the chain.
Prompt
The names of entities in the knowledge graph are not always in the same case as they appear in the question.
You can include an additional instruction and example in the prompt to help the LLM generate Cypher that matches entity names regardless of case:
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.
Always use case insensitive search when matching strings.
Schema:
{schema}
Examples:
# Use case insensitive matching for entity ids
MATCH (c:Chunk)-[:HAS_ENTITY]->(e)
WHERE e.id =~ '(?i)entityName'
The question is:
{question}"""
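The `=~` operator used in the example performs a regular expression match against the whole property value, and the `(?i)` prefix makes that match case insensitive. The sketch below uses Python's `re.fullmatch` as a stand-in to show which ids such a pattern would accept. Cypher uses Java regular expressions, so treat this as an analogy rather than an exact emulation; the `matches_like_cypher` helper and the sample ids are made up for illustration.

```python
import re

def matches_like_cypher(pattern: str, value: str) -> bool:
    """Approximate Cypher's `value =~ pattern`, which must match the whole string."""
    return re.fullmatch(pattern, value) is not None

pattern = "(?i)LLM"
for entity_id in ["LLM", "llm", "Llm", "LLMs"]:
    print(entity_id, matches_like_cypher(pattern, entity_id))
```

Because the whole value must match, an id such as `LLMs` is not accepted by `(?i)LLM`. A pattern with wildcards, for example `(?i).*LLM.*`, would be needed to match ids that merely contain the term.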
You can also provide specific examples related to navigating the graph structure. For example, how to find documents from the entities extracted.
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.
Always use case insensitive search when matching strings.
Schema:
{schema}
Examples:
# Use case insensitive matching for entity ids
MATCH (c:Chunk)-[:HAS_ENTITY]->(e)
WHERE e.id =~ '(?i)entityName'
# Find documents that reference entities
MATCH (d:Document)<-[:PART_OF]-(:Chunk)-[:HAS_ENTITY]->(e)
WHERE e.id =~ '(?i)entityName'
RETURN d
The question is:
{question}"""
Providing specific examples improves the Cypher generated by the LLM.
> What documents are about the technology LLM?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Document)<-[:PART_OF]-(:Chunk)-[:HAS_ENTITY]->(t:Technology)
WHERE t.id =~ '(?i)LLM'
RETURN d

> what is the most reference technology entity by chunk?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (c:Chunk)-[:HAS_ENTITY]->(t:Technology)
RETURN t.id, COUNT(*) AS references
ORDER BY references DESC
LIMIT 1
Ask questions of different complexity to see how the LLM generates Cypher queries. Adapt your prompt to cater for specific scenarios by providing instructions or examples.
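As the number of examples grows, it can be easier to keep them in a list and assemble the template from it. The sketch below is one possible way to do this with plain string handling; `PROMPT_HEADER`, `PROMPT_FOOTER`, `EXAMPLES`, and `build_template` are names invented for this illustration. The resulting string keeps the `{schema}` and `{question}` placeholders, so it could be passed to `PromptTemplate` in the same way as the hand-written template.

```python
PROMPT_HEADER = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.
Always use case insensitive search when matching strings.
Schema:
{schema}
"""

PROMPT_FOOTER = """The question is:
{question}"""

# (comment, Cypher) pairs; extend this list as you discover failing questions.
EXAMPLES = [
    ("Use case insensitive matching for entity ids",
     "MATCH (c:Chunk)-[:HAS_ENTITY]->(e)\nWHERE e.id =~ '(?i)entityName'"),
    ("Find documents that reference entities",
     "MATCH (d:Document)<-[:PART_OF]-(:Chunk)-[:HAS_ENTITY]->(e)\n"
     "WHERE e.id =~ '(?i)entityName'\nRETURN d"),
]

def build_template(examples):
    """Join the header, the formatted examples, and the footer into one template."""
    blocks = [f"# {comment}\n{cypher}" for comment, cypher in examples]
    return PROMPT_HEADER + "Examples:\n" + "\n".join(blocks) + "\n" + PROMPT_FOOTER

CYPHER_GENERATION_TEMPLATE = build_template(EXAMPLES)
print(CYPHER_GENERATION_TEMPLATE)
```

One thing to watch for: if an example contains curly braces, such as a Cypher map like `{id: 'LLM'}`, the braces need to be doubled (`{{` and `}}`) so they are not treated as template variables.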
Configuration
The GraphCypherQAChain also provides configuration options to improve the Cypher generated.
Exclude types
If there are node labels or relationship types that you wish to exclude from your Cypher queries, you can use the exclude_types parameter.
For example, if you are storing conversation history alongside your knowledge graph, you could exclude those nodes and relationships.
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    exclude_types=["Session", "Message", "LAST_MESSAGE", "NEXT"],
    allow_dangerous_requests=True
)
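Conceptually, excluding types removes them from the schema description that the LLM sees, so it has no reason to reference them. The sketch below illustrates that idea on a small, hypothetical schema dictionary. It is not LangChain's implementation; the `schema` structure and the `filter_schema` function are assumptions made for this example.

```python
# Hypothetical, simplified schema: node labels and relationship triples.
schema = {
    "labels": ["Document", "Chunk", "Technology", "Session", "Message"],
    "relationships": [
        ("Chunk", "PART_OF", "Document"),
        ("Chunk", "HAS_ENTITY", "Technology"),
        ("Session", "LAST_MESSAGE", "Message"),
        ("Message", "NEXT", "Message"),
    ],
}

def filter_schema(schema, exclude_types):
    """Drop excluded labels, excluded relationship types, and any
    relationship that starts or ends at an excluded label."""
    excluded = set(exclude_types)
    labels = [label for label in schema["labels"] if label not in excluded]
    relationships = [
        (start, rel, end)
        for start, rel, end in schema["relationships"]
        if not excluded & {start, rel, end}
    ]
    return {"labels": labels, "relationships": relationships}

visible = filter_schema(schema, ["Session", "Message", "LAST_MESSAGE", "NEXT"])
print(visible["labels"])
print(visible["relationships"])
```

With the conversation history types removed, only the Document, Chunk, and Technology parts of the graph remain in the description.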
Enhanced schema
If the properties within your knowledge graph contain a relatively small range of values, you may benefit from using the enhanced_schema parameter.
When you set the enhanced_schema parameter, the system scans property values and provides examples to the LLM when generating Cypher queries.
This can lead to more accurate Cypher queries, at the cost of more complex prompts, and potentially slower generation times.
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    enhanced_schema=True,
    allow_dangerous_requests=True
)
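To get a feel for what "scanning property values and providing examples" can mean, the sketch below collects a few distinct values per property and renders them as schema text. It is a conceptual illustration only, not how LangChain builds its enhanced schema; the `rows` data, the `category` property, and both functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, as if returned by a query over Technology nodes.
rows = [
    {"id": "LLM", "category": "model"},
    {"id": "Neo4j", "category": "database"},
    {"id": "LangChain", "category": "framework"},
    {"id": "Cypher", "category": "language"},
    {"id": "GPT-4", "category": "model"},
]

def sample_values(rows, limit=3):
    """Collect up to `limit` distinct example values per property, in first-seen order."""
    samples = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            if value not in samples[key] and len(samples[key]) < limit:
                samples[key].append(value)
    return dict(samples)

def describe(label, samples):
    """Render one schema line per property, including its example values."""
    lines = [f"{label} properties:"]
    for key, values in samples.items():
        lines.append(f"- {key}: e.g. {', '.join(values)}")
    return "\n".join(lines)

print(describe("Technology", sample_values(rows)))
```

Seeing concrete values helps the LLM pick the right property and spelling when it writes a filter, which is why this approach suits properties with a small range of values.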
LLM Configuration
You can configure the GraphCypherQAChain to use different LLMs for Cypher and question/answer generation.
Using different LLMs can give improved performance, better cost efficiency, or both. Picking the right LLM for each task can be a trade-off between speed and accuracy.
For example, you could use two different OpenAI models: gpt-4 for Cypher generation and gpt-3.5-turbo for question/answer generation.
qa_llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    model="gpt-3.5-turbo",
)

cypher_llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    model="gpt-4",
    temperature=0
)

cypher_chain = GraphCypherQAChain.from_llm(
    qa_llm=qa_llm,
    cypher_llm=cypher_llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)
A temperature of 0 is recommended for Cypher generation.
Experiment by asking questions, adapting the prompt, and configuring the chain to improve the Cypher generated by the LLM.
Check Your Understanding
Excluding data from the Cypher generation
Consider the following scenario.
You have a knowledge graph that contains nodes that you want to ensure are NOT queried by the LLM.
The nodes all have a specific label, Hidden.
How could you guarantee that the Hidden nodes are not included in the Cypher generation?
- ❏ Provide additional instructions to the LLM in the Cypher generation prompt.
- ❏ Include an example Cypher query where the Hidden nodes are not returned.
- ❏ Filter out any questions that include the Hidden nodes before calling the chain.
- ✓ Add the Hidden nodes to the exclude_types parameter of the GraphCypherQAChain.
Hint
A prompt instruction will not guarantee that the LLM never generates Cypher that includes the Hidden nodes.
Solution
The correct answer is to add the Hidden nodes to the exclude_types parameter of the GraphCypherQAChain.
This will ensure that the Hidden nodes are not included in the Cypher generation.
Lesson Summary
In this lesson, you learned how to use the GraphCypherQAChain to generate Cypher queries from a knowledge graph schema.
In the next lesson, you will create a retriever to query the knowledge graph using unstructured data.