Querying with LLMs

Using an LLM to generate Cypher can help you query the knowledge graph, particularly when the schema is created from unstructured data. The LLM uses the knowledge graph schema to dynamically create Cypher queries.

The Using LLMs for Query Generation module in the Neo4j & LLM Fundamentals course covers how to generate Cypher using Python and LangChain.

In this lesson, you will explore options for improving Cypher generation for knowledge graphs.

Cypher Generation

Open the llm-knowledge-graph/query_kg.py file:

python
import os
from langchain_openai import ChatOpenAI
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain.prompts import PromptTemplate

from dotenv import load_dotenv
load_dotenv()

llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'), 
    temperature=0
)

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)

CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.

Always use case insensitive search when matching strings.

Schema:
{schema}

The question is:
{question}"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)

def run_cypher(q):
    return cypher_chain.invoke({"query": q})

while True:
    q = input("> ")
    print(run_cypher(q))

This program uses GraphCypherQAChain and a custom prompt to generate Cypher queries.
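
Because the generated Cypher is not guaranteed to be valid, the chain can raise an exception part-way through a session, and the loop above has no way to exit cleanly. The following is a minimal sketch, not part of the course repository, of a loop that reports errors and keeps running. The `repl` helper and its `ask`, `read`, and `write` parameters are hypothetical names; in `query_kg.py` you would pass `run_cypher` as `ask`.

```python
def repl(ask, read=input, write=print):
    """Prompt for questions until the user types 'exit' or input ends.

    ask   -- callable that takes a question and returns an answer
    read  -- callable used to read a line (defaults to input)
    write -- callable used to display output (defaults to print)
    """
    while True:
        try:
            q = read("> ").strip()
        except (EOFError, KeyboardInterrupt):
            break  # Ctrl-D or Ctrl-C ends the session
        if not q:
            continue  # ignore empty lines
        if q.lower() in {"exit", "quit"}:
            break
        try:
            write(ask(q))
        except Exception as err:  # e.g. the database rejects the generated Cypher
            write(f"Query failed: {err}")


# Usage in query_kg.py (assumes run_cypher is defined as above):
# repl(run_cypher)
```

Passing `read` and `write` as parameters keeps the loop easy to exercise without a terminal or a database connection.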

Allow Dangerous Requests

You are trusting the generation of Cypher to the LLM. It may generate invalid Cypher queries that could corrupt data in the graph or provide access to sensitive information.

You have to opt in to this risk by setting the allow_dangerous_requests flag to True.

In a production environment, you should ensure that access to data is limited and that sufficient security is in place to prevent malicious queries. This could include the use of a read-only user or role-based access control.
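
As one extra layer, you can inspect generated Cypher before it is executed. The sketch below is an illustration under stated assumptions, not a LangChain or Neo4j feature: `is_read_only` is a hypothetical helper and its keyword list is deliberately incomplete. A keyword check can be bypassed, and it will also reject harmless queries that happen to contain these words, so treat it as a complement to a read-only database user, never a replacement.

```python
import re

# Clauses that modify data or schema. Assumption: this list is illustrative
# and not exhaustive (for example, it does not cover procedures that write
# when invoked through CALL).
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|FOREACH)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Return True if no write clause keyword appears in the statement."""
    return WRITE_CLAUSES.search(cypher) is None


print(is_read_only("MATCH (c:Chunk) RETURN count(c) AS numberOfChunks"))  # True
print(is_read_only("MATCH (n) DETACH DELETE n"))                          # False
```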

Run the program and ask a simple question, for example, "How many chunks are in the graph?".

> How many chunks are in the graph?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (c:Chunk)
RETURN COUNT(c) as numberOfChunks;
Full Context:
[{'numberOfChunks': 70}]
> Finished chain.

Ask more complex questions to see how the LLM generates Cypher queries.

You will probably find that the generated queries do not always return the correct results.

You can improve the results by tuning the prompt and configuring the chain.

Prompt

The names of entities in the knowledge graph are not always in the same case as they appear in the question.

You can include an additional instruction and an example in the prompt to help the LLM generate Cypher that matches entity names regardless of case:

python
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.

Always use case insensitive search when matching strings.

Schema:
{schema}

Examples: 
# Use case insensitive matching for entity ids
MATCH (c:Chunk)-[:HAS_ENTITY]->(e)
WHERE e.id =~ '(?i)entityName'

The question is:
{question}"""
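
The `(?i)` prefix in the example is an inline flag that turns on case-insensitive matching, and Cypher's `=~` operator only succeeds when the pattern matches the whole string. Python's `re.fullmatch` behaves comparably for a simple pattern like this one, so the sketch below uses it to illustrate the behaviour. It is an analogy in Python, not Cypher itself, and the sample values are made up.

```python
import re

pattern = "(?i)llm"  # same shape as the pattern in the prompt example

# Matches regardless of case
print(bool(re.fullmatch(pattern, "LLM")))        # True
print(bool(re.fullmatch(pattern, "Llm")))        # True

# The whole value must match, so a longer id does not
print(bool(re.fullmatch(pattern, "LLM Agent")))  # False
```

If an entity id can contain regular expression metacharacters such as `.` or `+`, they would need escaping before being placed in the pattern.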

You can also provide specific examples related to navigating the graph structure. For example, how to find documents from the entities extracted.

python
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.

Always use case insensitive search when matching strings.

Schema:
{schema}

Examples: 
# Use case insensitive matching for entity ids
MATCH (c:Chunk)-[:HAS_ENTITY]->(e)
WHERE e.id =~ '(?i)entityName'

# Find documents that reference entities
MATCH (d:Document)<-[:PART_OF]-(:Chunk)-[:HAS_ENTITY]->(e)
WHERE e.id =~ '(?i)entityName'
RETURN d

The question is:
{question}"""

Providing specific examples improves the Cypher generated by the LLM.

> What documents are about the technology LLM?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Document)<-[:PART_OF]-(:Chunk)-[:HAS_ENTITY]->(t:Technology)
WHERE t.id =~ '(?i)LLM'
RETURN d
> what is the most reference technology entity by chunk?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (c:Chunk)-[:HAS_ENTITY]->(t:Technology)
RETURN t.id, COUNT(*) AS references
ORDER BY references DESC
LIMIT 1

Ask questions of different complexity to see how the LLM generates Cypher queries. Adapt your prompt to cater for specific scenarios by providing instructions or examples.
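
As the number of examples grows, it can be easier to keep them in a list and assemble the template from it. The sketch below is one way to do that; `build_cypher_template`, `escape_braces`, `INSTRUCTIONS`, and `EXAMPLES` are hypothetical names, not part of LangChain. It doubles any curly braces in the examples because `PromptTemplate` reads single braces as input variables in its default f-string format, so a Cypher map such as `{id: 'x'}` would otherwise be treated as a placeholder.

```python
INSTRUCTIONS = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Only include the generated Cypher statement in your response.

Always use case insensitive search when matching strings."""

# Each example is a (comment, cypher) pair
EXAMPLES = [
    (
        "Use case insensitive matching for entity ids",
        "MATCH (c:Chunk)-[:HAS_ENTITY]->(e)\nWHERE e.id =~ '(?i)entityName'",
    ),
    (
        "Find documents that reference entities",
        "MATCH (d:Document)<-[:PART_OF]-(:Chunk)-[:HAS_ENTITY]->(e)\n"
        "WHERE e.id =~ '(?i)entityName'\nRETURN d",
    ),
]


def escape_braces(text):
    """Double curly braces so they survive f-string style formatting."""
    return text.replace("{", "{{").replace("}", "}}")


def build_cypher_template(examples):
    """Assemble a template string with {schema} and {question} placeholders."""
    blocks = [escape_braces(f"# {comment}\n{cypher}") for comment, cypher in examples]
    return (
        INSTRUCTIONS
        + "\n\nSchema:\n{schema}\n\nExamples:\n"
        + "\n\n".join(blocks)
        + "\n\nThe question is:\n{question}"
    )


CYPHER_GENERATION_TEMPLATE = build_cypher_template(EXAMPLES)
```

The resulting string can be passed to `PromptTemplate` in the same way as the hand-written template in `query_kg.py`.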

Configuration

The GraphCypherQAChain also provides configuration options to improve the Cypher generated.

Exclude types

If there are node labels or relationship types that you wish to exclude from your Cypher queries, you can use the exclude_types parameter.

For example, if you are storing conversation history alongside your knowledge graph, you could exclude those nodes and relationships.

The conversation graph structure showing Session
python
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    exclude_types=["Session", "Message", "LAST_MESSAGE", "NEXT"],
    allow_dangerous_requests=True
)
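
Conceptually, excluding types removes the matching labels and relationship types from the schema before it is placed in the prompt, so the LLM never sees them. The sketch below illustrates that idea with a plain dictionary. It is a simplified stand-in, not LangChain's implementation, and the schema shape, the `filter_schema` helper, and the sample data are all assumptions made for the example.

```python
def filter_schema(schema, exclude_types):
    """Return a copy of the schema without the excluded labels and types."""
    excluded = set(exclude_types)
    return {
        "node_props": {
            label: props
            for label, props in schema["node_props"].items()
            if label not in excluded
        },
        "relationships": [
            rel
            for rel in schema["relationships"]
            # Drop a relationship if its type or either end is excluded
            if not ({rel["start"], rel["type"], rel["end"]} & excluded)
        ],
    }


# Hypothetical schema mixing knowledge graph and conversation history
schema = {
    "node_props": {
        "Chunk": ["id", "text"],
        "Session": ["id"],
        "Message": ["content"],
    },
    "relationships": [
        {"start": "Chunk", "type": "PART_OF", "end": "Document"},
        {"start": "Session", "type": "LAST_MESSAGE", "end": "Message"},
        {"start": "Message", "type": "NEXT", "end": "Message"},
    ],
}

visible = filter_schema(schema, ["Session", "Message", "LAST_MESSAGE", "NEXT"])
print(list(visible["node_props"]))  # only Chunk remains
print(visible["relationships"])     # only the PART_OF relationship remains
```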

Enhanced schema

If the properties within your knowledge graph contain a relatively small range of values, you may benefit from using the enhanced_schema parameter.

When you set the enhanced_schema parameter, the system scans property values and provides examples to the LLM when generating Cypher queries.

This can lead to more accurate Cypher queries, at the cost of more complex prompts, and potentially slower generation times.

python
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    enhanced_schema=True,
    allow_dangerous_requests=True
)
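
The idea behind an enhanced schema can be illustrated with a small sketch: when a property has only a few distinct values, listing them tells the LLM exactly which strings it can filter on. The function below is a hypothetical illustration, not the library's implementation; `describe_property`, the `max_values` threshold, and the output wording are assumptions.

```python
def describe_property(name, values, max_values=5):
    """Summarise a property, listing its values when there are only a few."""
    distinct = sorted(set(values))
    if len(distinct) <= max_values:
        listed = ", ".join(repr(v) for v in distinct)
        return f"{name}: available values: [{listed}]"
    # Too many values to list, so show one example only
    return f"{name}: {len(distinct)} distinct values, e.g. {distinct[0]!r}"


# Made-up sample data
print(describe_property("status", ["open", "closed", "open"]))
# status: available values: ['closed', 'open']
print(describe_property("id", [f"chunk-{i}" for i in range(70)]))
# id: 70 distinct values, e.g. 'chunk-0'
```

The longer description for low-cardinality properties is what makes the prompt larger, which is the trade-off described above.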

LLM Configuration

You can configure the GraphCypherQAChain to use different LLMs for Cypher and question/answer generation.

Using different LLMs can give improved performance, better cost efficiency, or both. Picking the right LLM for each task can be a trade-off between speed and accuracy.

For example, you could use two different OpenAI models: gpt-4 for Cypher generation and gpt-3.5-turbo for question/answer generation.

python
qa_llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'), 
    model="gpt-3.5-turbo",
)

cypher_llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'), 
    model="gpt-4",
    temperature=0
)
cypher_chain = GraphCypherQAChain.from_llm(
    qa_llm=qa_llm,
    cypher_llm=cypher_llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)

A temperature of 0 is recommended for Cypher generation because it makes the generated queries more deterministic.

Experiment by asking questions, adapting the prompt, and configuring the chain to improve the Cypher generated by the LLM.

Check Your Understanding

Excluding data from the Cypher generation

Consider the following scenario.

You have a knowledge graph that contains nodes that you want to ensure are NOT queried by the LLM.

The nodes all have a specific label, Hidden.

How could you guarantee that the Hidden nodes are not included in the Cypher generation?

  • ❏ Provide additional instructions to the LLM in the Cypher generation prompt.

  • ❏ Include an example Cypher query where the Hidden nodes are not returned.

  • ❏ Filter out any questions that include the Hidden nodes before calling the chain.

  • ✓ Add the Hidden label to the exclude_types parameter of the GraphCypherQAChain.

Hint

A prompt instruction will not guarantee an LLM will not generate Cypher that includes the Hidden nodes.

Solution

The correct answer is to add the Hidden label to the exclude_types parameter of the GraphCypherQAChain.

This will ensure that the Hidden nodes are not included in the Cypher generation.

Lesson Summary

In this lesson, you learned how to use the GraphCypherQAChain to generate Cypher queries from a knowledge graph schema.

In the next lesson, you will create a retriever to query the knowledge graph using unstructured data.