The Cypher QA Chain

Language models and vector indexes are good at querying unstructured data. However, as you have seen, the responses are not always correct, and when data is structured, it is often easier to query it directly.

LLMs are good at writing Cypher queries when given good information, such as:

  • The schema of the graph

  • Context about the question to be answered

  • Examples of questions and appropriate Cypher queries

In this lesson, you will learn how to use a language model to generate Cypher queries to query a Neo4j graph database.

Generating Cypher

Langchain includes the GraphCypherQAChain chain, which can interact with a Neo4j graph database. It uses a language model to generate Cypher queries and then runs them against the graph to answer the question.

The GraphCypherQAChain chain requires the following:

  • An LLM (llm) for generating Cypher queries

  • A graph database connection (graph) for answering the queries

  • A prompt template (cypher_prompt) to give the LLM the schema and question

  • An appropriate question which relates to the schema and data in the graph

The program below will generate a Cypher query based on the schema in the graph database and the question.

Review the code and predict what will happen when you run it.

python
from langchain_openai import ChatOpenAI
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(
    openai_api_key="sk-..."
)

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="pleaseletmein",
)

CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Schema: {schema}
Question: {question}
"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)

cypher_chain.invoke({"query": "What is the plot of the movie Toy Story?"})

Before running the program, you must update the openai_api_key and the graph connection details.
Your Sandbox connection details are:

  • Connection URL - bolt://{sandbox-ip}:{sandbox-boltPort}

  • Username - {sandbox-username}

  • Password - {sandbox-password}
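Rather than hard-coding these values in the script, you could read them from environment variables. The sketch below is only an illustration; the variable names OPENAI_API_KEY, NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD are assumptions, not names the lesson defines.

python
import os

from langchain_openai import ChatOpenAI
from langchain_neo4j import Neo4jGraph

# The environment variable names below are assumptions - use whatever
# names your own setup defines.
llm = ChatOpenAI(
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)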

When you run the program, you should see the Cypher generated from the question and the data it returned. Something similar to:

Generated Cypher:
MATCH (m:Movie {title: "Toy Story"})
RETURN m.plot
Full Context:
[{'m.plot': "A cowboy doll is profoundly threatened and jealous when a new spaceman
figure supplants him as top toy in a boy's room."}]

The LLM used the database schema to generate an appropriate Cypher query. Langchain then executed the query against the graph database and returned the result.
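Langchain reads the schema from the database and injects it into the prompt for you. If you want to see what the LLM receives, you can print it from the Neo4jGraph object. A minimal sketch, assuming the refresh_schema() method and schema attribute available in recent langchain_neo4j releases:

python
# Refresh the cached schema from the database, then print the text
# representation that fills the {schema} variable in the prompt.
graph.refresh_schema()
print(graph.schema)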

Breaking Down the Program

Reviewing the program, you should identify the following key points:

  1. The program instantiates the required llm and graph objects using the appropriate API and connection details.

    python
    llm = ChatOpenAI(
        openai_api_key="sk-..."
    )
    
    graph = Neo4jGraph(
        url="bolt://localhost:7687",
        username="neo4j",
        password="pleaseletmein",
    )
  2. The CYPHER_GENERATION_TEMPLATE gives the LLM context. The schema and question are passed to the LLM as input variables.

    python
    CYPHER_GENERATION_TEMPLATE = """
    You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
    Convert the user's question based on the schema.
    
    Schema: {schema}
    Question: {question}
    """
    
    cypher_generation_prompt = PromptTemplate(
        template=CYPHER_GENERATION_TEMPLATE,
        input_variables=["schema", "question"],
    )

    The schema will be automatically generated from the graph database and passed to the LLM. The question will be the user’s question.

  3. The program instantiates the GraphCypherQAChain chain with the llm, graph, and prompt template (cypher_prompt).

    python
    cypher_chain = GraphCypherQAChain.from_llm(
        llm,
        graph=graph,
        cypher_prompt=cypher_generation_prompt,
        verbose=True,
        allow_dangerous_requests=True
    )

    The program sets the verbose flag to True so you can see the generated Cypher query and response.

    Allow Dangerous Requests

You are trusting the generation of Cypher to the LLM. It may generate invalid or harmful Cypher queries that could corrupt data in the graph or expose sensitive information.

    You have to opt-in to this risk by setting the allow_dangerous_requests flag to True.

In a production environment, you should ensure that access to data is limited and that sufficient security is in place to prevent malicious queries. This could include using a read-only user or role-based access control; see the sketch after this list for a read-only connection.

  4. The chain runs, passing an appropriate question.

    python
    cypher_chain.invoke({"query": "What is the plot of the movie Toy Story?"})

Experiment with different questions and observe the results.

For example, try:

  1. A different context - "What movies did Meg Ryan act in?"

  2. An aggregate query - "How many movies has Tom Hanks directed?"
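If you want to try these quickly, you could run each question through the same chain in a loop. This sketch simply reuses the invoke call from the program above.

python
# Run each example question through the chain and print the answer.
questions = [
    "What movies did Meg Ryan act in?",
    "How many movies has Tom Hanks directed?",
]

for question in questions:
    response = cypher_chain.invoke({"query": question})
    print(response["result"])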

Inconsistent Results

Investigate what happens when you ask the same question multiple times. Observe the generated Cypher query and the response.

"What role did Tom Hanks play in Toy Story?"

You will likely see different results each time you run the program.

MATCH (actor:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie {title: 'Toy Story'})
RETURN actor.name, movie.title, movie.year, movie.runtime, movie.plot
MATCH (a:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie {title: 'Toy Story'})-[:ACTED_IN]->(p:Person)
RETURN p.name AS role

The LLM doesn't return consistent results - its objective is to produce an answer, not to give the same response every time. The response may not be correct, or the generated Cypher may be invalid and cause an error.

In the following two lessons, you will learn how to provide additional context and instructions to the LLM to generate better and more consistent results.

Check Your Understanding

GraphCypherQAChain

What four things does the GraphCypherQAChain chain require to generate a Cypher query?

  • ✓ An LLM

  • ✓ A graph database

  • ❏ An example query

  • ✓ A prompt

  • ❏ A retriever

  • ❏ A tool

  • ❏ A vector store

  • ✓ A question

Hint

For the LLM to create a Cypher query, it needs the graph database schema and a question from the user.

Solution

The GraphCypherQAChain chain requires the following:

  • An LLM for generating Cypher queries

  • A graph database connection for answering the queries

  • A prompt template to give the LLM the schema and question

  • An appropriate question which relates to the schema and data in the graph

Summary

In this lesson, you learned how to use a language model to generate Cypher queries.

In the next lesson, you will experiment with different prompts to improve the results.