The first step in instructing an LLM to retrieve data from a Neo4j database is to generate a Cypher statement.
To complete this challenge, you must modify the initCypherGenerationChain() function in modules/agent/tools/cypher/cypher-generation.chain.ts to return a chain that:
- Accepts the rephrased question as a string
- Formats a prompt that instructs the LLM to use the schema provided to generate a Cypher statement to retrieve the data that answers the question
- Passes the formatted prompt to an LLM
- Parses the output as a string
Prompt Template
In the initCypherGenerationChain() function, use the PromptTemplate.fromTemplate() method to create a new prompt template with the following prompt.
You are a Neo4j Developer translating user questions into Cypher to answer questions
about movies and provide recommendations.
Convert the user's question into a Cypher statement based on the schema.
You must:
* Only use the nodes, relationships and properties mentioned in the schema.
* When required, use `IS NOT NULL` to check for property existence, and not the exists() function.
* Use the `elementId()` function to return the unique identifier for a node or relationship as `_id`.
For example:
```
MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
WHERE a.name = 'Emil Eifrem'
RETURN m.title AS title, elementId(m) AS _id, r.role AS role
```
* Include extra information about the nodes that may help an LLM provide a more informative answer,
for example the release date, rating or budget.
* For movies, use the tmdbId property to return a source URL.
For example: `'https://www.themoviedb.org/movie/'+ m.tmdbId AS source`.
* For movie titles that begin with "The", move "the" to the end.
For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
* Limit the maximum number of results to 10.
* Respond with only a Cypher statement. No preamble.
Example Question: What role did Tom Hanks play in Toy Story?
Example Cypher:
MATCH (a:Actor {{name: 'Tom Hanks'}})-[rel:ACTED_IN]->(m:Movie {{title: 'Toy Story'}})
RETURN a.name AS Actor, m.title AS Movie, elementId(m) AS _id, rel.role AS RoleInMovie
Schema:
{schema}
Question:
{question}
Remember to use backslashes (\) to escape the backticks if you are using template strings.
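Two escaping rules are at work in this prompt, and they are easy to confuse: backslashes protect backticks from JavaScript's template-string syntax, while doubled braces ({{ and }}) protect literal braces from the prompt template's placeholder syntax. The sketch below illustrates both. The formatTemplate() function is a hypothetical, hand-rolled stand-in written for this example; it is not LangChain's PromptTemplate implementation, and the schema text is invented.

```typescript
// Hypothetical stand-in for an f-string style prompt formatter.
// This is NOT LangChain's implementation; it only illustrates the escaping rules.
function formatTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(
    /\{\{|\}\}|\{(\w+)\}/g,
    (match: string, key?: string) => {
      if (match === "{{") return "{"; // doubled braces become literal braces
      if (match === "}}") return "}";
      if (key === undefined || !(key in values)) {
        throw new Error(`Missing value for placeholder ${match}`);
      }
      return values[key]; // single braces are replaced with the input value
    }
  );
}

// Backticks inside a template string must be escaped with a backslash.
const template = `Use \`elementId()\` as in: MATCH (m:Movie {{title: 'Toy Story'}})
Schema: {schema}
Question: {question}`;

const formatted = formatTemplate(template, {
  schema: "(:Movie {title: STRING})", // invented schema text
  question: "What is Toy Story about?",
});

console.log(formatted);
```

Without the doubled braces, a formatter of this kind would treat the Cypher property map as a placeholder and fail to find a matching input value.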
Specific Instructions
This prompt includes specific instructions that the LLM should follow when writing the Cypher statement.
This technique is known as in-context learning, where an LLM uses instructions in the prompt to adapt its responses to new tasks or questions without needing prior training on specific tasks.
You can learn more in the Providing Specific Instructions lesson in Neo4j & LLM Fundamentals.
Your code should resemble the following:
// Create Prompt Template
const cypherPrompt = PromptTemplate.fromTemplate(`
You are a Neo4j Developer translating user questions into Cypher to answer questions
about movies and provide recommendations.
Convert the user's question into a Cypher statement based on the schema.
You must:
* Only use the nodes, relationships and properties mentioned in the schema.
* When required, use \`IS NOT NULL\` to check for property existence, and not the exists() function.
* Use the \`elementId()\` function to return the unique identifier for a node or relationship as \`_id\`.
For example:
\`\`\`
MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
WHERE a.name = 'Emil Eifrem'
RETURN m.title AS title, elementId(m) AS _id, r.role AS role
\`\`\`
* Include extra information about the nodes that may help an LLM provide a more informative answer,
for example the release date, rating or budget.
* For movies, use the tmdbId property to return a source URL.
For example: \`'https://www.themoviedb.org/movie/'+ m.tmdbId AS source\`.
* For movie titles that begin with "The", move "the" to the end.
For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
* Limit the maximum number of results to 10.
* Respond with only a Cypher statement. No preamble.
Example Question: What role did Tom Hanks play in Toy Story?
Example Cypher:
MATCH (a:Actor {{name: 'Tom Hanks'}})-[rel:ACTED_IN]->(m:Movie {{title: 'Toy Story'}})
RETURN a.name AS Actor, m.title AS Movie, elementId(m) AS _id, rel.role AS RoleInMovie
Schema:
{schema}
Question:
{question}
`);
Returning Element IDs
You may have noticed the instruction to use the elementId() function to return the Element ID of any nodes returned.
You will use this value to create :CONTEXT relationships in the database.
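Because every returned node is aliased as `_id`, the IDs can later be pulled out of arbitrarily nested query results. The unit test in this lesson imports an extractIds() helper whose implementation is not shown here, so the collectIds() function below is only a hypothetical sketch of how such a helper might behave; the sample rows are invented.

```typescript
// Hypothetical sketch: recursively collect every string value stored under
// an "_id" key. The course's own extractIds() helper is not shown in this
// lesson, so its real behavior may differ.
function collectIds(input: unknown, found: string[] = []): string[] {
  if (Array.isArray(input)) {
    for (const item of input) collectIds(item, found);
  } else if (input !== null && typeof input === "object") {
    const entries = Object.entries(input as Record<string, unknown>);
    for (const [key, value] of entries) {
      if (key === "_id" && typeof value === "string") {
        // Skip duplicates so each element ID appears once
        if (!found.includes(value)) found.push(value);
      } else {
        collectIds(value, found);
      }
    }
  }
  return found;
}

// Invented rows shaped like results from the generated Cypher
const rows = [
  {
    _id: "1",
    title: "Toy Story",
    director: { _id: "2", name: "John Lasseter" },
  },
  { _id: "3", title: "Toy Story 2" },
];

const ids = collectIds(rows);
console.log(ids);
```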
Return a Runnable Sequence
Use the RunnableSequence.from() method to create a new chain.
The chain should pass the prompt to the LLM passed as a parameter, then format the response as a string using a new instance of the StringOutputParser.
// Create the runnable sequence
return RunnableSequence.from<string, string>([
  // ...
]);
Initial Inputs
Inside the array, add an object that sets the question and schema for the chain.
To assign the original input string to the question key, create a new RunnablePassthrough instance.
Use the graph.getSchema() method to assign a copy of the database schema to the schema key.
  {
    // Take the input and assign it to the question key
    question: new RunnablePassthrough(),
    // Get the schema
    schema: () => graph.getSchema(),
  },
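To see what this object step produces, it can help to model it in plain TypeScript. The sketch below is a conceptual model only, not LangChain code: resolveInputMap() and fakeGetSchema() are hypothetical names, and the schema text is invented. Each key is resolved against the same original input, and the results are gathered into a single object whose keys match the prompt placeholders.

```typescript
// Conceptual model only: resolve every key of an "input map" against the
// same input value and gather the results into one object.
type Step<I, O> = (input: I) => O | Promise<O>;

async function resolveInputMap<I>(
  map: Record<string, Step<I, unknown>>,
  input: I
): Promise<Record<string, unknown>> {
  const result: Record<string, unknown> = {};
  // Run all steps concurrently and store each result under its key
  await Promise.all(
    Object.entries(map).map(async ([key, step]) => {
      result[key] = await step(input);
    })
  );
  return result;
}

// Stand-in for graph.getSchema(); the schema text is invented for illustration
const fakeGetSchema = () => "(:Movie {title: STRING})";

const inputsPromise = resolveInputMap<string>(
  {
    question: (input) => input, // behaves like a passthrough
    schema: () => fakeGetSchema(), // ignores the input
  },
  "Who directed The Matrix?"
);

inputsPromise.then((inputs) => console.log(inputs));
```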
Format Prompt and Process
Now that the prompt inputs are ready, they can be substituted into the prompt, the formatted prompt passed to the LLM, and the output parsed as a string.
// Create the runnable sequence
return RunnableSequence.from<string, string>([
  {
    // Take the input and assign it to the question key
    question: new RunnablePassthrough(),
    // Get the schema
    schema: () => graph.getSchema(),
  },
  cypherPrompt,
  llm,
  new StringOutputParser(),
]);
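The data flow through the four steps can be sketched without any LangChain dependency. Everything in the block below is a hypothetical stand-in: sequence(), fillPrompt(), fakeLlm(), and toText() are invented for illustration, and the fake LLM returns a canned statement rather than calling a model. The point is only that each step receives the previous step's output.

```typescript
// Conceptual model of a sequence: each step receives the previous output.
type AsyncStep = (input: any) => any | Promise<any>;

function sequence(steps: AsyncStep[]): (input: unknown) => Promise<unknown> {
  return async (input) => {
    let current = input;
    for (const step of steps) {
      current = await step(current);
    }
    return current;
  };
}

// Hypothetical stand-in for the prompt template
const fillPrompt = (v: { question: string; schema: string }) =>
  `Schema:\n${v.schema}\nQuestion:\n${v.question}`;

// Hypothetical stand-in for the LLM: returns a canned message, no model call
const fakeLlm = async (prompt: string) => ({
  content: prompt.includes("How many movies")
    ? "MATCH (m:Movie) RETURN count(m) AS count"
    : "",
});

// Hypothetical stand-in for the output parser: message -> plain string
const toText = (message: { content: string }) => message.content;

const sketchChain = sequence([
  // Step 1: build the prompt inputs (invented schema text)
  (question: string) => ({ question, schema: "(:Movie {title: STRING})" }),
  fillPrompt, // Step 2: format the prompt
  fakeLlm, // Step 3: "call" the LLM
  toText, // Step 4: parse the output as a string
]);

sketchChain("How many movies are in the database?").then((cypher) =>
  console.log(cypher)
);
```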
Finished Sequence
If you have followed the steps correctly, your code should resemble the following:
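// Import paths are assumed from the @langchain packages used in the unit test;
// adjust them to match your installed LangChain version.
import { BaseLanguageModel } from "@langchain/core/language_models/base";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";

export default async function initCypherGenerationChain(
  graph: Neo4jGraph,
  llm: BaseLanguageModel
) {
  // Create Prompt Template
  const cypherPrompt = PromptTemplate.fromTemplate(`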
You are a Neo4j Developer translating user questions into Cypher to answer questions
about movies and provide recommendations.
Convert the user's question into a Cypher statement based on the schema.
You must:
* Only use the nodes, relationships and properties mentioned in the schema.
* When required, use \`IS NOT NULL\` to check for property existence, and not the exists() function.
* Use the \`elementId()\` function to return the unique identifier for a node or relationship as \`_id\`.
For example:
\`\`\`
MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
WHERE a.name = 'Emil Eifrem'
RETURN m.title AS title, elementId(m) AS _id, r.role AS role
\`\`\`
* Include extra information about the nodes that may help an LLM provide a more informative answer,
for example the release date, rating or budget.
* For movies, use the tmdbId property to return a source URL.
For example: \`'https://www.themoviedb.org/movie/'+ m.tmdbId AS source\`.
* For movie titles that begin with "The", move "the" to the end.
For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
* Limit the maximum number of results to 10.
* Respond with only a Cypher statement. No preamble.
Example Question: What role did Tom Hanks play in Toy Story?
Example Cypher:
MATCH (a:Actor {{name: 'Tom Hanks'}})-[rel:ACTED_IN]->(m:Movie {{title: 'Toy Story'}})
RETURN a.name AS Actor, m.title AS Movie, elementId(m) AS _id, rel.role AS RoleInMovie
Schema:
{schema}
Question:
{question}
`);
  // Create the runnable sequence
  return RunnableSequence.from<string, string>([
    {
      // Take the input and assign it to the question key
      question: new RunnablePassthrough(),
      // Get the schema
      schema: () => graph.getSchema(),
    },
    cypherPrompt,
    llm,
    new StringOutputParser(),
  ]);
}
Testing your changes
If you have followed the instructions, you should be able to run the following unit test with the npm run test command to verify the response.
npm run test cypher-generation.chain.test.ts
View Unit Test
import { ChatOpenAI } from "@langchain/openai";
import { config } from "dotenv";
import { BaseChatModel } from "langchain/chat_models/base";
import { RunnableSequence } from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import initCypherGenerationChain from "./cypher-generation.chain";
import { extractIds } from "../../../../utils";
import { close } from "../../../graph";

describe("Cypher Generation Chain", () => {
  let graph: Neo4jGraph;
  let llm: BaseChatModel;
  let chain: RunnableSequence<string, string>;

  beforeAll(async () => {
    config({ path: ".env.local" });

    graph = await Neo4jGraph.initialize({
      url: process.env.NEO4J_URI as string,
      username: process.env.NEO4J_USERNAME as string,
      password: process.env.NEO4J_PASSWORD as string,
      database: process.env.NEO4J_DATABASE as string | undefined,
    });

    llm = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: "gpt-3.5-turbo",
      temperature: 0,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    chain = await initCypherGenerationChain(graph, llm);
  });

  afterAll(async () => {
    await graph.close();
    await close();
  });

  it("should generate a simple count query", async () => {
    const output = await chain.invoke("How many movies are in the database?");

    expect(output.toLowerCase()).toContain("match (");
    expect(output).toContain(":Movie");
    expect(output.toLowerCase()).toContain("return");
    expect(output.toLowerCase()).toContain("count(");
  });

  it("should generate a Cypher statement with a relationship", async () => {
    const output = await chain.invoke("Who directed The Matrix?");

    expect(output.toLowerCase()).toContain("match (");
    expect(output).toContain(":Movie");
    expect(output).toContain(":DIRECTED]");
    expect(output.toLowerCase()).toContain("return");
    expect(output.toLowerCase()).toContain("_id");
  });

  it("should extract IDs", () => {
    const ids = extractIds([
      {
        _id: "1",
        name: "Micheal Ward",
        roles: [
          {
            _id: "2",
            name: "Stephen",
            movie: { _id: "3", title: "Empire of Light" },
          },
          {
            _id: "4",
            name: "Marco",
            movie: { _id: "99", title: "Blue Story" },
          },
        ],
      },
      { _id: "100" },
    ]);

    expect(ids).toContain("1");
    expect(ids).toContain("2");
    expect(ids).toContain("3");
    expect(ids).toContain("4");
    expect(ids).toContain("99");
    expect(ids).toContain("100");
  });
});
It works!
If all the tests have passed, you will have a chain capable of generating Cypher statements based on a question using the database schema.
Summary
In this lesson, you built a chain that generates a Cypher statement based on user input.
In the next lesson, you will learn how LLMs can be used to evaluate the statement.