Now you have all the components needed to retrieve data from Neo4j with Cypher based on a user input. It is time to combine them.
To complete this challenge, you must create a Runnable instance that:
- Generates and evaluates a Cypher statement
- Uses the Cypher statement to retrieve data from the database
- Extracts the element IDs and converts the results to a string for use in the context prompt
- Generates an answer using the answer generation chain
- Saves the response to the database along with the Cypher statement
- Returns the LLM response
modules/agent/tools/cypher/cypher-retrieval.chain.ts
→
Cypher Generation and Evaluation
To generate and evaluate a new Cypher statement, you’ll need a function that generates a statement and then evaluates it until no errors remain.
The modules/agent/tools/cypher/cypher-retrieval.chain.ts file already has a placeholder function called recursivelyEvaluate() to perform this task.
/**
* Use the database schema to generate and subsequently validate
* a Cypher statement based on the user question
*
* @param {Neo4jGraph} graph The graph
* @param {BaseLanguageModel} llm An LLM to generate the Cypher
* @param {string} question The rephrased question
* @returns {string}
*/
export async function recursivelyEvaluate(
graph: Neo4jGraph,
llm: BaseLanguageModel,
question: string
): Promise<string> {
// TODO: Create Cypher Generation Chain
// const generationChain = ...
// TODO: Create Cypher Evaluation Chain
// const evaluatorChain = ...
// TODO: Generate Initial cypher
// let cypher = ...
// TODO: Recursively evaluate the cypher until there are no errors
// Bug fix: GPT-4 is adamant that it should use id() regardless of
// the instructions in the prompt. As a quick fix, replace it here
// cypher = cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");
// return cypher;
}
In this function, first use the initCypherGenerationChain() function from the Cypher Generation Chain lesson and the initCypherEvaluationChain() function from the Cypher Evaluation Chain lesson to create the generation and evaluation chains.
// Initiate chains
const generationChain = await initCypherGenerationChain(graph, llm);
const evaluatorChain = await initCypherEvaluationChain(llm);
Next, invoke the generationChain to generate an initial Cypher statement.
// Generate Initial Cypher
let cypher = await generationChain.invoke(question);
Now, use a while loop to evaluate the Cypher statement repeatedly, up to five times, until the evaluation chain returns no errors.
let errors = ["N/A"];
let tries = 0;
while (tries < 5 && errors.length > 0) {
tries++;
try {
// Evaluate Cypher
const evaluation = await evaluatorChain.invoke({
question,
schema: graph.getSchema(),
cypher,
errors,
});
errors = evaluation.errors;
cypher = evaluation.cypher;
} catch (e: unknown) {}
}
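To see how this bounded loop behaves without calling an LLM, here is a minimal sketch that swaps the evaluation chain for a stub. The Evaluation shape, the fakeEvaluate() stub, and the refine() helper are assumptions made for illustration; only the loop structure mirrors the lesson code. Unlike the lesson code, the catch block logs the failure rather than discarding it.

```typescript
// Shape assumed for the evaluation chain output, based on the loop above
interface Evaluation {
  cypher: string;
  errors: string[];
}

// Hypothetical stand-in for evaluatorChain.invoke(): reports a missing
// variable once, then accepts the corrected statement
async function fakeEvaluate(cypher: string): Promise<Evaluation> {
  if (cypher.includes("[:ACTED_IN]") && cypher.includes("r.role")) {
    return {
      cypher: cypher.replace("[:ACTED_IN]", "[r:ACTED_IN]"),
      errors: ["Variable r is not defined"],
    };
  }
  return { cypher, errors: [] };
}

async function refine(
  initial: string,
  evaluate: (cypher: string) => Promise<Evaluation>,
  maxTries = 5
): Promise<{ cypher: string; tries: number }> {
  let cypher = initial;
  let errors = ["N/A"]; // non-empty so the loop runs at least once
  let tries = 0;

  while (tries < maxTries && errors.length > 0) {
    tries++;
    try {
      const evaluation = await evaluate(cypher);
      errors = evaluation.errors;
      cypher = evaluation.cypher;
    } catch (e: unknown) {
      // Keep the previous statement, but log the failure instead of hiding it
      console.warn(`Evaluation attempt ${tries} failed`, e);
    }
  }
  return { cypher, tries };
}

const refined = refine(
  "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN m.title, r.role",
  fakeEvaluate
);
refined.then(({ cypher, tries }) => console.log(tries, cypher));
```

With this stub, the first pass returns a corrected statement together with one error, and the second pass returns no errors, so the loop stops after two tries.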
Finally, return the cypher statement.
// Bug fix: GPT-4 is adamant that it should use id() regardless of
// the instructions in the prompt. As a quick fix, replace it here
cypher = cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");
return cypher;
id() to elementId() replacement
The first line of this code contains a fix that converts id({variable}) to elementId({variable}).
No matter what we try in the prompt, the GPT-3.5 Turbo and GPT-4 models use the deprecated id() function instead of elementId().
Eventually, the models may learn that the id() function is deprecated. Until then, this problem suggests that training a model specifically to generate valid Cypher statements might be necessary.
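To see what the replacement does in practice, the sketch below applies the same regular expression to a few sample statements. The replaceLegacyId() helper name and the sample queries are illustrative and not part of the course code.

```typescript
// Hypothetical helper wrapping the regular expression used in recursivelyEvaluate()
function replaceLegacyId(cypher: string): string {
  // The leading \s restricts the match to an id( call that follows whitespace
  return cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");
}

// A generated statement that uses the deprecated function
const generated = "MATCH (m:Movie) RETURN m.title AS title, id(m) AS _id";
const fixed = replaceLegacyId(generated);
console.log(fixed);

// Statements that already use elementId() pass through unchanged
const unchanged = replaceLegacyId("MATCH (m:Movie) RETURN elementId(m) AS _id");
console.log(unchanged);

// Limitation: there is no whitespace before id(, so the pattern does not match
const missed = replaceLegacyId("RETURN [id(m)] AS ids");
console.log(missed);
```

The last case shows why this is only a quick fix: a call written directly after a bracket, parenthesis, or comma is left untouched.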
View the full recursivelyEvaluate() function
/**
* Use the database schema to generate and subsequently validate
* a Cypher statement based on the user question
*
* @param {Neo4jGraph} graph The graph
* @param {BaseLanguageModel} llm An LLM to generate the Cypher
* @param {string} question The rephrased question
* @returns {string}
*/
export async function recursivelyEvaluate(
graph: Neo4jGraph,
llm: BaseLanguageModel,
question: string
): Promise<string> {
// Initiate chains
const generationChain = await initCypherGenerationChain(graph, llm);
const evaluatorChain = await initCypherEvaluationChain(llm);
// Generate Initial Cypher
let cypher = await generationChain.invoke(question);
let errors = ["N/A"];
let tries = 0;
while (tries < 5 && errors.length > 0) {
tries++;
try {
// Evaluate Cypher
const evaluation = await evaluatorChain.invoke({
question,
schema: graph.getSchema(),
cypher,
errors,
});
errors = evaluation.errors;
cypher = evaluation.cypher;
} catch (e: unknown) {}
}
// Bug fix: GPT-4 is adamant that it should use id() regardless of
// the instructions in the prompt. As a quick fix, replace it here
cypher = cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");
return cypher;
}
Handling errors
The LLM will generate a correct Cypher statement most of the time. But, as we’ve found in testing, depending on the instructions provided to the prompt, the loop of Cypher generation and evaluation can be flaky.
You can execute your Cypher statement with an additional evaluation loop to make the application more robust. If the database throws an error, you can analyze the error message using the same evaluation chain and rewrite the statement accordingly.
Find the getResults() function in modules/agent/tools/cypher/cypher-retrieval.chain.ts.
/**
* Attempt to get the results, and if there is a syntax error in the Cypher statement,
* attempt to correct the errors.
*
* @param {Neo4jGraph} graph The graph instance to get the results from
* @param {BaseLanguageModel} llm The LLM to evaluate the Cypher statement if anything goes wrong
* @param {string} input The input built up by the Cypher Retrieval Chain
* @returns {Promise<Record<string, any>[]>}
*/
export async function getResults(
graph: Neo4jGraph,
llm: BaseLanguageModel,
input: { question: string; cypher: string }
): Promise<any | undefined> {
// TODO: catch Cypher errors and pass to the Cypher evaluation chain
}
Replace the // TODO comment with code that attempts to execute the Cypher statement and retries if the graph.query() method throws an error.
Start by defining a results variable and a retries variable to count the number of failed attempts.
Define a mutable cypher variable to hold the Cypher statement.
Then, call the initCypherEvaluationChain() function to create an instance of the evaluation chain.
let results;
let retries = 0;
let cypher = input.cypher;
// Evaluation chain if an error is thrown by Neo4j
const evaluationChain = await initCypherEvaluationChain(llm);
Next, create a while loop that will iterate a maximum of five times.
Inside the loop, use try/catch to attempt to execute the Cypher statement.
If an error is thrown, pass the .message property along with the Cypher statement, question, and schema to the evaluation chain.
Assign the cypher value returned by the evaluation chain to the cypher variable.
while (results === undefined && retries < 5) {
try {
results = await graph.query(cypher);
return results;
} catch (e: any) {
retries++;
const evaluation = await evaluationChain.invoke({
cypher,
question: input.question,
schema: graph.getSchema(),
errors: [e.message],
});
cypher = evaluation.cypher;
}
}
return results;
Finally, return the results.
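The execute-and-repair pattern can be tried without a database or an LLM. The following is a minimal sketch under stated assumptions: QueryRunner stands in for Neo4jGraph, Repair stands in for the evaluation chain, and fakeGraph and fakeRepair are hypothetical stubs. It also narrows the caught error from unknown instead of typing it as any.

```typescript
// Minimal interfaces standing in for Neo4jGraph and the evaluation chain.
// These are assumptions for illustration, not the LangChain types.
type Row = Record<string, unknown>;
interface QueryRunner {
  query(cypher: string): Promise<Row[]>;
}
type Repair = (cypher: string, error: string) => Promise<string>;

async function queryWithRepair(
  graph: QueryRunner,
  repair: Repair,
  cypher: string,
  maxRetries = 5
): Promise<Row[] | undefined> {
  let retries = 0;
  while (retries < maxRetries) {
    try {
      return await graph.query(cypher);
    } catch (e: unknown) {
      retries++;
      const message = e instanceof Error ? e.message : String(e);
      // Feed the database error back so the statement can be rewritten
      cypher = await repair(cypher, message);
    }
  }
  return undefined; // every attempt failed
}

// Fake database: rejects statements that use r without binding it
const fakeGraph: QueryRunner = {
  async query(cypher) {
    if (cypher.includes("r.role") && !cypher.includes("[r:")) {
      throw new Error("Variable r is not defined");
    }
    return [{ Role: "The Chief" }];
  },
};

// Fake repair step: binds the relationship to the variable r
const fakeRepair: Repair = async (cypher) =>
  cypher.replace("[:ACTED_IN]", "[r:ACTED_IN]");

const rows = queryWithRepair(
  fakeGraph,
  fakeRepair,
  "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN r.role AS Role"
);
rows.then((result) => console.log(result));
```

Here the first query fails, the stubbed repair step rewrites the statement, and the second query succeeds. If every attempt fails, the function returns undefined, so callers should be prepared for missing results.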
Building the Chain
You will build the chain in the initCypherRetrievalChain() function.
export default async function initCypherRetrievalChain(
llm: BaseLanguageModel,
graph: Neo4jGraph
) {
// TODO: initiate answer chain
// const answerGeneration = ...
// TODO: return RunnablePassthrough
}
Since an agent will call this chain, it will receive a structured input containing both an input and a rephrasedQuestion.
export interface AgentToolInput {
input: string;
rephrasedQuestion: string;
}
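Later snippets in this lesson annotate their inputs with a CypherRetrievalThroughput type that is not shown here. A plausible shape, assuming it extends AgentToolInput with the keys that each step of the chain adds, might look like the sketch below. The field types are inferred from how the values are used, and the sample values are purely illustrative.

```typescript
interface AgentToolInput {
  input: string;
  rephrasedQuestion: string;
}

// Assumed shape: the agent input plus every key added along the chain
type CypherRetrievalThroughput = AgentToolInput & {
  cypher: string; // generated and evaluated Cypher statement
  results: Record<string, unknown>[] | undefined; // rows returned by Neo4j
  ids: string[]; // element IDs found in the results
  context: string; // results serialized for the answer prompt
  output: string; // final answer from the LLM
};

// Illustrative value as it might look just before the response is saved
const example: CypherRetrievalThroughput = {
  input: "how many are there?",
  rephrasedQuestion: "How many Movies are in the database?",
  cypher: "MATCH (m:Movie) RETURN count(m) AS count",
  results: [{ count: 42 }],
  ids: [],
  context: '{"count":42}',
  output: "There are 42 movies in the database.",
};

console.log(Object.keys(example));
```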
Initialize Chains
You must use the Generate Authoritative Answer Chain from the previous lesson to generate an answer.
Use the initGenerateAuthoritativeAnswerChain() function to create it.
const answerGeneration = await initGenerateAuthoritativeAnswerChain(llm);
Generate a Cypher Statement
Now, define the output.
As with the Vector Retrieval tool, you can return a Runnable using RunnablePassthrough.assign().
The first step is to call the recursivelyEvaluate() function, assigning the output to the cypher key.
return (
RunnablePassthrough
// Generate and evaluate the Cypher statement
.assign({
cypher: (input: { rephrasedQuestion: string }) =>
recursivelyEvaluate(graph, llm, input.rephrasedQuestion),
})
Get Results
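Use the getResults() function to get the results from the database. The chain input carries the question under the rephrasedQuestion key, so map it to the question key that getResults() expects.
// Get results from database
.assign({
results: (input: { cypher: string; rephrasedQuestion: string }) =>
getResults(graph, llm, {
question: input.rephrasedQuestion,
cypher: input.cypher,
}),
})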
Manipulate Results
You will need to extract any element IDs from the results to save the context to the database.
The utils.ts file exports an extractIds() function that recursively iterates through the results to find any objects with a key of _id.
View the extractIds() function
export function extractIds(input: any): string[] {
let output: string[] = [];
// Function to handle an object
const handleObject = (item: any) => {
for (const key in item) {
if (key === "_id") {
if (!output.includes(item[key])) {
output.push(item[key]);
}
} else if (typeof item[key] === "object" && item[key] !== null) {
// Recurse into the object if it is not null
output = output.concat(extractIds(item[key]));
}
}
};
if (Array.isArray(input)) {
// If the input is an array, iterate over each element
input.forEach((item) => {
if (typeof item === "object" && item !== null) {
handleObject(item);
}
});
} else if (typeof input === "object" && input !== null) {
// If the input is an object, handle it directly
handleObject(input);
}
return output;
}
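One detail worth knowing: because the function above concatenates the output of each recursive call, an ID that appears in more than one row can end up in the list twice. The sketch below is a hypothetical compact variant, named collectIds() here, that walks the same structure but deduplicates with a Set. It is not the course's implementation, and the sample rows and element IDs are made up.

```typescript
// Hypothetical variant of extractIds() that deduplicates with a Set
function collectIds(input: unknown, seen: Set<string> = new Set()): string[] {
  if (Array.isArray(input)) {
    // Visit every element of an array
    input.forEach((item) => collectIds(item, seen));
  } else if (typeof input === "object" && input !== null) {
    const record = input as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (key === "_id" && typeof value === "string") {
        seen.add(value);
      } else {
        // Recurse into nested objects and arrays; primitives are ignored
        collectIds(value, seen);
      }
    }
  }
  return Array.from(seen);
}

// Rows shaped like the map projections used in this course (values are made up)
const sampleResults = [
  {
    movie: { title: "Neo4j - Into the Graph", _id: "4:abc:1" },
    person: { name: "Emil Eifrem", _id: "4:abc:2" },
  },
  { movie: { title: "Neo4j - Into the Graph", _id: "4:abc:1" } },
];

const ids = collectIds(sampleResults);
console.log(ids);
```

The movie appears in both rows, but its ID is collected only once.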
The results obtained in the previous step must also be converted to a string.
If there is only one result, use JSON.stringify() to convert the first object to a JSON string; otherwise, return a string representing the entire array.
// Extract information
.assign({
// Extract _id fields
ids: (input: Omit<CypherRetrievalThroughput, "ids">) =>
extractIds(input.results),
// Convert results to JSON output
context: ({ results }: Omit<CypherRetrievalThroughput, "ids">) =>
Array.isArray(results) && results.length === 1
? JSON.stringify(results[0])
: JSON.stringify(results),
})
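The conversion rule for the context can be checked in isolation. The sketch below uses a hypothetical toContext() helper that mirrors the ternary above. As an added assumption, it falls back to an empty array when no results are available, because JSON.stringify(undefined) returns undefined rather than a string.

```typescript
// Hypothetical helper mirroring the context conversion step
function toContext(results: unknown): string {
  if (Array.isArray(results) && results.length === 1) {
    // Single row: drop the array wrapper
    return JSON.stringify(results[0]);
  }
  // Assumption: treat missing results as an empty array so the
  // return value is always a string
  return JSON.stringify(results ?? []);
}

console.log(toContext([{ count: 42 }]));
console.log(toContext([{ title: "A" }, { title: "B" }]));
console.log(toContext(undefined));
```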
Generate Output
The rephrased question and the context can then be passed to the Authoritative Answer Generation chain to generate an answer.
// Generate Output
.assign({
output: (input: CypherRetrievalThroughput) =>
answerGeneration.invoke({
question: input.rephrasedQuestion,
context: input.context,
}),
})
Save response to database
Next, use the saveHistory()
function built in module 3 to save the details of the response to the database.
// Save response to database
.assign({
responseId: async (input: CypherRetrievalThroughput, options) => {
// Return the promise so the save completes before the chain resolves
return saveHistory(
options?.config.configurable.sessionId,
"cypher",
input.input,
input.rephrasedQuestion,
input.output,
input.ids,
input.cypher
);
},
})
Return the output
Finally, use the pick() function to return only the output key.
// Return the output
.pick("output")
);
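To make the data flow concrete, the sketch below imitates assign() and pick() in plain TypeScript. It is a simplified stand-in and not LangChain's implementation: the assign, pick, and run helpers and all sample values are hypothetical. The point is only that each assign step receives everything accumulated so far and adds new keys, while pick keeps a single key at the end.

```typescript
type Bag = Record<string, unknown>;
type Step = (input: Bag) => Bag | Promise<Bag>;

// Stand-in for .assign(): compute new keys from the input and merge them in
const assign =
  (fields: Record<string, (input: Bag) => unknown>): Step =>
  async (input) => {
    const added: Bag = {};
    for (const key of Object.keys(fields)) {
      // Every field function sees the same input, not the sibling keys
      added[key] = await fields[key](input);
    }
    return { ...input, ...added };
  };

// Stand-in for .pick(): keep only one key of the accumulated object
const pick = (key: string) => (input: Bag) => input[key];

// Run the steps in order, passing each result to the next step
async function run(input: Bag, steps: Step[]): Promise<Bag> {
  let current = input;
  for (const step of steps) {
    current = await step(current);
  }
  return current;
}

const answer = run(
  { input: "how many?", rephrasedQuestion: "How many Movies are there?" },
  [
    assign({ cypher: () => "MATCH (m:Movie) RETURN count(m) AS count" }),
    assign({ results: () => [{ count: 42 }] }),
    assign({ context: (i) => JSON.stringify(i.results) }),
    assign({ output: (i) => "Answer based on " + String(i.context) }),
  ]
).then(pick("output"));

answer.then((value) => console.log(value));
```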
Final Function
If you have followed the instructions correctly, your code should resemble the following:
export default async function initCypherRetrievalChain(
llm: BaseLanguageModel,
graph: Neo4jGraph
) {
const answerGeneration = await initGenerateAuthoritativeAnswerChain(llm);
return (
RunnablePassthrough
// Generate and evaluate the Cypher statement
.assign({
cypher: (input: { rephrasedQuestion: string }) =>
recursivelyEvaluate(graph, llm, input.rephrasedQuestion),
})
// Get results from database
.assign({
results: (input: { cypher: string; rephrasedQuestion: string }) =>
getResults(graph, llm, {
question: input.rephrasedQuestion,
cypher: input.cypher,
}),
})
// Extract information
.assign({
// Extract _id fields
ids: (input: Omit<CypherRetrievalThroughput, "ids">) =>
extractIds(input.results),
// Convert results to JSON output
context: ({ results }: Omit<CypherRetrievalThroughput, "ids">) =>
Array.isArray(results) && results.length === 1
? JSON.stringify(results[0])
: JSON.stringify(results),
})
// Generate Output
.assign({
output: (input: CypherRetrievalThroughput) =>
answerGeneration.invoke({
question: input.rephrasedQuestion,
context: input.context,
}),
})
// Save response to database
.assign({
responseId: async (input: CypherRetrievalThroughput, options) => {
// Return the promise so the save completes before the chain resolves
return saveHistory(
options?.config.configurable.sessionId,
"cypher",
input.input,
input.rephrasedQuestion,
input.output,
input.ids,
input.cypher
);
},
})
// Return the output
.pick("output")
);
}
Testing your changes
If you have followed the instructions, you should be able to run the following unit test with the npm run test command to verify the response.
npm run test cypher-retrieval.chain.test.ts
View Unit Test
// TODO: Remove code
import { ChatOpenAI } from "@langchain/openai";
import { config } from "dotenv";
import { BaseChatModel } from "langchain/chat_models/base";
import { Runnable } from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import initCypherRetrievalChain, {
recursivelyEvaluate,
getResults,
} from "./cypher-retrieval.chain";
import { close } from "../../../graph";
describe("Cypher QA Chain", () => {
let graph: Neo4jGraph;
let llm: BaseChatModel;
let chain: Runnable;
beforeAll(async () => {
config({ path: ".env.local" });
graph = await Neo4jGraph.initialize({
url: process.env.NEO4J_URI as string,
username: process.env.NEO4J_USERNAME as string,
password: process.env.NEO4J_PASSWORD as string,
database: process.env.NEO4J_DATABASE as string | undefined,
});
llm = new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: "gpt-3.5-turbo",
temperature: 0,
configuration: {
baseURL: process.env.OPENAI_API_BASE,
},
});
chain = await initCypherRetrievalChain(llm, graph);
});
afterAll(async () => {
await graph.close();
await close();
});
it("should answer a simple question", async () => {
const sessionId = "cypher-retrieval-1";
const res = (await graph.query(
`MATCH (n:Movie) RETURN count(n) AS count`
)) as { count: number }[];
expect(res).toBeDefined();
const output = await chain.invoke(
{
input: "how many are there?",
rephrasedQuestion: "How many Movies are in the database?",
},
{ configurable: { sessionId } }
);
expect(output).toContain(res[0].count);
});
it("should answer a random question", async () => {
const sessionId = "cypher-retrieval-2";
const person = "Emil Eifrem";
const role = "The Chief";
const movie = "Neo4j - Into the Graph";
// Save a fake movie to the database
await graph.query(
`
MERGE (m:Movie {title: $movie})
MERGE (p:Person {name: $person}) SET p:Actor
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = $role, r.roles = $role
RETURN
m { .title, _id: elementId(m) } AS movie,
p { .name, _id: elementId(p) } AS person
`,
{ movie, person, role }
);
const input = "what did they play?";
const rephrasedQuestion = `What role did ${person} play in ${movie}`;
const output = await chain.invoke(
{
input,
rephrasedQuestion,
},
{ configurable: { sessionId } }
);
expect(output).toContain(role);
// Check persistence
const contextRes = await graph.query(
`
MATCH (s:Session {id: $sessionId})-[:LAST_RESPONSE]->(r)
RETURN
r.input AS input,
r.rephrasedQuestion as rephrasedQuestion,
r.output AS output,
[ (r)-[:CONTEXT]->(c) | elementId(c) ] AS ids
`,
{ sessionId }
);
expect(contextRes).toBeDefined();
if (contextRes) {
const [first] = contextRes;
expect(contextRes.length).toBe(1);
expect(first.input).toEqual(input);
expect(first.rephrasedQuestion).toEqual(rephrasedQuestion);
expect(first.output).toEqual(output);
}
});
it("should use elementId() to return a node ID", async () => {
const sessionId = "cypher-retrieval-3";
const person = "Emil Eifrem";
const role = "The Chief";
const movie = "Neo4j - Into the Graph";
// Save a fake movie to the database
const seed = await graph.query(
`
MERGE (m:Movie {title: $movie})
MERGE (p:Person {name: $person}) SET p:Actor
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = $role, r.roles = $role
RETURN
m { .title, _id: elementId(m) } AS movie,
p { .name, _id: elementId(p) } AS person
`,
{ movie, person, role }
);
const output = await chain.invoke(
{
input: "what did they play?",
rephrasedQuestion: `What movies has ${person} acted in?`,
},
{ configurable: { sessionId } }
);
expect(output).toContain(person);
expect(output).toContain(movie);
// check context
const contextRes = await graph.query(
`
MATCH (s:Session {id: $sessionId})-[:LAST_RESPONSE]->(r)
RETURN
r.input AS input,
r.rephrasedQuestion as rephrasedQuestion,
r.output AS output,
[ (r)-[:CONTEXT]->(c) | elementId(c) ] AS ids
`,
{ sessionId }
);
expect(contextRes).toBeDefined();
if (contextRes) {
expect(contextRes.length).toBe(1);
const contextIds = contextRes[0].ids.join(",");
const seedIds = seed?.map((el) => el.movie._id) ?? [];
// Iterate over the ID values (for...in would iterate over array indexes)
for (const id of seedIds) {
expect(contextIds).toContain(id);
}
}
});
describe("recursivelyEvaluate", () => {
it("should correct a query with a missing variable", async () => {
const res = await recursivelyEvaluate(
graph,
llm,
"What movies has Emil Eifrem acted in?"
);
expect(res).toBeDefined();
});
});
describe("getResults", () => {
it("should fix a broken Cypher statement on the fly", async () => {
const res = await getResults(graph, llm, {
question: "What role did Emil Eifrem play in Neo4j - Into the Graph?",
cypher:
"MATCH (a:Actor {name: 'Emil Eifrem'})-[:ACTED_IN]->(m:Movie) " +
"RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source, " +
"elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10",
});
expect(res).toBeDefined();
expect(JSON.stringify(res)).toContain("The Chief");
});
});
});
Randomized responses
LLMs are probabilistic models, meaning they generate different responses with each call.
Given this variability, not every test may pass on every run. You may need to run the test suite several times before all tests pass.
Verifying the Test
If every test in the test suite has passed, a new (:Session) node with a .id property of cypher-retrieval-3 will have been created in your database.
The session should have at least one (:Response) node, linked with a :CONTEXT relationship to a movie with the title Neo4j - Into the Graph.
Click the Check Database button below to verify the tests have succeeded.
Solution
You can compare your code with the solution in src/solutions/modules/agent/tools/cypher/cypher-retrieval.chain.ts
and double-check that the conditions have been met in the test suite.
You can also run the following Cypher statement to double-check that the session, its responses, and their context have been created in your database.
MATCH (s:Session {id: 'cypher-retrieval-3'})
RETURN s, [
(s)-[:HAS_RESPONSE]->(r) | [r,
[ (r) -[:CONTEXT]->(c) | c ]
]
]
Once you have verified your code and re-run the tests, click Try again… to complete the challenge.
Summary
In this lesson, you combined the components built during this module to create a chain that will generate a Cypher statement that answers the user’s question, execute the Cypher statement, and generate a response.
In the next module, you will build an agent that combines this chain with the Vector Retrieval Chain and uses an LLM to choose the correct tool to answer the user’s question.