Now that you have a vector store, you can use it to retrieve chunks of text that are semantically similar to a user’s question.
In this challenge, you will create a chain that will use the vector search index to find movies with similar plots.
You must first:

- Use the `initVectorStore()` function implemented in the previous lesson to create a vector store and retriever
- Create an instance of the Answer Generation chain

Then, create a chain that:

- Takes the string input and assigns it to the `input` variable
- Uses the input to retrieve similar movie plots
- Uses the Answer Generation chain to generate an answer
- Uses the `saveHistory()` function to save the response and context to the database
- Returns the output as a string
Existing function

The `modules/agent/tools/vector-retrieval.chain.ts` file contains the following placeholder function for the vector retrieval chain.
```typescript
export default async function initVectorRetrievalChain(
  llm: BaseLanguageModel,
  embeddings: Embeddings
): Promise<Runnable<AgentToolInput, string>> {
  // TODO: Create vector store instance
  // const vectorStore = ...

  // TODO: Initialize a retriever wrapper around the vector store
  // const vectorStoreRetriever = ...

  // TODO: Initialize Answer chain
  // const answerChain = ...

  // TODO: Return chain
  // return RunnablePassthrough.assign( ... )
}
```
Instantiate Tools
Inside the `initVectorRetrievalChain()` function, replace the `// TODO` comments to create an instance of the vector store using the `initVectorStore()` function from the previous lesson.
```typescript
// Create vector store instance
const vectorStore = await initVectorStore(embeddings);
```
Next, call the `.asRetriever()` method on the `vectorStore` object to create a new `VectorStoreRetriever` instance.
```typescript
// Initialize a retriever wrapper around the vector store,
// returning the 5 most similar documents for each query
const vectorStoreRetriever = vectorStore.asRetriever(5);
```
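The retriever is itself a runnable that maps a query string to a list of documents. As a quick, hypothetical sanity check (not part of the lesson code; the query text is made up):

```typescript
// Hypothetical standalone check: the retriever returns up to 5
// Document objects whose plots are most similar to the query,
// each carrying the movie's metadata (including its _id).
const docs = await vectorStoreRetriever.invoke(
  "A detective investigates a series of hauntings"
);
console.log(docs.map((doc) => doc.metadata._id));
```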
Finally, create an answer generation chain using the `initGenerateAnswerChain()` function.
```typescript
// Initialize Answer Chain
const answerChain = initGenerateAnswerChain(llm);
```
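Later in this lesson the answer chain is invoked with a question and a stringified context, and it returns the answer as a string. A sketch of that call, with made-up values:

```typescript
// Illustrative values only; the input shape matches how the answer
// chain is invoked further down in this lesson.
const answer = await answerChain.invoke({
  question: "Recommend a movie about ghosts",
  context: '[{"title": "Ghostbusters", "plot": "..."}]',
});
```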
Building the Chain

As this chain will be called by an agent, it will receive a structured input containing an `input` and a `rephrasedQuestion`.
```typescript
export interface AgentToolInput {
  input: string;
  rephrasedQuestion: string;
}
```
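For example, the agent might pass an object like the following (the values here are illustrative):

```typescript
// An illustrative input of the shape the agent will supply
const exampleInput: AgentToolInput = {
  input: "I like spooky movies, any suggestions?",
  rephrasedQuestion: "Recommend a movie about ghosts",
};
```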
Because the chain receives an object as its input, you can use `RunnablePassthrough.assign()` to add keys to that object directly, rather than the `RunnableSequence.from()` method used in the previous lessons.
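If `.assign()` is unfamiliar, here is a minimal, self-contained sketch of its semantics. The names are illustrative and not part of the lesson code:

```typescript
import {
  RunnablePassthrough,
  RunnablePick,
} from "@langchain/core/runnables";

// .assign() passes the input object through unchanged and merges in
// the newly computed keys; RunnablePick plucks a single key from the
// input before piping its value onwards.
const sketch = RunnablePassthrough.assign({
  shouted: new RunnablePick("text").pipe(
    (text: string) => text.toUpperCase()
  ),
});

// -> { text: "hello", shouted: "HELLO" }
console.log(await sketch.invoke({ text: "hello" }));
```

In the lesson code, the first `.assign()` uses the retriever to collect relevant context for the rephrased question.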
```typescript
// Get the rephrased question and generate context
return (
  RunnablePassthrough.assign({
    documents: new RunnablePick("rephrasedQuestion").pipe(
      vectorStoreRetriever
    ),
  })
```
Next, the element IDs of the retrieved documents must be extracted from their metadata to create the `:CONTEXT` relationship between the `(:Response)` and `(:Movie)` nodes.

At the same time, the context needs to be converted to a string so it can be used in the Answer Generation chain.
Helper Functions

These helper functions in `vector-retrieval.chain.ts` extract the document IDs and convert the documents to a string.
```typescript
// Helper function to extract document IDs from Movie node metadata
const extractDocumentIds = (
  documents: DocumentInterface<{ _id: string; [key: string]: any }>[]
): string[] => documents.map((document) => document.metadata._id);

// Convert documents to string to be included in the prompt
const docsToJson = (documents: DocumentInterface[]) =>
  JSON.stringify(documents);
```
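To illustrate what the helpers produce, here is a hypothetical retrieved document (all values are made up):

```typescript
import { Document } from "@langchain/core/documents";

// A made-up retrieved document, for illustration only
const sample = [
  new Document({
    pageContent: "Two sisters inherit a haunted mansion...",
    metadata: { _id: "movie-1234", title: "The Haunting" },
  }),
];

extractDocumentIds(sample); // ["movie-1234"]
docsToJson(sample); // '[{"pageContent":"Two sisters...","metadata":{...}}]'
```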
`RunnablePassthrough.assign()` returns a runnable with a fluent interface, so further `.assign()` calls can be chained to add each step in turn.
```typescript
// Get the rephrased question and generate context
return (
  RunnablePassthrough.assign({
    documents: new RunnablePick("rephrasedQuestion").pipe(
      vectorStoreRetriever
    ),
  })
    .assign({
      // Extract the IDs
      ids: new RunnablePick("documents").pipe(extractDocumentIds),
      // Convert documents to string
      context: new RunnablePick("documents").pipe(docsToJson),
    })
```
The rephrased question and context can then be passed to the `answerChain` to generate an output.
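The next snippet references a `RetrievalChainThroughput` type that this lesson doesn't show. Inferring purely from the keys assigned so far, it is approximately the following; the actual definition in the repository may differ:

```typescript
// Approximate shape of the value flowing through the chain after the
// two .assign() calls above; not the repository's actual definition.
type RetrievalChainThroughput = AgentToolInput & {
  documents: DocumentInterface[];
  ids: string[];
  context: string;
  output?: string; // assigned by the answer generation step below
};
```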
```typescript
    .assign({
      output: (input: RetrievalChainThroughput) =>
        answerChain.invoke({
          question: input.rephrasedQuestion,
          context: input.context,
        }),
    })
```
Then, the input, rephrased question, and output can be saved to the database using the `saveHistory()` function created in the Conversation Memory module.
```typescript
    .assign({
      responseId: async (input: RetrievalChainThroughput, options) =>
        saveHistory(
          options?.config.configurable.sessionId,
          "vector",
          input.input,
          input.rephrasedQuestion,
          input.output,
          input.ids
        ),
    })
```
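For reference, the call above implies a signature roughly like the one below. The real implementation lives in the Conversation Memory module, and the parameter names here are guesses based on the arguments passed:

```typescript
// Implied by the call above; parameter names are illustrative.
declare function saveHistory(
  sessionId: string,
  source: string, // "vector" identifies the tool that produced the response
  input: string,
  rephrasedQuestion: string,
  output: string,
  ids: string[] // document IDs used to create :CONTEXT relationships
): Promise<string>; // resolves to the new (:Response) node's ID
```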
Finally, pick the `output` key so the chain returns a string.

```typescript
    .pick("output")
);
```
If you have followed the instructions correctly, your code should resemble the following:
```typescript
export default async function initVectorRetrievalChain(
  llm: BaseLanguageModel,
  embeddings: Embeddings
): Promise<Runnable<AgentToolInput, string>> {
  // Create vector store instance
  const vectorStore = await initVectorStore(embeddings);

  // Initialize a retriever wrapper around the vector store
  const vectorStoreRetriever = vectorStore.asRetriever(5);

  // Initialize Answer Chain
  const answerChain = initGenerateAnswerChain(llm);

  // Get the rephrased question and generate context
  return (
    RunnablePassthrough.assign({
      documents: new RunnablePick("rephrasedQuestion").pipe(
        vectorStoreRetriever
      ),
    })
      .assign({
        // Extract the IDs
        ids: new RunnablePick("documents").pipe(extractDocumentIds),
        // Convert documents to string
        context: new RunnablePick("documents").pipe(docsToJson),
      })
      .assign({
        output: (input: RetrievalChainThroughput) =>
          answerChain.invoke({
            question: input.rephrasedQuestion,
            context: input.context,
          }),
      })
      .assign({
        responseId: async (input: RetrievalChainThroughput, options) =>
          saveHistory(
            options?.config.configurable.sessionId,
            "vector",
            input.input,
            input.rephrasedQuestion,
            input.output,
            input.ids
          ),
      })
      .pick("output")
  );
}
```
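As a quick sanity check, the finished chain can be invoked like this (mirroring the unit test below; the input text and session ID are illustrative):

```typescript
const chain = await initVectorRetrievalChain(llm, embeddings);

// The agent supplies the original input and the rephrased question;
// the session ID travels in the config so saveHistory can read it.
const output = await chain.invoke(
  {
    input: "I want to watch a scary movie about ghosts",
    rephrasedQuestion: "Recommend a movie about ghosts",
  },
  { configurable: { sessionId: "vector-retriever-example" } }
);

console.log(output); // a string answer grounded in the retrieved plots
```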
Testing your changes

If you have followed the instructions, you should be able to run the following unit test to verify the response using the `npm run test` command.

```bash
npm run test vector-retrieval.chain.test.ts
```
View Unit Test
```typescript
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { config } from "dotenv";
import { BaseChatModel } from "langchain/chat_models/base";
import { Embeddings } from "langchain/embeddings/base";
import { Runnable } from "@langchain/core/runnables";
import initVectorRetrievalChain from "./vector-retrieval.chain";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import { AgentToolInput } from "../agent.types";
import { close } from "../../graph";

describe("Vector Retrieval Chain", () => {
  let graph: Neo4jGraph;
  let llm: BaseChatModel;
  let embeddings: Embeddings;
  let chain: Runnable<AgentToolInput, string>;

  beforeAll(async () => {
    config({ path: ".env.local" });

    graph = await Neo4jGraph.initialize({
      url: process.env.NEO4J_URI as string,
      username: process.env.NEO4J_USERNAME as string,
      password: process.env.NEO4J_PASSWORD as string,
      database: process.env.NEO4J_DATABASE as string | undefined,
    });

    llm = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: "gpt-3.5-turbo",
      temperature: 0,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY as string,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    chain = await initVectorRetrievalChain(llm, embeddings);
  });

  afterAll(async () => {
    await graph.close();
    await close();
  });

  it("should provide a recommendation", async () => {
    const sessionId = "vector-retriever-1";
    const input = "[redacted]";
    const rephrasedQuestion = "Recommend a movie about ghosts";

    const output = await chain.invoke(
      {
        input,
        rephrasedQuestion,
      },
      { configurable: { sessionId } }
    );

    // Should generate an answer
    expect(output).toBeDefined();

    // Should save to the database
    const res = await graph.query(
      `
        MATCH (s:Session {id: $sessionId})-[:LAST_RESPONSE]->(r)
        RETURN s.id AS session, r.input AS input, r.output AS output,
          r.rephrasedQuestion AS rephrasedQuestion,
          [ (r)-[:CONTEXT]->(m) | m.title ] AS context
        ORDER BY r.createdAt DESC LIMIT 1
      `,
      { sessionId }
    );

    expect(res).toBeDefined();

    // Should have properties set
    const [first] = res!;

    expect(first.input).toEqual(input);
    expect(first.rephrasedQuestion).toEqual(rephrasedQuestion);
    expect(first.output).toEqual(output);

    // Should save with context
    expect(first.context.length).toBeGreaterThanOrEqual(1);

    // Any of the movies in the context should be mentioned
    let found = false;

    for (const title of first.context) {
      if (output.includes(title.replace(", The", ""))) {
        found = true;
      }
    }

    expect(found).toBe(true);
  });
});
```
Verifying the Test

If every test in the test suite has passed, a new `(:Session)` node with an `.id` property of `vector-retriever-1` will have been created in your database.

The session should have at least one `(:Response)` node, linked with a `:CONTEXT` relationship to at least one movie.
Click the Check Database button below to verify the tests have succeeded.
Hint

You can compare your code with the solution in `src/solutions/modules/agent/tools/vector-retrieval.chain.ts` and double-check that the conditions have been met in the test suite.

Solution

You can also run the following Cypher statement to double-check that the session and its responses have been created in your database.
```cypher
MATCH (s:Session {id: 'vector-retriever-1'})
RETURN s,
  [ (s)-[:HAS_RESPONSE]->(r)
    | [ r,
        [ (r)-[:CONTEXT]->(c) | c ]
      ]
  ]
```
Once you have verified your code and re-run the tests, click Try again… to complete the challenge.
Summary
In this lesson, you combined the components built in the course so far into a chain that retrieves documents from the vector search index and uses them to answer a question.
The chain then saves the response to the database.
In the next lesson, you will see how the response can be used to filter out documents that have been used to provide unhelpful responses in the past.