Working with Vector Stores

In the Answer Generation Chain lesson, you built a chain that answers a question based on the context provided in the prompt.

As we covered in the Retrievers lesson of Neo4j & LLM Fundamentals, semantic search in LangChain is performed using an object called a Retriever.

A Retriever is an abstraction over a Vector Store. It converts an input into a vector embedding, then performs a similarity search against the vectors stored in an index to identify similar documents.
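
The similarity search that a Retriever relies on can be sketched in a few lines. The example below is a minimal in-memory illustration and not how Neo4j implements its index: cosineSimilarity, similaritySearch, and the three-dimensional sample vectors are all hypothetical, and in practice an embedding model would supply the vectors.

```typescript
// Hypothetical in-memory illustration of a vector similarity search.
interface StoredDocument {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored document against the query vector and keep the top k.
function similaritySearch(
  query: number[],
  stored: StoredDocument[],
  k: number
): { text: string; score: number }[] {
  return stored
    .map((doc) => ({ text: doc.text, score: cosineSimilarity(query, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy vectors (made up); real embeddings would come from an embedding model.
const sampleDocuments: StoredDocument[] = [
  { text: "A toy comes to life", embedding: [1, 0, 0] },
  { text: "A heist goes wrong", embedding: [0, 1, 0] },
  { text: "Toys go on an adventure", embedding: [0.9, 0.1, 0] },
];

const topMatches = similaritySearch([1, 0, 0], sampleDocuments, 2);
console.log(topMatches.map((match) => match.text));
```

A vector index serves the same purpose without scoring every stored vector on each request, which is why the rest of this lesson sets one up in the database.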

To pass this challenge, you must modify the initVectorStore() function in modules/agent/vector.store.ts to create a new Neo4jVectorStore instance.

Open vector.store.ts

Set up the Vector Index

To use a Vector Store, you must first create a vector index in your Sandbox instance.

Run the CREATE VECTOR INDEX command below to create a vector index called moviePlots if it does not already exist.

cypher
Create Vector Index
CREATE VECTOR INDEX `moviePlots` IF NOT EXISTS
FOR (n:Movie) ON (n.embedding)
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}};

The statement creates a new index called moviePlots, indexing the vectors in the embedding property. The vectors stored in the embedding property have been created using the text-embedding-ada-002 model and therefore have 1536 dimensions. The index will use cosine similarity to identify similar documents.
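
Because the index is configured for 1536 dimensions, any vector used to search it must have the same length. The sketch below shows one way a query against the index could be prepared by hand using the db.index.vector.queryNodes procedure. The buildVectorQuery helper, the INDEX_DIMENSIONS constant, and the placeholder vector are hypothetical names for illustration; the Neo4jVectorStore used later in this lesson takes care of this step for you.

```typescript
// Matches the `vector.dimensions` value used when the index was created.
const INDEX_DIMENSIONS = 1536;

interface VectorQuery {
  cypher: string;
  params: { indexName: string; k: number; embedding: number[] };
}

// Hypothetical helper: validates the vector and pairs a Cypher statement with its parameters.
function buildVectorQuery(indexName: string, embedding: number[], k = 5): VectorQuery {
  if (embedding.length !== INDEX_DIMENSIONS) {
    throw new Error(
      `Expected ${INDEX_DIMENSIONS} dimensions but received ${embedding.length}`
    );
  }
  return {
    // toInteger() guards against k arriving as a float from JavaScript.
    cypher: `
      CALL db.index.vector.queryNodes($indexName, toInteger($k), $embedding)
      YIELD node, score
      RETURN node.title AS title, score
    `,
    params: { indexName, k, embedding },
  };
}

// Placeholder vector of the right length; a real query would use a model-generated embedding.
const placeholderEmbedding = new Array<number>(INDEX_DIMENSIONS).fill(0.1);
const vectorQuery = buildVectorQuery("moviePlots", placeholderEmbedding, 3);
console.log(vectorQuery.params.k, vectorQuery.params.embedding.length);
```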

To learn more about how Vector Retrievers work, see the Retrievers lesson in Neo4j & LLM Fundamentals.

Next, run the following statement to load a CSV file containing embeddings of movie plots.

cypher
Load Embeddings
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/llm-fundamentals/openai-embeddings.csv'
AS row
MATCH (m:Movie {movieId: row.movieId})
CALL db.create.setNodeVectorProperty(m, 'embedding', apoc.convert.fromJsonList(row.embedding))
RETURN count(*);
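
The apoc.convert.fromJsonList() call in the statement above turns the text in the embedding column into a list of numbers before it is stored on the node. A rough TypeScript equivalent is sketched below, assuming the column holds a JSON-encoded array; the EmbeddingRow type, the parseEmbedding function, and the sample row are hypothetical.

```typescript
// Hypothetical shape of one CSV row; every value arrives as a string.
interface EmbeddingRow {
  movieId: string;
  embedding: string;
}

// Parse a JSON-encoded list into numbers, rejecting anything that is not a numeric array.
function parseEmbedding(row: EmbeddingRow): number[] {
  const parsed: unknown = JSON.parse(row.embedding);
  if (!Array.isArray(parsed) || !parsed.every((value) => typeof value === "number")) {
    throw new Error(`Row ${row.movieId} does not contain a numeric list`);
  }
  return parsed as number[];
}

// Made-up sample row with a very short vector for readability.
const sampleRow: EmbeddingRow = { movieId: "1", embedding: "[0.25, -0.5, 0.125]" };
const parsedEmbedding = parseEmbedding(sampleRow);
console.log(parsedEmbedding.length);
```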

Creating a Store

Inside modules/agent/vector.store.ts, you will find an initVectorStore() function.

typescript
initVectorStore
export default async function initVectorStore(
  embeddings: EmbeddingsInterface
): Promise<Neo4jVectorStore> {
  // TODO: Create vector store
  // const vectorStore = await Neo4jVectorStore.fromExistingIndex(embeddings, { ... })
  // return vectorStore
}

Inside this function, use the Neo4jVectorStore.fromExistingIndex() method to create a new vector store instance.

typescript
Using an existing index
const vectorStore = await Neo4jVectorStore.fromExistingIndex(embeddings, {
  url: process.env.NEO4J_URI as string,
  username: process.env.NEO4J_USERNAME as string,
  password: process.env.NEO4J_PASSWORD as string,
  indexName: "moviePlots",
  textNodeProperty: "plot",
  embeddingNodeProperty: "embedding",
  retrievalQuery: `
    RETURN
      node.plot AS text,
      score,
      {
        _id: elementid(node),
        title: node.title,
        directors: [ (person)-[:DIRECTED]->(node) | person.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        tmdbId: node.tmdbId,
        source: 'https://www.themoviedb.org/movie/'+ node.tmdbId
      } AS metadata
  `,
});

Document Metadata

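You may have noticed the retrievalQuery argument defined when creating the vectorStore variable. This Cypher fragment receives the node and score for each match found in the index and must return three values: text, score, and metadata. The metadata object allows you to return additional information that could help improve the LLM response.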

In this case, the title is returned with the names of actors and directors and a canonical link to the movie on The Movie Database (TMDB).

The _id property will contain the Element ID for each source document in the database. You will use these IDs to create relationships that make it transparent which context was provided to the LLM when it generated its response.
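
To make the shape of the metadata concrete, the sketch below types the map returned by the retrievalQuery and shows how the _id values could be collected from a list of documents. The MovieMetadata and RetrievedDocument interfaces, both helper functions, and the sample values (including the element ID) are hypothetical, and the property types are assumptions about the data rather than guarantees.

```typescript
// Assumed shape of the metadata map returned by the retrievalQuery.
interface MovieMetadata {
  _id: string;
  title: string;
  directors: string[];
  actors: [string, string | null][]; // [name, role] pairs
  tmdbId: string;
  source: string;
}

// Minimal stand-in for a retrieved document (hypothetical, for illustration only).
interface RetrievedDocument {
  pageContent: string;
  metadata: MovieMetadata;
}

// Collect the element IDs of the documents that were used as context.
function extractDocumentIds(docs: RetrievedDocument[]): string[] {
  return docs.map((doc) => doc.metadata._id);
}

// Turn one document into a single line of context for a prompt.
function formatForPrompt(doc: RetrievedDocument): string {
  const { title, directors, source } = doc.metadata;
  return `${title} (directed by ${directors.join(", ")}): ${doc.pageContent} [${source}]`;
}

// Made-up sample document; the element ID is a placeholder value.
const sampleDocs: RetrievedDocument[] = [
  {
    pageContent: "A cowboy doll feels threatened by a new spaceman toy.",
    metadata: {
      _id: "4:example:0",
      title: "Toy Story",
      directors: ["John Lasseter"],
      actors: [["Tom Hanks", "Woody (voice)"]],
      tmdbId: "862",
      source: "https://www.themoviedb.org/movie/862",
    },
  },
];

console.log(extractDocumentIds(sampleDocs));
console.log(formatForPrompt(sampleDocs[0]));
```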

Finally, return the vectorStore from the function.

typescript
Returning the vector store
return vectorStore;

If you have followed the steps correctly, your code should resemble the following:

typescript
The completed function
export default async function initVectorStore(
  embeddings: EmbeddingsInterface
): Promise<Neo4jVectorStore> {
  const vectorStore = await Neo4jVectorStore.fromExistingIndex(embeddings, {
    url: process.env.NEO4J_URI as string,
    username: process.env.NEO4J_USERNAME as string,
    password: process.env.NEO4J_PASSWORD as string,
    indexName: "moviePlots",
    textNodeProperty: "plot",
    embeddingNodeProperty: "embedding",
    retrievalQuery: `
      RETURN
        node.plot AS text,
        score,
        {
          _id: elementid(node),
          title: node.title,
          directors: [ (person)-[:DIRECTED]->(node) | person.name ],
          actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
          tmdbId: node.tmdbId,
          source: 'https://www.themoviedb.org/movie/'+ node.tmdbId
        } AS metadata
    `,
  });

  return vectorStore;
}
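
Once the store has been returned, it can be queried with the similaritySearch() method that LangChain vector stores expose. The sketch below is written against a minimal interface rather than the real class so that it runs without a database; SimilaritySearchable, PlotDocument, findSimilarPlots, and the stub store are hypothetical names used only for illustration.

```typescript
// Minimal stand-in for a LangChain document.
interface PlotDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// The subset of the vector store API that this sketch relies on.
interface SimilaritySearchable {
  similaritySearch(query: string, k?: number): Promise<PlotDocument[]>;
}

// Return the titles of the k documents most similar to the question.
async function findSimilarPlots(
  store: SimilaritySearchable,
  question: string,
  k = 3
): Promise<string[]> {
  const docs = await store.similaritySearch(question, k);
  return docs.map((doc) => String(doc.metadata.title ?? "Unknown title"));
}

// Stub that ignores the query and returns canned documents; a real store would search the index.
const stubStore: SimilaritySearchable = {
  async similaritySearch(_query: string, k: number = 4): Promise<PlotDocument[]> {
    const canned: PlotDocument[] = [
      { pageContent: "Toys come to life.", metadata: { title: "Toy Story" } },
      { pageContent: "A board game releases jungle hazards.", metadata: { title: "Jumanji" } },
    ];
    return canned.slice(0, k);
  },
};

findSimilarPlots(stubStore, "A movie where toys come alive", 1).then((titles) => {
  console.log(titles);
});
```

In the application, the object returned by initVectorStore() would take the place of stubStore.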

Testing your changes

If you have followed the instructions, you should be able to verify your changes by running the following unit test with the npm run test command.

sh
Running the Test
npm run test vector.store.test.ts

The unit test is shown below:

typescript
vector.store.test.ts
import { OpenAIEmbeddings } from "@langchain/openai";
import initVectorStore from "./vector.store";
import { Neo4jVectorStore } from "@langchain/community/vectorstores/neo4j_vector";
import { close } from "../graph";

describe("Vector Store", () => {
  afterAll(() => close());

  it("should instantiate a new vector store", async () => {
    const embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY as string,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });
    const vectorStore = await initVectorStore(embeddings);
    expect(vectorStore).toBeInstanceOf(Neo4jVectorStore);

    await vectorStore.close();
  });

  it("should create a test index", async () => {
    const indexName = "test-index";
    const embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY as string,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    const index = await Neo4jVectorStore.fromTexts(
      ["Neo4j GraphAcademy offers free, self-paced online training"],
      [],
      embeddings,
      {
        url: process.env.NEO4J_URI as string,
        username: process.env.NEO4J_USERNAME as string,
        password: process.env.NEO4J_PASSWORD as string,
        nodeLabel: "Test",
        embeddingNodeProperty: "embedding",
        textNodeProperty: "text",
        indexName,
      }
    );

    expect(index).toBeInstanceOf(Neo4jVectorStore);
    expect(index["indexName"]).toBe(indexName);

    await index.close();
  });
});

Verifying the Test

If every test in the suite passes, a new vector index named test-index will have been created in your database.

Click the Check Database button below to verify the tests have succeeded.

Solution

You can compare your code with the solution in src/solutions/modules/agent/vector.store.ts and double-check that the conditions have been met in the test suite.

You can also run the following Cypher statement to double-check that the index has been created in your database.

cypher
SHOW INDEXES WHERE type = 'VECTOR'

Once you have verified your code and re-run the tests, click Try again to complete the challenge.

Summary

In this lesson, you created a vector index in your database and wrote the code to create a Neo4jVectorStore instance that uses it.

In the next lesson, you will construct a chain that will take the conversation history to rephrase the user’s input into a standalone question.