Neo4j Retriever Tool

In the Vectors & Semantic Search module of the Neo4j & LLM Fundamentals course, you learned that unstructured content is often converted to vector embeddings so that items can be compared by similarity, in an approach called Semantic Search.
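As a rough illustration of how embeddings are compared, the sketch below ranks a few toy vectors by cosine similarity to a query vector. The three-dimensional vectors and the titles are made up for the example; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not real model output.
plots = {
    "Toy movie A": [0.9, 0.1, 0.0],
    "Toy movie B": [0.1, 0.9, 0.0],
    "Toy movie C": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# Most similar first: vectors pointing in nearly the same direction score close to 1.
ranked = sorted(
    plots,
    key=lambda title: cosine_similarity(query, plots[title]),
    reverse=True,
)
```

A vector index performs this kind of ranking for you, over many more vectors and without comparing the query against every node one by one.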

In the Retrievers lesson, you also learned how to create an instance of the Neo4jVector Store.

In this challenge, you will use that knowledge to create and register a tool that will use a Vector Search Index created on embeddings of the .plot property of each movie to find similar movies.

You will need to:

  1. Create an instance of a Neo4j Vector Store

  2. Use the Neo4j Vector Store to create a Retriever

  3. Create a Retrieval QA Chain that will handle the user input, create an embedding and use that to find similar documents in the Neo4j Vector Store

  4. Register the Retrieval Chain as a tool in agent.py.

Creating a Neo4j Vector Store

In LangChain, a Vector Store is a special type of store that handles the interaction between the application and a vector database.

To interact with Neo4j Vector Search Indexes, you must create an instance of the Neo4jVector store.

In the project root, create a new folder called tools. This is where you will store the code for new tools you create.

In the tools/ folder, create a new file called vector.py.

Add the following code to import streamlit, the Neo4jVector class, and the llm and embeddings objects created earlier in the course.

python
Importing the Neo4jVector class
import streamlit as st
from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from llm import llm, embeddings

A plotEmbedding property, containing the vector embeddings of the plot, has been added to the Movie nodes in the database, and a vector index, moviePlots, has been created.

Because the index already exists in the database, you can use the Neo4jVector.from_existing_index static method.

python
Creating a Neo4jVector
neo4jvector = Neo4jVector.from_existing_index(
    embeddings,                              # (1)
    url=st.secrets["NEO4J_URI"],             # (2)
    username=st.secrets["NEO4J_USERNAME"],   # (3)
    password=st.secrets["NEO4J_PASSWORD"],   # (4)
    index_name="moviePlots",                 # (5)
    node_label="Movie",                      # (6)
    text_node_property="plot",               # (7)
    embedding_node_property="plotEmbedding", # (8)
    retrieval_query="""
RETURN
    node.plot AS text,
    score,
    {
        title: node.title,
        directors: [ (person)-[:DIRECTED]->(node) | person.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        tmdbId: node.tmdbId,
        source: 'https://www.themoviedb.org/movie/'+ node.tmdbId
    } AS metadata
"""
)

In the above call, the method is passed the following parameters:

  1. The embeddings object that will be used to embed the user input.

  2. The full URI of the Neo4j Instance, as set in .streamlit/secrets.toml.

  3. The username required to connect to the database as set in .streamlit/secrets.toml.

  4. The password required to authenticate the Neo4j user, as set in .streamlit/secrets.toml.

  5. The name of the index. The index created in the Neo4j & LLM Fundamentals course was named moviePlots.

  6. The label of the nodes used to populate the index, in this case, Movie.

  7. The name of the property that holds the original plain-text value, in this case, plot.

  8. The name of the property that holds the embedding of the original text, in this case, plotEmbedding.
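Because three of these parameters come from .streamlit/secrets.toml, it can help to check that they are present before creating the store. The following is a minimal sketch, not part of the course code: the missing_settings helper is hypothetical, and the example values are placeholders.

```python
from typing import Mapping

# The keys read from st.secrets in the call above.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(secrets: Mapping[str, str]) -> list[str]:
    """Return the required connection keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not secrets.get(key)]


# Placeholder values for illustration; the password is deliberately left out.
example = {"NEO4J_URI": "neo4j://localhost:7687", "NEO4J_USERNAME": "neo4j"}
problems = missing_settings(example)
```

In the application you might pass st.secrets to the helper and stop with a clear error message if the returned list is not empty.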

Modifying the Retrieval Query

The last parameter passed, retrieval_query, is an optional parameter that allows you to define which information is returned by the Cypher statement, loaded into each Document and subsequently passed to the LLM. This value is appended to the end of the query after the index has been searched, and should always contain a RETURN clause.

The final statement should return a text value, a score, and a map of metadata, although what you specify in the metadata is up to you.

By default, the metadata map is populated with the remaining properties of the node, but in this example, the parameter is used to return specific information about the Movie node, including a link to the original movie listing on themoviedb.org.

The directors and actors fields provide information about the (:Person) nodes linked to the movie via :DIRECTED and :ACTED_IN relationships.
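To make the shape of each result concrete, here is a simplified sketch of how one row returned by the retrieval query could be turned into a document. SimpleDocument and row_to_document are hypothetical stand-ins written for this example, not LangChain's own classes or code, and the score value is invented.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDocument:
    """Simplified stand-in for a LangChain Document (illustration only)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def row_to_document(row: dict[str, Any]) -> tuple[SimpleDocument, float]:
    """Map one result row (text, score, metadata) to a document and its score."""
    doc = SimpleDocument(page_content=row["text"], metadata=row["metadata"])
    return doc, row["score"]


# A row shaped like the RETURN clause above; the score is a made-up value.
row = {
    "text": "A cowboy doll is profoundly threatened and jealous when a new "
    "spaceman figure supplants him as top toy in a boy's room.",
    "score": 0.93,
    "metadata": {
        "title": "Toy Story",
        "tmdbId": "862",
        "source": "https://www.themoviedb.org/movie/862",
    },
}
doc, score = row_to_document(row)
```

The text column becomes the content passed to the LLM, while the metadata map travels alongside it and can be used to cite sources.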

Vector Creation Options

The Neo4jVector class also holds static methods for creating a new index from a list of documents, or a list of embeddings.

Creating a Retriever

In LangChain applications, Retrievers are classes that are designed to retrieve documents from a Store. Vector Retrievers are a specific type of retriever designed to retrieve documents from a Vector Store based on similarity.

All Vector Store instances have an as_retriever() method, which returns a Retriever configured to get documents from the store itself.

To create a retriever backed by the Neo4jVector store, call the as_retriever() method.

python
Creating a Retriever
retriever = neo4jvector.as_retriever()

Retrieval QA Chain

The RetrievalQA chain will be responsible for creating an embedding from the user’s input, calling the Retriever to identify similar documents, and passing them to an LLM to generate a response.

Start by importing the RetrievalQA chain class.

python
Importing the RetrievalQA class
from langchain.chains import RetrievalQA

Call the .from_chain_type() method on the RetrievalQA class to create a new chain, passing the following parameters:

  1. The LLM used to process the chain.

  2. The chain type. A Stuff chain is a relatively straightforward chain that stuffs, or inserts, documents into a prompt and passes that prompt to an LLM.

  3. The retriever created in the previous step, which the chain uses to find similar documents.

python
Creating a new RetrievalQA Chain
kg_qa = RetrievalQA.from_chain_type(
    llm,                  # (1)
    chain_type="stuff",   # (2)
    retriever=retriever,  # (3)
)
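To show what "stuffing" means in practice, the sketch below joins the retrieved texts into a single prompt. This is an illustration of the idea only: the stuff_prompt function and the prompt wording are assumptions made for this example, not the default prompt that LangChain uses.

```python
def stuff_prompt(question: str, documents: list[str]) -> str:
    """Insert every document into one prompt, followed by the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Two plot summaries standing in for the documents returned by the retriever.
docs = [
    "A cowboy doll is profoundly threatened and jealous when a new spaceman "
    "figure supplants him as top toy in a boy's room.",
    "A troubled child summons the courage to help a friendly alien escape "
    "Earth and return to his home-world.",
]
prompt = stuff_prompt("Which movie features a cowboy doll?", docs)
```

Every retrieved document ends up in the same prompt, which is why this approach suits short texts such as plot summaries.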

More Complex 'Stuff' Retrieval Chains

We have chosen to use the .from_chain_type() method with the stuff chain type here because the .plot properties in the database are relatively short. The method returns a chain instance with default prompts.

If you find that you hit token limits, you can also define chains with custom prompts that summarize the content before sending the information to the LLM.
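As a simpler alternative to summarizing, you could also limit how much text is stuffed into the prompt. The sketch below keeps the highest-ranked documents until a size budget is reached. Note the assumptions: select_within_budget is a hypothetical helper, and it counts characters as a crude proxy for tokens rather than using a real tokenizer.

```python
def select_within_budget(documents: list[str], max_chars: int) -> list[str]:
    """Keep documents in ranked order until the character budget is used up."""
    selected: list[str] = []
    used = 0
    for text in documents:
        if used + len(text) > max_chars:
            break  # the next document would exceed the budget
        selected.append(text)
        used += len(text)
    return selected


# Short placeholder strings standing in for ranked plot summaries.
ranked_plots = ["aaaa", "bbbb", "cc"]
kept = select_within_budget(ranked_plots, max_chars=8)
```

This drops lower-ranked documents rather than condensing them, so it trades completeness for a predictable prompt size.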

Registering the Retrieval QA Chain as a Tool

Now the Retrieval QA Chain is ready to register as a tool.

You may recall that in the previous module, you created a "General Chat" tool in agent.py.

Import the kg_qa Retrieval QA Chain from the tools.vector module.

python
Importing the kg_qa chain
from tools.vector import kg_qa

Add the tool to the tools list using the Tool.from_function() method.

python
Registering the Tool
tools = [
    Tool.from_function(
        name="General Chat",
        description="For general chat not covered by other tools",
        func=llm.invoke,
        return_direct=True
    ),
    Tool.from_function(
        name="Vector Search Index",  # the name the agent uses to select the tool
        description="Provides information about movie plots using Vector Search",  # tells the agent when to use the tool
        func=kg_qa,  # the Retrieval QA Chain, called with the tool input
        return_direct=True
    )
]

The @tool Decorator

You can either register a function with the Tool.from_function() method, as shown above, or annotate the function with the @tool decorator. Note, however, that to use the @tool decorator, the function must accept a single str input and return a single str output.
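The single str in, single str out requirement can be met by wrapping the chain in a small function. The following is a sketch under stated assumptions: make_plot_search and fake_chain are hypothetical names, fake_chain stands in for kg_qa so the example runs without a database, and the output key holding the answer ("result" or "answer") depends on the chain you use.

```python
from typing import Any, Callable


def make_plot_search(
    chain: Callable[[str], dict[str, Any]],
) -> Callable[[str], str]:
    """Wrap a chain-like callable so it takes a str and returns a str."""

    def plot_search(query: str) -> str:
        output = chain(query)
        # Assumption: the answer sits under "result" or "answer".
        return str(output.get("result") or output.get("answer") or "")

    return plot_search


def fake_chain(query: str) -> dict[str, Any]:
    """Hypothetical stand-in for kg_qa, used only for this illustration."""
    return {"query": query, "result": f"Movies similar to {query}"}


plot_search = make_plot_search(fake_chain)
reply = plot_search("toy story")
```

A function with this shape is the kind of function you could then annotate with the @tool decorator.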

Testing the Tool

To test the tool, ask the bot to list movies with a similar plot to another film. The bot should respond with a message starting with something along the lines of "Based on the information provided…".

The Bot responds

In this case, the response generated by the LLM also included links to the movie listings returned by the query as part of the metadata map.

In your console, you should see that the Agent has executed the Vector Search Index action with the name of the movie you mentioned.

Executor Chain Output
> Entering new AgentExecutor chain...
{
    "action": "Vector Search Index",
    "action_input": "toy story"
}
Observation: {'question': 'toy story', 'answer': 'Based on the information provided, "Toy Story" is a movie about a cowboy doll who feels threatened and jealous when a new spaceman figure becomes the top toy in a boy\'s room. It is a heartwarming animated film that explores themes of friendship, loyalty, and acceptance. You can find more information about "Toy Story" on its listing on The Movie Database: [Toy Story](https://www.themoviedb.org/movie/862)', 'sources': '', 'source_documents': [Document(page_content="A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.", metadata={'tmdbId': '862', 'title': 'Toy Story', 'source': 'https://www.themoviedb.org/movie/862', 'actors': [['Jim Varney', 'Slinky Dog (voice)'], ['Tim Allen', 'Buzz Lightyear (voice)'], ['Tom Hanks', 'Woody (voice)'], ['Don Rickles', 'Mr. Potato Head (voice)']], 'directors': ['John Lasseter']}), Document(page_content='A young boy must restore order when a group of bullies steal the magical book that acts as a portal between Earth and the imaginary world of Fantasia.', metadata={'tmdbId': '27793', 'title': 'NeverEnding Story III, The', 'source': 'https://www.themoviedb.org/movie/27793', 'actors': [['Jack Black', 'Slip'], ['Melody Kay', 'Nicole'], ['Carole Finn', 'Mookie'], ['Jason James Richter', 'Bastian Bux']], 'directors': ['Peter MacDonald']}), Document(page_content='A troubled child summons the courage to help a friendly alien escape Earth and return to his home-world.', metadata={'tmdbId': '601', 'title': 'E.T. the Extra-Terrestrial', 'source': 'https://www.themoviedb.org/movie/601', 'actors': [['Peter Coyote', 'Keys'], ['Robert MacNaughton', 'Michael'], ['Henry Thomas', 'Elliott'], ['Dee Wallace', 'Mary']], 'directors': ['Steven Spielberg']}), Document(page_content='When a boy learns that a beloved killer whale is to be killed by the aquarium owners, the boy risks everything to free the whale.', metadata={'tmdbId': '1634', 'title': 'Free Willy', 'source': 'https://www.themoviedb.org/movie/1634', 'actors': [['Lori Petty', 'Rae Lindley'], ['Jayne Atkinson', 'Annie Greenwood'], ['August Schellenberg', 'Randolph Johnson'], ['Jason James Richter', 'Jesse']], 'directors': ['Simon Wincer']})]}
Thought:{
    "action": "Final Answer",
    "action_input": "Based on the information provided, \"Toy Story\" has a similar plot to movies like \"NeverEnding Story III\", \"E.T. the Extra-Terrestrial\", and \"Free Willy\". These movies also involve themes of friendship, adventure, and overcoming obstacles. You can find more information about these movies on their respective listings on The Movie Database: [NeverEnding Story III](https://www.themoviedb.org/movie/27793), [E.T. the Extra-Terrestrial](https://www.themoviedb.org/movie/601), and [Free Willy](https://www.themoviedb.org/movie/1634)."
}
> Finished chain.
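The links in the final answer come from the source field of each metadata map. As a rough sketch of how such a listing could be built by hand, the hypothetical format_sources helper below renders titles and sources as Markdown links; the two metadata maps are trimmed copies of entries from the output above.

```python
def format_sources(metadata_list: list[dict[str, str]]) -> str:
    """Render each document's title and source as a Markdown bullet link."""
    return "\n".join(
        f"- [{meta['title']}]({meta['source']})" for meta in metadata_list
    )


# Trimmed copies of two metadata maps from the output above.
sources = [
    {"title": "Toy Story", "source": "https://www.themoviedb.org/movie/862"},
    {"title": "Free Willy", "source": "https://www.themoviedb.org/movie/1634"},
]
listing = format_sources(sources)
```

In the chatbot itself the LLM writes these links as part of its answer, so a helper like this is only needed if you want to display sources separately.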

Once you have tested the bot, click the button below to mark the challenge as completed.

Summary

In this lesson, you added a new tool that uses the Vector Search Index to identify movies with similar plots to the user’s input.

In the next lesson, you will create a tool that uses the LLM to generate a Cypher statement and executes it against the database.
