Neo4j Retriever Tool

In the Vectors & Semantic Search module of the Neo4j & LLM Fundamentals course, you learned that unstructured content is often converted to vector embeddings so that pieces of content are easy to compare, an approach called Semantic Search.
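
As a reminder of how that comparison works, here is a minimal sketch that ranks movies by cosine similarity. The three-dimensional vectors are invented stand-ins for real embeddings, which usually have hundreds or thousands of dimensions, and the `cosine_similarity` helper is written for this example rather than taken from any library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
plots = {
    "Toy Story": [0.9, 0.1, 0.0],
    "Pinocchio": [0.8, 0.3, 0.1],
    "Heat": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "toys coming to life"

# Sort titles from most similar to least similar to the query vector.
ranked = sorted(
    plots,
    key=lambda title: cosine_similarity(query, plots[title]),
    reverse=True,
)
print(ranked)
```

A vector index performs the same kind of ranking, but over stored embeddings and without comparing the query to every node one by one.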

In the Retrievers lesson, you also learned how to create an instance of the Neo4jVector Store.

In this challenge, you will use that knowledge to create and register a tool that will use a Vector Search Index created on embeddings of the .plot property of each movie to find similar movies.

You will need to:

  1. Create an instance of a Neo4j Vector Store

  2. Use the Neo4j Vector Store to create a retriever

  3. Create a retriever chain that will handle the user input, create an embedding, and use that to find similar documents in the Neo4j Vector Store

  4. Register the retriever chain as a tool in agent.py.

Creating a Neo4j Vector Store

In LangChain, a Vector Store handles the interaction between the application and a vector database.

To interact with Neo4j Vector Search Indexes, you must create an instance of the Neo4jVector store.

Open the vector.py file in the tools directory. This directory is where you will store the code for the new tools you create.

python
tools/vector.py
import streamlit as st
from llm import llm, embeddings
from graph import graph

# Create the Neo4jVector

# Create the retriever

# Create the prompt

# Create the chain 

# Create a function to call the chain

The streamlit library and the llm, embeddings, and graph objects you created are already imported.

A plotEmbedding property, containing the vector embeddings of the plot, has been added to the Movie nodes in the database, and a vector index, moviePlots, has been created.

View the plot embedding

Run the following Cypher to view the plotEmbedding property of the movie "Toy Story".

cypher
MATCH (m:Movie {title: "Toy Story"})
RETURN m.plot, m.plotEmbedding

As the index already exists in the database, you can use the Neo4jVector.from_existing_index static method.

python
Creating a Neo4jVector
from langchain_community.vectorstores.neo4j_vector import Neo4jVector

neo4jvector = Neo4jVector.from_existing_index(
    embeddings,                              # (1)
    graph=graph,                             # (2)
    index_name="moviePlots",                 # (3)
    node_label="Movie",                      # (4)
    text_node_property="plot",               # (5)
    embedding_node_property="plotEmbedding", # (6)
    retrieval_query="""
RETURN
    node.plot AS text,
    score,
    {
        title: node.title,
        directors: [ (person)-[:DIRECTED]->(node) | person.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        tmdbId: node.tmdbId,
        source: 'https://www.themoviedb.org/movie/'+ node.tmdbId
    } AS metadata
"""
)

In the above call, the method is passed the following parameters:

  1. The embeddings object to embed the user input.

  2. The graph object to interact with the database.

  3. The name of the index, in this case, moviePlots.

  4. The label of the nodes used to populate the index, in this case, Movie.

  5. The name of the property that holds the original plain-text value, in this case, plot.

  6. The name of the property that holds the embedding of the original text, in this case, plotEmbedding.

Modifying the Retrieval Query

The last parameter passed, retrieval_query, is an optional parameter that allows you to define which information is returned by the Cypher statement, loaded into each Document and subsequently passed to the LLM. This value is appended to the end of the query after searching the index, and should always contain a RETURN clause.

The final statement should return a text value and a map of metadata, although what you specify in the metadata is up to you.

By default, the query returns a list of the node's properties. In this example, the parameter instead returns specific information about the Movie node, including a link to the original movie listing on themoviedb.org.

The directors and actors fields provide information about the (:Person) nodes linked to the movie via :DIRECTED and :ACTED_IN relationships.
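
The "append" step described above can be sketched in plain Python. The `INDEX_SEARCH` prefix is an assumption for illustration: it calls Neo4j's `db.index.vector.queryNodes` procedure, but the exact text LangChain generates may differ. The `build_query` and `row_to_document` helpers and the sample row are hypothetical.

```python
# Assumed prefix: a call to Neo4j's vector index procedure. The exact query
# LangChain builds may differ; this only illustrates the "append" step.
INDEX_SEARCH = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
)

retrieval_query = """
RETURN
    node.plot AS text,
    score,
    { title: node.title, tmdbId: node.tmdbId } AS metadata
"""

def build_query(retrieval_query):
    """Append the retrieval query to the index search after a basic check."""
    if "RETURN" not in retrieval_query.upper():
        raise ValueError("retrieval_query must contain a RETURN clause")
    return INDEX_SEARCH + retrieval_query

def row_to_document(row):
    """Turn one result row into a simple document-like dict."""
    return {"page_content": row["text"], "metadata": row["metadata"]}

full_query = build_query(retrieval_query)

# A hand-written row standing in for a database result.
row = {
    "text": "A cowboy doll is profoundly threatened and jealous when a new "
            "spaceman figure supplants him as top toy in a boy's room.",
    "score": 0.9,  # made-up similarity score
    "metadata": {"title": "Toy Story", "tmdbId": "862"},
}
print(row_to_document(row))
```

Whatever you put in the metadata map ends up alongside the text in each document, so it is available to the LLM when it writes its answer.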

Vector Creation Options

The Neo4jVector class also holds static methods for creating a new index from a list of documents, or a list of embeddings.

Creating a Retriever

In LangChain applications, you can use Retriever classes to retrieve documents from a Store. Vector Retrievers are a specific type of retriever that retrieves documents from a Vector Store based on similarity.

All store instances have an as_retriever() method, which returns a retriever configured to get documents from the store.

To create a retriever for the Neo4jVector store, call its as_retriever() method.

python
Creating a Retriever
retriever = neo4jvector.as_retriever()
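
To illustrate the split between a store and its retriever, here is a toy in-memory sketch. `ToyStore` and `ToyRetriever` are hypothetical classes, not LangChain's, and word overlap stands in for embedding similarity. In LangChain, the number of results is usually set through `search_kwargs`, for example `as_retriever(search_kwargs={"k": 1})`, rather than through a `k` argument as in this sketch.

```python
class ToyStore:
    """In-memory stand-in for a vector store (not LangChain's Neo4jVector)."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        # Word overlap (Jaccard) stands in for embedding similarity here.
        query_words = set(query.lower().split())

        def score(text):
            words = set(text.lower().split())
            return len(query_words & words) / len(query_words | words)

        return sorted(self.texts, key=score, reverse=True)[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Fixes the search settings so callers only pass the query."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyStore([
    "a cowboy doll and a spaceman toy",
    "a living puppet wants to become a real boy",
    "a bank robbery goes wrong",
])
toy_retriever = store.as_retriever(k=1)
print(toy_retriever.invoke("toy cowboy"))
```

The retriever adds no search logic of its own. It gives the rest of the chain a single `invoke(query)` entry point with the search settings already chosen.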

Retrieval Chain

The retrieval chain creates an embedding from the user’s input, calls the retriever to identify similar documents, and passes them to an LLM to generate a response.

The chain will need a prompt that accepts the documents as {context} and the user input as {input}:

python
Create the prompt
from langchain_core.prompts import ChatPromptTemplate

instructions = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("human", "{input}"),
    ]
)
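
The following sketch shows roughly what filling the two placeholders produces, using only `str.format`. The `render_messages` helper is hypothetical and far simpler than `ChatPromptTemplate`. Python joins adjacent string literals without adding whitespace, so each fragment in this sketch carries its own trailing space or newline.

```python
# Adjacent string literals are joined with no separator, so each fragment
# carries its own trailing space or newline.
instructions = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know.\n"
    "Context: {context}"
)

def render_messages(context, user_input):
    """Return (role, text) pairs with the placeholders filled in."""
    return [
        ("system", instructions.format(context=context)),
        ("human", "{input}".format(input=user_input)),
    ]

messages = render_messages(
    "Toy Story: A cowboy doll is jealous of a new spaceman figure.",
    "Which movie has toys that come to life?",
)
for role, text in messages:
    print(f"{role}: {text}")
```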

Create a retrieval chain that uses the llm, prompt, and retriever objects:

python
Create the chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

question_answer_chain = create_stuff_documents_chain(llm, prompt)
plot_retriever = create_retrieval_chain(
    retriever, 
    question_answer_chain
)

The code first creates a QA (question/answer) chain using create_stuff_documents_chain. A Stuff chain is a relatively straightforward chain that stuffs, or inserts, documents into a prompt and passes that prompt to an LLM.

The retrieval chain is then created from the retriever and QA chain using create_retrieval_chain.
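
As a rough mental model, the two steps can be sketched with plain functions. Everything here is a simplified stand-in: `stuff_documents`, `make_retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names, and the stubs replace the real retriever and LLM. The result keys (`input`, `context`, `answer`) mirror the output of the retrieval chain.

```python
def stuff_documents(docs, separator="\n\n"):
    """Join the text of every document into one context string."""
    return separator.join(doc["page_content"] for doc in docs)

def make_retrieval_chain(retrieve, answer):
    """Combine a retriever function and a QA function into one callable."""
    def invoke(inputs):
        docs = retrieve(inputs["input"])
        context = stuff_documents(docs)
        return {
            "input": inputs["input"],
            "context": docs,
            "answer": answer(context=context, question=inputs["input"]),
        }
    return invoke

# Stubs standing in for the real retriever and LLM.
def fake_retrieve(query):
    return [
        {"page_content": "A cowboy doll is jealous of a new spaceman figure."},
        {"page_content": "A living puppet must prove himself worthy."},
    ]

def fake_answer(context, question):
    plot_count = context.count("\n\n") + 1
    return f"Found {plot_count} plots for: {question}"

chain = make_retrieval_chain(fake_retrieve, fake_answer)
result = chain({"input": "toys coming to life"})
print(result["answer"])
```

Because every retrieved document is inserted into a single prompt, the prompt grows with the number and length of the documents returned by the retriever.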

More Complex 'Stuff' Retrieval Chains

You can modify the prompt to include more information about how to process the documents. For example, if you were hitting token limits, you could add a step to summarize the content before sending it to the LLM.

Finally, you must add a function that can be used as a tool and invokes the chain when called.

python
def get_movie_plot(input):
    return plot_retriever.invoke({"input": input})

View the complete code

python
tools/vector.py
import streamlit as st
from llm import llm, embeddings
from graph import graph

from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

from langchain_core.prompts import ChatPromptTemplate


neo4jvector = Neo4jVector.from_existing_index(
    embeddings,                              # (1)
    graph=graph,                             # (2)
    index_name="moviePlots",                 # (3)
    node_label="Movie",                      # (4)
    text_node_property="plot",               # (5)
    embedding_node_property="plotEmbedding", # (6)
    retrieval_query="""
RETURN
    node.plot AS text,
    score,
    {
        title: node.title,
        directors: [ (person)-[:DIRECTED]->(node) | person.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        tmdbId: node.tmdbId,
        source: 'https://www.themoviedb.org/movie/'+ node.tmdbId
    } AS metadata
"""
)

retriever = neo4jvector.as_retriever()

instructions = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
plot_retriever = create_retrieval_chain(
    retriever, 
    question_answer_chain
)

def get_movie_plot(input):
    return plot_retriever.invoke({"input": input})

Registering the Retriever as a Tool

You can now use the retrieval chain as a tool in your agent.

This tool will be in addition to the "General Chat" tool you created in the previous module.

Open the agent.py file and import the get_movie_plot function from the tools.vector module:

python
agent.py
from tools.vector import get_movie_plot

Add the get_movie_plot function to the tools array:

python
tools = [
    Tool.from_function(
        name="General Chat",
        description="For general movie chat not covered by other tools",
        func=movie_chat.invoke,
    ), 
    Tool.from_function(
        name="Movie Plot Search",  
        description="For when you need to find information about movies based on a plot",
        func=get_movie_plot, 
    )
]

The agent will use the tool’s name and description to decide which tool to use.
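
To see why the wording of the name and description matters, here is a simplified sketch of how a list of tools might be rendered into the `{tools}` and `{tool_names}` placeholders of the agent prompt. The `render_tools` and `render_tool_names` helpers are hypothetical, and the exact text LangChain produces may differ.

```python
tools = [
    {"name": "General Chat",
     "description": "For general movie chat not covered by other tools"},
    {"name": "Movie Plot Search",
     "description": "For when you need to find information about movies based on a plot"},
]

def render_tools(tools):
    """One 'name: description' line per tool, for the {tools} placeholder."""
    return "\n".join(f"{t['name']}: {t['description']}" for t in tools)

def render_tool_names(tools):
    """Comma-separated names, for the {tool_names} placeholder."""
    return ", ".join(t["name"] for t in tools)

print(render_tools(tools))
print(render_tool_names(tools))
```

The LLM only sees this text, so a vague or overlapping description makes it harder for the agent to choose the right tool.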

View the complete code
python
agent.py
from llm import llm
from graph import graph
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.tools import Tool
from langchain_community.chat_message_histories import Neo4jChatMessageHistory
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain import hub
from utils import get_session_id

from tools.vector import get_movie_plot

chat_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a movie expert providing information about movies."),
        ("human", "{input}"),
    ]
)

movie_chat = chat_prompt | llm | StrOutputParser()

tools = [
    Tool.from_function(
        name="General Chat",
        description="For general movie chat not covered by other tools",
        func=movie_chat.invoke,
    ), 
    Tool.from_function(
        name="Movie Plot Search",  
        description="For when you need to find information about movies based on a plot",
        func=get_movie_plot, 
    )
]

def get_memory(session_id):
    return Neo4jChatMessageHistory(session_id=session_id, graph=graph)

agent_prompt = PromptTemplate.from_template("""
You are a movie expert providing information about movies.
Be as helpful as possible and return as much information as possible.
Do not answer any questions that do not relate to movies, actors or directors.

Do not answer any questions using your pre-trained knowledge, only use the information provided in the context.

TOOLS:
------

You have access to the following tools:

{tools}

To use a tool, please use the following format:

```
Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
```

When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:

```
Thought: Do I need to use a tool? No
Final Answer: [your response here]
```

Begin!

Previous conversation history:
{chat_history}

New input: {input}
{agent_scratchpad}
""")

agent = create_react_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True
    )

chat_agent = RunnableWithMessageHistory(
    agent_executor,
    get_memory,
    input_messages_key="input",
    history_messages_key="chat_history",
)

def generate_response(user_input):
    """
    Create a handler that calls the Conversational agent
    and returns a response to be rendered in the UI
    """

    response = chat_agent.invoke(
        {"input": user_input},
        {"configurable": {"session_id": get_session_id()}},)

    return response['output']

The @tool Decorator

You can create the tool using the static function Tool.from_function or annotate the function with the @tool decorator. However, note that to use the @tool decorator, the function must accept a single str input and return a single str output.
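
The get_movie_plot function defined earlier returns the whole chain output, which is a dict. If a tool needs to take a string and return a string, one option is a thin wrapper that returns only the answer text. The sketch below assumes the chain output has an `answer` key, as in the executor output later in this lesson. `StubChain` and `get_movie_plot_text` are hypothetical names, and the stub replaces the real retrieval chain so the example runs on its own.

```python
# Stub standing in for the real retrieval chain. It returns the same keys
# as the retrieval chain output: input, context, and answer.
class StubChain:
    def invoke(self, inputs):
        return {
            "input": inputs["input"],
            "context": [],
            "answer": f"(stub answer for: {inputs['input']})",
        }

plot_retriever = StubChain()

def get_movie_plot_text(query: str) -> str:
    """Find movies with a similar plot and return only the answer text."""
    result = plot_retriever.invoke({"input": query})
    return result["answer"]

print(get_movie_plot_text("Toys coming to life"))
```

The trade-off is that the agent no longer sees the retrieved documents or their metadata, only the answer produced by the retrieval chain.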

Testing the Tool

To test the tool, ask the bot to find movies with a particular plot. The bot should respond with one or more films that match that plot.

The Bot responds

In this case, the response generated by the LLM also included links to the movie listings returned by the query as part of the metadata map.

In your console, you should see that the Agent has executed the Movie Plot Search action with the query.

Executor Chain Output
> Entering new AgentExecutor chain...
Thought: The user is asking for a movie with a specific plot. I need to use a tool to find this information.
Action: Movie Plot Search
Action Input: Toys coming to life{'input': 'Toys coming to life', 'context': [Document(page_content="A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.", metadata={'title': 'Toy Story', 'tmdbId': '862', 'source': 'https://www.themoviedb.org/movie/862', 'actors': [['Jim Varney', 'Slinky Dog (voice)'], ['Tim Allen', 'Buzz Lightyear (voice)'], ['Tom Hanks', 'Woody (voice)'], ['Don Rickles', 'Mr. Potato Head (voice)']], 'directors': ['John Lasseter']}), Document(page_content="One of puppet-maker Geppetto's creations comes magically to life. This puppet, Pinocchio, has one major desire and that is to become a real boy someday. In order to accomplish this goal he ...", metadata={'title': 'Adventures of Pinocchio, The', 'tmdbId': '18975', 'source': 'https://www.themoviedb.org/movie/18975', 'actors': [['Martin Landau', 'Geppetto'], ['Udo Kier', 'Lorenzini'], ['Geneviève Bujold', 'Leona'], ['Jonathan Taylor Thomas', 'Pinocchio']], 'directors': ['Steve Barron']}), Document(page_content='A living puppet, with the help of a cricket as his conscience, must prove himself worthy to become a real boy.', metadata={'title': 'Pinocchio', 'tmdbId': '10895', 'source': 'https://www.themoviedb.org/movie/10895', 'actors': [['Mel Blanc', 'Gideon (hiccup) (voice) (uncredited)'], ['Don Brodie', 'Carnival Barker (voice) (uncredited)'], ['Walter Catlett', "'Honest John' Worthington Foulfellow (voice) (uncredited)"], ['Marion Darlington', 'Birds (voice) (uncredited)']], 'directors': ['Ben Sharpsteen', 'Hamilton Luske', ' T. Hee', 'Norman Ferguson']}), Document(page_content='A boy born the size of a small doll is kidnapped by a genetic lab and must find a way back to his father in this inventive adventure filmed using stop motion animation techniques. 
Tom meets...', metadata={'title': 'Secret Adventures of Tom Thumb, The', 'tmdbId': '18242', 'source': 'https://www.themoviedb.org/movie/18242', 'actors': [['Nick Upton', 'Pa Thumb'], ['Deborah Collard', 'Ma Thumb'], ['Frank Passingham', 'Man']], 'directors': ['Dave Borthwick']})], 'answer': 'No, the context does not discuss toys coming to life.'}The tool has provided several movies where toys come to life, including Toy Story, The Adventures of Pinocchio, Pinocchio, and The Secret Adventures of Tom Thumb.
Final Answer: Here are some movies where toys come to life:
1. "Toy Story" - A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room. [More Info](https://www.themoviedb.org/movie/862)
2. "The Adventures of Pinocchio" - One of puppet-maker Geppetto's creations comes magically to life. This puppet, Pinocchio, has one major desire and that is to become a real boy someday. [More Info](https://www.themoviedb.org/movie/18975)
3. "Pinocchio" - A living puppet, with the help of a cricket as his conscience, must prove himself worthy to become a real boy. [More Info](https://www.themoviedb.org/movie/10895)
4. "The Secret Adventures of Tom Thumb" - A boy born the size of a small doll is kidnapped by a genetic lab and must find a way back to his father. [More Info](https://www.themoviedb.org/movie/18242)
> Finished chain.
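
The console output follows the Thought / Action / Action Input format requested in the agent prompt. As a rough illustration of how text in that format can be split into a tool name and a tool input, here is a small regular-expression sketch. It is not LangChain's own parser, and `parse_step` is a hypothetical helper.

```python
import re

ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(?P<tool>.+?)\s*\n\s*Action Input\s*:\s*(?P<tool_input>.*)",
    re.DOTALL,
)

def parse_step(llm_output):
    """Return ('final', answer) or ('action', tool, tool_input)."""
    if "Final Answer:" in llm_output:
        return ("final", llm_output.split("Final Answer:", 1)[1].strip())
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError("Output does not follow the expected format")
    return (
        "action",
        match.group("tool").strip(),
        match.group("tool_input").strip(),
    )

step = parse_step(
    "Thought: The user is asking for a movie with a specific plot.\n"
    "Action: Movie Plot Search\n"
    "Action Input: Toys coming to life"
)
print(step)
```

The tool name extracted this way has to match one of the registered tool names exactly, which is why the prompt lists them in `{tool_names}`.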

Summary

In this lesson, you added a new tool that uses the Vector Search Index to identify movies with similar plots to the user’s input.

In the next lesson, you will create a tool that uses the LLM to generate a Cypher statement and execute it against the database.