Retrievers

Retrievers are LangChain chain components that allow you to retrieve documents using an unstructured query, for example:

Find a movie plot about a robot that wants to be human.

Documents are any unstructured text that you want to retrieve. A retriever often uses a vector store as its underlying data structure.
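To make the idea of similarity-based retrieval concrete, here is a minimal sketch in plain Python. It is not how Neo4j or LangChain implement it: the three-dimensional embeddings, the toy_store dictionary, and the retrieve() helper are all invented for illustration, and cosine similarity is assumed as the similarity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings; real embeddings are much larger
toy_store = {
    "A robot longs to become human.": [0.9, 0.1, 0.0],
    "Aliens invade a small town.": [0.1, 0.9, 0.1],
    "A chef opens a restaurant in Paris.": [0.0, 0.1, 0.9],
}

def retrieve(query_embedding, store, k=1):
    """Return the k texts whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_embedding, store[text]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embedding of the query "a robot that wants to be human"
query_embedding = [0.8, 0.2, 0.1]
print(retrieve(query_embedding, toy_store))
```

A real vector store performs the same kind of ranking, but over an index built for fast nearest-neighbour lookups rather than a full sort.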

Retrievers are a key component for creating models that can take advantage of Retrieval Augmented Generation (RAG).

Previously, you loaded embeddings and created a vector index of movie plots. In this example, the movie plots are the documents, and a retriever can supply them to a model as context.

In this lesson, you will create a retriever to retrieve documents from the movie plots vector index.

Neo4jVector

Neo4jVector is a LangChain vector store that uses a Neo4j database as its underlying data structure.

You can use the Neo4jVector to generate embeddings, store them in the database and retrieve them.

Querying a vector index

Review the following code that creates a Neo4jVector from the moviePlots index you created.

python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.")
for doc in result:
    print(doc.metadata["title"], "-", doc.page_content)

You should be able to identify the following:

  • An embedding_provider is required. In this case, OpenAIEmbeddings created the embeddings for the movie plots, so the same provider must also generate the embedding for each query.

  • The connection to the Neo4j database (graph).

  • The name of the Neo4j index ("moviePlots").

  • The name of the node property that contains the embeddings ("plotEmbedding").

  • The name of the node property that contains the text ("plot").

  • The similarity_search() method is used to retrieve documents. The first argument is the query.

To run this program, you will need to:

  1. Set the OPENAI_API_KEY environment variable to your OpenAI API key.

  2. Set the NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD environment variables to your Sandbox connection details.

    Connection URL

    bolt://{sandbox-ip}:{sandbox-boltPort}

    Username

    {sandbox-username}

    Password

    {sandbox-password}
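Because the program reads its settings with os.getenv(), a missing variable silently becomes None and only fails later when a connection is attempted. The following sketch checks the settings up front. The variable names come from the os.getenv() calls in the program above; the missing_settings() helper is hypothetical and not part of LangChain or Neo4j.

```python
import os

# Names taken from the os.getenv() calls in the program above
REQUIRED_VARS = ["OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]

def missing_settings(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_settings()
if missing:
    print("Set these environment variables before running:", ", ".join(missing))
else:
    print("All connection settings found.")
```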

Run the code and review the results. Try different queries and see what results you get.

The similarity_search() method returns a list of Document objects. The Document object includes the properties:

  • page_content - the content referenced by the index, in this example the plot of the movie

  • metadata - a dictionary of the Movie node properties returned by the index
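The sketch below shows one way to read those two properties safely. The Document class here is a stand-in dataclass with the same two attributes, so the example runs without a database; the describe() helper and the sample results are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Stand-in exposing the same two attributes as a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def describe(doc, max_chars=60):
    """Format one result as 'title - plot', shortening long plots."""
    # .get() avoids a KeyError when a node has no title property
    title = doc.metadata.get("title", "(no title)")
    plot = doc.page_content
    if len(plot) > max_chars:
        plot = plot[: max_chars - 3] + "..."
    return f"{title} - {plot}"

# Invented results for illustration; not real output from the index
result = [
    Document("Aliens land and attack Earth.", {"title": "Example Movie A"}),
    Document("A plot with no title property."),
]

for doc in result:
    print(describe(doc))
```

Using metadata.get() rather than metadata["title"] is a small defensive choice: it keeps the loop running if a returned node lacks the expected property.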

Specify the number of documents

You can pass an optional k argument to the similarity_search() method to specify the number of documents to return. The default is 4.

python
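# movie_plot_vector is the Neo4jVector created above; k=1 returns only the closest match
result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.", k=1)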

Creating a new vector index

The Neo4jVector class can also generate embeddings and create vector indexes, which is useful when creating vectors programmatically or at run time.

The following code would create embeddings and a new index called myVectorIndex in the database for Chunk nodes with a text property:

python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector
from langchain_core.documents import Document

# A list of Documents
documents = [
    Document(
        page_content="Text to be indexed",
        metadata={"source": "local"}
    )
]

# Service used to create the embeddings
embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

new_vector = Neo4jVector.from_documents(
    documents,
    embedding_provider,
    graph=graph,
    index_name="myVectorIndex",
    node_label="Chunk",
    text_node_property="text",
    embedding_node_property="embedding",
    create_id_index=True,
)

If you would like to know more about creating vectors for unstructured data and documents in Neo4j, check out the GraphAcademy course Introduction to Vector Indexes and Unstructured Data.

Creating a Retriever chain

To incorporate a retriever and Neo4j vector into a LangChain application, you can create a retrieval chain.

The Neo4jVector class has an as_retriever() method that returns a retriever.

The RetrievalQA class is a chain that uses a retriever as part of its pipeline. It will use the retriever to retrieve documents and pass them to a language model.

By incorporating Neo4jVector into a RetrievalQA chain, you can use data and vectors in Neo4j in a LangChain application.
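Conceptually, a retrieval chain performs two steps: fetch relevant documents, then place them in the prompt sent to the model. The sketch below illustrates that flow in plain Python. It is not the RetrievalQA implementation: build_prompt(), answer(), fake_retrieve(), and fake_llm() are hypothetical names, and the prompt wording is an assumption.

```python
def build_prompt(question, documents):
    """Combine retrieved text and the question into a single prompt."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def answer(question, retrieve, llm):
    """Retrieve documents for the question, then ask the model."""
    documents = retrieve(question)
    return llm(build_prompt(question, documents))

# Hypothetical stand-ins for a real retriever and language model
def fake_retrieve(question):
    return ["A lunar mission suffers an explosion and the crew must improvise a return."]

def fake_llm(prompt):
    return f"(model reply to a prompt of {len(prompt)} characters)"

print(answer("A movie where a mission to the moon goes wrong", fake_retrieve, fake_llm))
```

Passing the retriever and model in as plain callables keeps the two steps visible; the real chain wires up the equivalent pieces for you.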

Review this program incorporating the moviePlots vector index into a retrieval chain.

python
import os
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY
)

embedding_provider = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY
)

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

plot_retriever = RetrievalQA.from_llm(
    llm=llm,
    retriever=movie_plot_vector.as_retriever()
)

response = plot_retriever.invoke(
    {"query": "A movie where a mission to the moon goes wrong"}
)

print(response)

When the program runs, the RetrievalQA chain uses the retriever returned by movie_plot_vector.as_retriever() to fetch documents from the moviePlots index and passes them to the llm language model.

Understanding the results

It can be difficult to understand how the model generated the response and how the retriever affected it.

By setting the optional verbose and return_source_documents arguments to True when creating the RetrievalQA chain, you can see the source documents and the retriever’s score for each document.

python
plot_retriever = RetrievalQA.from_llm(
    llm=llm,
    retriever=movie_plot_vector.as_retriever(),
    verbose=True,
    return_source_documents=True
)
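With return_source_documents=True, the chain's response is a dictionary that holds the answer and the documents used to produce it. The sketch below assumes the chain's default output keys, "result" and "source_documents". The response shown is an invented stand-in so the example runs without a database or API key, and source_titles() is a hypothetical helper.

```python
from types import SimpleNamespace

# Invented stand-in shaped like the chain's response; not real output
response = {
    "query": "A movie where a mission to the moon goes wrong",
    "result": "That sounds like Example Movie A.",
    "source_documents": [
        SimpleNamespace(
            page_content="A lunar mission suffers an explosion.",
            metadata={"title": "Example Movie A"},
        ),
    ],
}

def source_titles(response):
    """List the titles of the documents the answer was based on."""
    return [
        doc.metadata.get("title", "(no title)")
        for doc in response.get("source_documents", [])
    ]

print(response["result"])
print(source_titles(response))
```

Comparing the titles with the answer is a quick way to check whether the model relied on the retrieved plots or on its own prior knowledge.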

Retrievers and vector indexes allow you to incorporate unstructured data into your LangChain applications.

Check Your Understanding

Retrievers

True or False - A retriever can use a vector store to find documents similar to a query.

  • ✓ True

  • ❏ False

Hint

Documents are any unstructured text that you want to retrieve.

Solution

The statement is True. A retriever can use a vector store to find documents similar to a query.

Summary

In this lesson, you learned how to incorporate Neo4j vector indexes and retrievers into LangChain applications.

In the next optional challenge, you will add the movie plots vector retriever to the chat agent you created in the previous lesson.