Retrievers are LangChain chain components that allow you to retrieve documents using an unstructured query. For example:
Find a movie plot about a robot that wants to be human.
Documents are any unstructured text that you want to retrieve. A retriever often uses a vector store as its underlying data structure.
Retrievers are a key component for creating models that can take advantage of Retrieval Augmented Generation (RAG).
Previously, you loaded embeddings and created a vector index of movie plots. In this example, the movie plots are the documents, and a retriever can supply them to a model as context.
In this lesson, you will create a retriever to retrieve documents from the movie plots vector index.
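To make the idea of a retriever concrete, here is a minimal sketch that runs without any external services. It assumes a toy bag-of-words "embedding" in place of a real embedding model, and the names embed, cosine and retrieve are hypothetical helpers, not part of LangChain or Neo4j.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical plots used only for this illustration
plots = [
    "A robot boy longs to become human and win back his mother's love.",
    "A group of friends open a pizza restaurant in New York.",
    "Aliens land and attack Earth while a pilot leads the resistance.",
]

top = retrieve("Find a movie plot about a robot that wants to be human.", plots)
print(top[0])
```

A real retriever follows the same shape: embed the query, compare it with stored document vectors, and return the closest matches. The difference is that the embeddings come from a trained model and the comparison happens inside the vector index.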
Neo4jVector
The Neo4jVector is a LangChain vector store that uses a Neo4j database as the underlying data structure.
You can use the Neo4jVector to generate embeddings, store them in the database and retrieve them.
Querying a vector index
Review the following code that creates a Neo4jVector from the moviePlots index you created.
import os
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

# Embedding model used for the stored plots and for incoming queries
embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Connection to the Neo4j database
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

# Vector store backed by the existing moviePlots index
movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

# Retrieve the plots most similar to the query
result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.")
for doc in result:
    print(doc.metadata["title"], "-", doc.page_content)
You should be able to identify the following:
- An embedding_provider is required. In this case, OpenAIEmbeddings created the embeddings for the movie plots. The embedding provider will also generate embeddings for any queries.
- The connection to the Neo4j database (graph).
- The name of the Neo4j index ("moviePlots").
- The name of the node property that contains the embeddings ("plotEmbedding").
- The name of the node property that contains the text ("plot").
- The similarity_search() method is used to retrieve documents. The first argument is the query.
To run this program, you will need to:
- Set the OPENAI_API_KEY environment variable to your OpenAI API key (the code reads it with os.getenv()).
- Set the NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD environment variables to your Sandbox connection details.
Your Sandbox connection details:
- Connection URL: bolt://{sandbox-ip}:{sandbox-boltPort}
- Username: {sandbox-username}
- Password: {sandbox-password}
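Before running the program, it can help to confirm that every setting it reads is present. The following is a small sketch, assuming the four variable names used in the os.getenv() calls above; missing_settings is a hypothetical helper, not part of any library.

```python
import os

# Variable names taken from the os.getenv() calls in the program above
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with a hypothetical, partially configured environment
example_env = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USERNAME": "neo4j"}
print(missing_settings(example_env))
```

Calling missing_settings() with no argument checks the real process environment, which is a quick way to spot a missing key before the first connection attempt fails.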
Run the code and review the results. Try different queries and see what results you get.
The similarity_search() method returns a list of Document objects.
The Document object includes the properties:
- page_content - the content referenced by the index, in this example the plot of the movie
- metadata - a dictionary of the Movie node properties returned by the index
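The two properties can be read as in the sketch below. To keep it runnable without a database, it assumes a minimal stand-in dataclass with the same two attribute names, and the titles and plots are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for the LangChain Document class, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical results shaped like the list similarity_search() returns
results = [
    Document(page_content="Aliens land and attack Earth.", metadata={"title": "Example Movie A"}),
    Document(page_content="A crew fights an invasion.", metadata={"title": "Example Movie B"}),
]

# Use .get() so a missing property does not raise a KeyError
lines = [f'{doc.metadata.get("title", "Unknown")} - {doc.page_content}' for doc in results]
print("\n".join(lines))
```

Reading metadata with .get() is a defensive choice: which node properties appear in the dictionary depends on what is stored on each node.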
Specify the number of documents
You can pass an optional k argument to the similarity_search() method to specify the number of documents to return. The default is 4.
vector.similarity_search(query, k=1)
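Conceptually, k caps a ranked list of matches. The sketch below illustrates that with invented similarity scores; top_k is a hypothetical helper, and a real vector index computes the scores and the ranking itself.

```python
import heapq


def top_k(scored_documents, k=4):
    """Return the k highest-scoring (score, text) pairs, best first."""
    return heapq.nlargest(k, scored_documents, key=lambda pair: pair[0])


# Hypothetical similarity scores; a real vector index computes these
scored = [
    (0.91, "plot A"),
    (0.42, "plot B"),
    (0.87, "plot C"),
    (0.65, "plot D"),
    (0.30, "plot E"),
]

print(len(top_k(scored)))   # the default k=4 keeps four results
print(top_k(scored, k=1))   # k=1 keeps only the best match
```

A smaller k gives the model less context to read but a more focused one. A larger k improves the chance that the relevant document is included.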
Creating a new vector index
The Neo4jVector class can also generate embeddings and vector indexes. This is useful when creating vectors programmatically or at run time.
The following code would create embeddings and a new index called myVectorIndex in the database for Chunk nodes with a text property:
import os
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector
# Document is provided by langchain_core; langchain.schema is the older import path
from langchain_core.documents import Document

# A list of Documents
documents = [
    Document(
        page_content="Text to be indexed",
        metadata={"source": "local"}
    )
]

# Service used to create the embeddings
embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Connection to the Neo4j database
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

# Embed the documents, store them as Chunk nodes and create the index
new_vector = Neo4jVector.from_documents(
    documents,
    embedding_provider,
    graph=graph,
    index_name="myVectorIndex",
    node_label="Chunk",
    text_node_property="text",
    embedding_node_property="embedding",
    create_id_index=True,
)
If you would like to know more about creating vectors for unstructured data and documents in Neo4j, check out the GraphAcademy course Introduction to Vector Indexes and Unstructured Data.
Creating a Retriever chain
To incorporate a retriever and Neo4j vector into a LangChain application, you can create a retrieval chain.
The Neo4jVector class has an as_retriever() method that returns a retriever.
The RetrievalQA class is a chain that uses a retriever as part of its pipeline. It will use the retriever to retrieve documents and pass them to a language model.
By incorporating Neo4jVector into a RetrievalQA chain, you can use data and vectors in Neo4j in a LangChain application.
Review this program incorporating the moviePlots vector index into a retrieval chain.
import os
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Chat model that answers the question
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY
)

# Embedding model used to embed the query
embedding_provider = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY
)

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

# Chain that retrieves plots and passes them to the model
plot_retriever = RetrievalQA.from_llm(
    llm=llm,
    retriever=movie_plot_vector.as_retriever()
)

response = plot_retriever.invoke(
    {"query": "A movie where a mission to the moon goes wrong"}
)

print(response)
When the program runs, the RetrievalQA chain will use the movie_plot_vector retriever to retrieve documents from the moviePlots index and pass them to the llm language model.
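The retrieve-then-generate flow described above can be sketched in plain Python. This is a conceptual illustration only: answer_with_context, fake_retrieve and fake_llm are hypothetical stand-ins, and the prompt wording is an assumption rather than the prompt RetrievalQA actually uses.

```python
def answer_with_context(question, retrieve, llm, k=4):
    """Retrieve documents, place them in a prompt, and ask the model."""
    documents = retrieve(question)[:k]
    context = "\n\n".join(documents)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return {"query": question, "result": llm(prompt)}


# Hypothetical stand-ins so the sketch runs without Neo4j or OpenAI
def fake_retrieve(question):
    return ["Astronauts are stranded after an explosion on the way to the moon."]


def fake_llm(prompt):
    return "I received a prompt with context." if "Context:" in prompt else "No context."


response = answer_with_context(
    "A movie where a mission to the moon goes wrong", fake_retrieve, fake_llm
)
print(response["result"])
```

The point of the pattern is that the model answers from the retrieved text rather than from its training data alone, so the quality of the answer depends on what the retriever returns.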
Understanding the results
It can be difficult to understand how the model generated the response and how the retriever affected it.
By setting the optional verbose and return_source_documents arguments to True when creating the RetrievalQA chain, you can see the source documents and the retriever’s score for each document.
plot_retriever = RetrievalQA.from_llm(
    llm=llm,
    retriever=movie_plot_vector.as_retriever(),
    verbose=True,
    return_source_documents=True
)
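With return_source_documents=True, the response should carry the retrieved documents alongside the answer. The sketch below assumes the response is a dictionary with query, result and source_documents keys and uses a mocked response, so it runs without the chain; describe_sources is a hypothetical helper.

```python
from types import SimpleNamespace


def describe_sources(response):
    """List the title and a short plot preview for each source document."""
    lines = []
    for doc in response.get("source_documents", []):
        title = doc.metadata.get("title", "Unknown")
        lines.append(f"{title}: {doc.page_content[:60]}")
    return lines


# Hypothetical response shaped like the chain's output when
# return_source_documents=True; SimpleNamespace stands in for Document
response = {
    "query": "A movie where a mission to the moon goes wrong",
    "result": "An example answer.",
    "source_documents": [
        SimpleNamespace(
            page_content="Astronauts are stranded after an explosion.",
            metadata={"title": "Example Movie"},
        )
    ],
}

print(response["result"])
for line in describe_sources(response):
    print(line)
```

Comparing the answer with the listed sources is a simple way to check whether the model relied on the retrieved plots.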
Retrievers and vector indexes allow you to incorporate unstructured data into your LangChain applications.
Check Your Understanding
Retrievers
True or False - A retriever can use a vector store to find documents similar to a query.
- ✓ True
- ❏ False
Hint
Documents are any unstructured text that you want to retrieve.
Solution
The statement is True. A retriever can use a vector store to find documents similar to a query.
Summary
In this lesson, you learned how to incorporate Neo4j vector indexes and retrievers into LangChain applications.
In the next optional challenge, you will add the movie plots vector retriever to the chat agent you created in the previous lesson.