Retrievers are LangChain chain components that allow you to retrieve documents using an unstructured query, for example:
Find a movie plot about a robot that wants to be human.
Documents are any unstructured text that you want to retrieve. A retriever often uses a vector store as its underlying data structure.
Retrievers are a key component for creating models that can take advantage of Retrieval Augmented Generation (RAG).
In the previous workshop, you used a vector index of movie plots. In this example, the movie plots are the documents, and you can use a retriever to give a model context.
Neo4jVector
The Neo4jVector is a LangChain vector store that uses a Neo4j database as the underlying data structure.
You used the Neo4jVector to generate embeddings, store them in the database, and retrieve them.
You can also use the Neo4jVector to query a vector index.
Open the 2-llm-rag-python-langchain/query_vector.py file and review the code.
The program creates a Neo4jVector from the moviePlots vector index and uses the similarity_search() method to retrieve the movie plots as documents.
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector

# Embedding model used to turn the query text into a vector
embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv('OPENAI_API_KEY')
)

# Connection to the Neo4j database, configured from environment variables
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD'),
)

# Vector store backed by the existing moviePlots vector index
movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.")

# Print the title and plot of each returned document
for doc in result:
    print(doc.metadata["title"], "-", doc.page_content)
Run the code and review the results. Try different queries and see what results you get.
The similarity_search() method returns a list of Document objects. Each Document object includes the properties:
- page_content - the content referenced by the index, in this example the plot of the movie
- metadata - a dictionary of the Movie node properties returned by the index
Specify the number of documents
You can pass an optional k argument to the similarity_search() method to specify the number of documents to return. The default is 4.
vector.similarity_search(query, k=1)
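To illustrate what k controls, here is a conceptual sketch of a top-k similarity search in plain Python. It is not how Neo4j or LangChain implement the search: the cosine_similarity and top_k helpers and the three-dimensional vectors are assumptions made up for illustration, and a real index compares high-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=4):
    """Return the titles of the k entries most similar to the query, best first."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


# Toy "index": (title, embedding) pairs with made-up values.
toy_index = [
    ("Movie A", [1.0, 0.0, 0.0]),
    ("Movie B", [0.9, 0.1, 0.0]),
    ("Movie C", [0.0, 1.0, 0.0]),
    ("Movie D", [0.0, 0.0, 1.0]),
    ("Movie E", [0.5, 0.5, 0.0]),
]

query = [1.0, 0.0, 0.0]
print(top_k(query, toy_index, k=1))  # only the single closest entry
print(top_k(query, toy_index))       # falls back to the default of 4 entries
```

A smaller k returns fewer, more closely matched documents, while a larger k gives the model more context at the cost of including weaker matches.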
Retrievers and vector indexes allow you to incorporate unstructured data into your LangChain applications.
Continue
When you are ready, you can move on to the next task.
Lesson Summary
You learned about retrievers and how they can use Neo4j vectors to retrieve documents.
Next you will add a retriever chain to your agent.