Retrievers

Retrievers are LangChain chain components that allow you to retrieve documents using an unstructured query.

For example: "Find a movie plot about a robot that wants to be human."

Documents are any unstructured text that you want to retrieve. A retriever often uses a vector store as its underlying data structure.
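To make the idea concrete, here is a minimal sketch of what a vector-backed retriever does, written in plain Python with no LangChain or Neo4j involved. The three-number vectors, the `DOCUMENTS` list, and the `retrieve()` function are all invented for illustration; a real vector store would hold embeddings produced by an embedding model.

```python
import math

# Toy "vector store": each document is paired with a hand-made 3-number vector.
# These vectors are illustrative assumptions, not real embeddings.
DOCUMENTS = [
    ("A robot longs to become a real human boy.", [0.9, 0.1, 0.0]),
    ("Aliens land and attack Earth.", [0.0, 0.9, 0.2]),
    ("Two friends open a bakery in Paris.", [0.1, 0.0, 0.9]),
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, k=1):
    """Return the text of the k documents most similar to query_vector."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical embedding of "a robot that wants to be human".
print(retrieve([0.8, 0.2, 0.1], k=1))
```

The query never has to share exact words with the document. The retriever ranks documents by how close their vectors are to the query vector.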

Retrievers are a key component for creating models that can take advantage of Retrieval Augmented Generation (RAG).

In the previous workshop, you used a vector index of movie plots. In this example, the movie plots are the documents, and you can use a retriever to give a model context.
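As a rough sketch of what "giving a model context" can look like, the snippet below stitches retrieved plots into a prompt string. The `build_prompt()` function, the prompt wording, and the hard-coded plots are assumptions made for illustration; they are not part of LangChain or the course code.

```python
def build_prompt(question, plots):
    """Combine retrieved plots and the user's question into one prompt string."""
    context = "\n".join(f"- {plot}" for plot in plots)
    return (
        "Answer the question using only the movie plots below.\n\n"
        f"Plots:\n{context}\n\n"
        f"Question: {question}"
    )


# Hard-coded stand-ins for plots that a retriever might return.
retrieved_plots = [
    "A robot longs to become a real human boy.",
    "An android slowly learns what it means to feel.",
]

prompt = build_prompt(
    "Which movie features a robot that wants to be human?",
    retrieved_plots,
)
print(prompt)
```

In a RAG application, the retrieved documents replace the hard-coded list, and the resulting prompt is sent to the model.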

Neo4jVector

The Neo4jVector is a LangChain vector store that uses a Neo4j database as the underlying data structure.

You used the Neo4jVector to generate embeddings, store them in the database, and retrieve them. You can also use the Neo4jVector to query a vector index.

Open the 2-llm-rag-python-langchain/query_vector.py file and review the code. The program creates a Neo4jVector from the moviePlots vector index and uses the similarity_search() method to retrieve the movie plots as documents.

python
import os
from dotenv import load_dotenv
# Load OPENAI_API_KEY and the NEO4J_* settings from a .env file
load_dotenv()

from langchain_openai import OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector

# Embedding model used to convert the query text into a vector
embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv('OPENAI_API_KEY')
)

# Connection to the Neo4j database that holds the movie data
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD'),
)

# Vector store backed by the existing moviePlots vector index
movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

# Embed the query and return the most similar movie plots as documents
result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.")
for doc in result:
    print(doc.metadata["title"], "-", doc.page_content)

Run the code and review the results. Try different queries and see what results you get.

The similarity_search() method returns a list of Document objects. The Document object includes the properties:

  • page_content - the content referenced by the index, in this example the plot of the movie

  • metadata - a dictionary of the Movie node properties returned by the index
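The two properties can be combined in a small formatting helper, sketched below. The `summarize()` function and its `max_chars` parameter are hypothetical, and `SimpleNamespace` is used only as a stand-in so the example runs without a database; it assumes nothing about a document beyond the `page_content` and `metadata` attributes.

```python
from types import SimpleNamespace


def summarize(doc, max_chars=60):
    """Return 'title: plot' for a document, truncating long plots."""
    # metadata is a dictionary, so a missing key can be handled with .get()
    title = doc.metadata.get("title", "(untitled)")
    plot = doc.page_content
    if len(plot) > max_chars:
        plot = plot[:max_chars].rstrip() + "..."
    return f"{title}: {plot}"


# Stand-in object with the same two attributes as a Document.
example = SimpleNamespace(
    page_content="Aliens land and attack Earth.",
    metadata={"title": "Example Movie"},
)
print(summarize(example))
```

The same function should work on the objects returned by similarity_search(), provided the Movie nodes have a title property.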

Specify the number of documents

You can pass an optional k argument to the similarity_search() method to specify the number of documents to return. The default is 4.

python
movie_plot_vector.similarity_search(query, k=1)
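One common use of k=1 is fetching only the single closest match. The sketch below wraps that in a helper. `FakeVectorStore` and `best_match()` are hypothetical names: the fake store only mimics the `similarity_search(query, k=4)` call shape so the example runs without a database, and it does no real similarity ranking.

```python
class FakeVectorStore:
    """Hypothetical stand-in that mimics similarity_search(query, k=4)."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        # A real store ranks by vector similarity; this stub just slices.
        return self.texts[:k]


def best_match(store, query):
    """Return the single closest document, or None if nothing matched."""
    results = store.similarity_search(query, k=1)
    return results[0] if results else None


store = FakeVectorStore(["plot A", "plot B", "plot C", "plot D", "plot E"])
print(len(store.similarity_search("any query")))  # uses the default k
print(best_match(store, "any query"))
```

Because best_match() relies only on the similarity_search() method, it should accept any object that exposes that method with a k argument.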

Retrievers and vector indexes allow you to incorporate unstructured data into your LangChain applications.

Lesson Summary

You learned about retrievers and how they can use Neo4j vectors to retrieve documents.

Next, you will add a retriever chain to your agent.