Vector Search

In this lesson, you will learn how to use vector indexes with LangChain to perform vector search.

Movie Plots

Each Movie node in the database has a .plot property.

cypher
Movie Plot Example
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot
"A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."

Embeddings have been created for 1000 movie plots. The embedding is stored in the .plotEmbedding property of the Movie nodes.

cypher
View the plot embedding
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot, m.plotEmbedding

A vector index, moviePlots, has been created for the .plotEmbedding property of the Movie nodes.

You can use the moviePlots vector index to find the most similar movies by comparing embeddings of movie plots.
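To make "comparing embeddings" concrete, here is a minimal sketch of cosine similarity, one common way to score how close two vectors are. The lesson does not state which similarity function the moviePlots index uses, so treat cosine as an assumption; the three-dimensional vectors are made up for illustration (real plot embeddings have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented values, not real model output)
toys = [0.9, 0.1, 0.0]
dolls = [0.8, 0.2, 0.1]
war = [0.0, 0.1, 0.9]

# Vectors pointing in a similar direction score higher
print(cosine_similarity(toys, dolls) > cosine_similarity(toys, war))
```

A vector index avoids running this comparison against every node by organizing the stored vectors so that near neighbors can be found quickly.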

You can learn more about creating embeddings and vector indexes in the GraphAcademy Introduction to Vector Indexes and Unstructured Data course.

The Neo4jVector class provides an interface to use vector indexes in Neo4j. You can use Neo4jVector to create a vector store that can modify data and perform similarity search.

Open the genai-integration-langchain/vector_search.py file.

python
vector_search.py
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_neo4j import Neo4jGraph

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)

# Create the embedding model

# Create Vector

# Search for similar movie plots

# Parse the documents

To perform a similarity search, you need to:

  1. Connect to a Neo4j database

  2. Create an embedding model to convert a query into a vector

  3. Create a Neo4jVector instance and connect to the database

  4. Use the similarity_search method to find similar nodes based on the query

Embedding model

The movie plot embeddings were created using the OpenAI text-embedding-ada-002 model. You need to use the same model to convert the query into vectors.
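One practical consequence of a model mismatch is a vector of the wrong size. The sketch below is a hypothetical guard, not part of LangChain; it assumes the index was built from text-embedding-ada-002 output, which has 1536 dimensions.

```python
# Assumption: text-embedding-ada-002 returns 1536-dimensional vectors,
# so an index built from its output expects query vectors of that size.
EXPECTED_DIMENSIONS = 1536


def validate_query_vector(vector, expected=EXPECTED_DIMENSIONS):
    """Raise ValueError if the vector size differs from the stored embeddings."""
    if len(vector) != expected:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, index expects {expected}"
        )
    return vector


validate_query_vector([0.0] * 1536)  # same size as the stored embeddings

try:
    validate_query_vector([0.0] * 10)  # a vector from some other source
except ValueError as error:
    print(error)
```

Matching dimensions is necessary but not sufficient: two different models can produce vectors of the same size that are still not comparable, which is why the lesson uses the same model for the query.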

Use the OpenAIEmbeddings class to create the embedding model:

python
from langchain_openai import OpenAIEmbeddings

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Vector Store

Use the Neo4jVector class to create a vector store that connects to the Neo4j database and uses the embedding model and the moviePlots index.

python
from langchain_neo4j import Neo4jVector

# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

When specifying the vector index, you must also state the node properties that contain the text (text_node_property) and the embedding (embedding_node_property).

The similarity_search method of the Neo4jVector class allows you to perform a similarity search based on a query.

python
# Search for similar movie plots
plot = "Toys come alive"
result = plot_vector.similarity_search(plot, k=3)
print(result)

The query is converted into a vector using the embedding model, and then the vector index is used to find the most similar nodes.

The k parameter specifies the number of similar nodes to return.
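The role of k can be illustrated with a small in-memory sketch: score every stored vector against the query vector and keep the k best. This is not how Neo4j or LangChain implement the search (the vector index avoids scoring every node); the titles, vectors, and dot-product scoring are assumptions chosen for illustration.

```python
import heapq


def top_k(query_vector, stored, k=3):
    """Return the k (title, score) pairs with the highest dot-product score."""
    scored = (
        (title, sum(q * v for q, v in zip(query_vector, vector)))
        for title, vector in stored.items()
    )
    # nlargest returns the results sorted from best to worst score
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical two-dimensional vectors keyed by placeholder titles
stored = {
    "A": [1.0, 0.0],
    "B": [0.8, 0.6],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}

for title, score in top_k([1.0, 0.0], stored, k=2):
    print(title, score)
```

Raising k returns more, progressively less similar, results; it never changes the order of the ones already returned.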

Complete code
python
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_neo4j import Neo4jGraph
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create Vector
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

# Search for similar movie plots
plot = "Toys come alive"
result = plot_vector.similarity_search(plot, k=3)
print(result)

Running the code returns the movies whose plots are most similar to the query.

The method returns a list of LangChain Document objects, each containing the plot as the content and the node properties as metadata.

You can parse the results to extract the movie titles and plots.

python
# Parse the documents
for doc in result:
    print(f"Title: {doc.metadata['title']}")
    print(f"Plot: {doc.page_content}\n")
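If some nodes lack a title property, indexing the metadata directly raises a KeyError. A slightly more defensive variant is sketched below. To keep the example runnable without a database, it uses a hypothetical Doc dataclass as a stand-in for the LangChain Document class (same page_content and metadata attributes), and the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(docs):
    """Return (title, plot) tuples, falling back when the title is missing."""
    return [
        (doc.metadata.get("title", "Unknown title"), doc.page_content)
        for doc in docs
    ]


# Invented sample results
sample = [
    Doc("Toys come to life.", {"title": "Example Movie"}),
    Doc("A plot with no title metadata."),
]

for title, plot in summarize(sample):
    print(f"Title: {title}")
    print(f"Plot: {plot}\n")
```

Because summarize only reads page_content and metadata, the same function should also work on the list returned by similarity_search.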

Experiment with different plots, such as:

  • Toys come alive

  • Love conquers all

  • Aliens invade Earth

  • A detective solves a mystery

Filtering results

You can filter the results of the similarity_search method by using the filter parameter.

The filter parameter allows you to specify a condition that the returned nodes must satisfy. For example, to return only movies with a revenue greater than or equal to 200 million:

python
result = plot_vector.similarity_search(
    plot,
    k=3,
    filter={"revenue": {"$gte": 200000000}}
)

You can learn more about Neo4jVector metadata filtering in the LangChain documentation.
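To show how a filter dictionary of this shape can be read, the sketch below evaluates it against plain Python dictionaries. It only illustrates the operator semantics and is not how the library evaluates filters; the matches function and the limited operator table are assumptions made for this example.

```python
import operator

# A small subset of comparison operators, chosen for illustration
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(properties, filter_spec):
    """Return True if every condition in filter_spec holds for properties."""
    for key, condition in filter_spec.items():
        value = properties.get(key)
        if value is None:
            return False  # a missing property never satisfies a condition
        for op_name, expected in condition.items():
            if not OPERATORS[op_name](value, expected):
                return False
    return True


revenue_filter = {"revenue": {"$gte": 200000000}}

print(matches({"revenue": 200000000}, revenue_filter))  # boundary value is included
print(matches({"revenue": 199999999}, revenue_filter))
```

Note that $gte includes the boundary value, so a movie with a revenue of exactly 200 million passes the filter.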

Check your understanding

Why is an Embedding Model Required?

Why does the Neo4jVector class require you to provide an embedding model?

  • ❏ To connect to the Neo4j database

  • ✓ To convert text into a vector representation

  • ❏ To create the vector index in the database

  • ❏ To extract metadata from the movie nodes

Hint

What must happen to a query before it can be compared to the stored vectors?

Solution

The answer is "To convert text into a vector representation".

The Neo4jVector class requires an embedding model to convert the input text (such as a search query) into a vector. The query vector can then be compared to the vectors stored in the database to find similar items.

Lesson Summary

In this lesson, you learned how to use the Neo4jVector class to perform vector search.

In the next lesson, you will add a RAG vector retriever to the agent to retrieve relevant movie plots based on user queries.
