Building a Multimodal Application with VectorCypherRetriever

Recap and Introduction

Previously, you explored a range of Neo4j retrieval strategies, including VectorRetriever, VectorCypherRetriever, HybridRetriever, HybridCypherRetriever, and Text2CypherRetriever, to fetch semantically relevant data. Now, we’ll enhance your GraphRAG applications with a multimodal approach built on the VectorCypherRetriever. This retriever can integrate both textual and visual data, enabling more powerful and accurate queries.

The VectorCypherRetriever allows for multimodal integration, using both text and visual data to perform advanced searches against your Neo4j recommendations database. Follow these steps to set up the components.

Try this Cypher command in the sandbox:

cypher
MATCH (m:Movie {title: "Homeward Bound: The Incredible Journey"})
MATCH (actor:Actor)-[:ACTED_IN]->(m)
RETURN m, collect(actor) AS actors;

Open the 2-neo4j-graphrag/multimodal_app.py file in your code editor.

1. Initialize the Embedder

Create an image embedder using the "clip-ViT-B-32" model to extract visual features from movie posters:

python
from neo4j_graphrag.embeddings import SentenceTransformerEmbeddings

IMAGE_EMBEDDING_MODEL = "clip-ViT-B-32"
embedder = SentenceTransformerEmbeddings(IMAGE_EMBEDDING_MODEL)
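CLIP maps images and short text descriptions into the same embedding space (512 dimensions for clip-ViT-B-32), so a text query can be compared directly against poster embeddings. To build intuition, here is a plain-Python sketch of cosine similarity, the metric typically configured on a Neo4j vector index; this is an illustration, not part of the neo4j-graphrag API, and the toy vectors stand in for real CLIP embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 512-dimensional CLIP embeddings
query_vec = [0.1, 0.3, 0.9]
poster_vec = [0.2, 0.2, 0.8]
print(round(cosine_similarity(query_vec, poster_vec), 3))  # → 0.988
```

The vector index performs this comparison between your query embedding and every stored poster embedding, returning the highest-scoring nodes.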

2. Initialize the VectorCypherRetriever

Set up the VectorCypherRetriever to integrate both visual and text data for more comprehensive retrieval:

python
import neo4j
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.types import RetrieverResultItem

POSTER_INDEX_NAME = "moviePosters"
retrieval_query = (
    "RETURN node.title AS title, node.plot AS plot, "
    "node.poster AS posterUrl, score"
)

def format_record_function(record: neo4j.Record) -> RetrieverResultItem:
    return RetrieverResultItem(
        content=f"Movie title: {record.get('title')}, movie plot: {record.get('plot')}",
        metadata={
            "title": record.get("title"),
            "plot": record.get("plot"),
            "poster": record.get("posterUrl"),
            "score": record.get("score"),
        },
    )

retriever = VectorCypherRetriever(
    driver,
    index_name=POSTER_INDEX_NAME,
    retrieval_query=retrieval_query,
    result_formatter=format_record_function,
    embedder=embedder,
)

query_text = "Find a movie where in the poster there are only animals without people"
top_k = 3

result = retriever.search(query_text=query_text, top_k=top_k)

for r in result.items:
    print(r.content, r.metadata.get("score"))
    print(r.metadata["poster"])
The key arguments passed to the retriever:

  • driver: Neo4j database driver.

  • index_name: The vector index for the movie poster embeddings.

  • retrieval_query: Cypher query to retrieve nodes along with their properties.

  • result_formatter: Function to format the returned nodes.

  • embedder: Embedder for generating visual embeddings.
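To see what the result_formatter produces, you can exercise the same formatting logic with a plain dict standing in for a neo4j.Record, since both expose .get(). The plot text, URL, and score below are illustrative values, not data from the database:

```python
# A plain dict stands in for neo4j.Record here: both expose .get().
# This mirrors the formatting logic of format_record_function above.
record = {
    "title": "Homeward Bound: The Incredible Journey",
    "plot": "Three pets trek across the wilderness to find their owners.",  # illustrative
    "posterUrl": "https://example.com/poster.jpg",  # placeholder URL
    "score": 0.91,  # illustrative similarity score
}

content = f"Movie title: {record.get('title')}, movie plot: {record.get('plot')}"
metadata = {
    "title": record.get("title"),
    "plot": record.get("plot"),
    "poster": record.get("posterUrl"),
    "score": record.get("score"),
}
print(content)
```

Each item returned by retriever.search() carries this content string plus the metadata dict, which is why the example loop can read r.metadata["poster"] directly.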

Tips for Effective Use

  • Image Embeddings:

    • Generated using the "clip-ViT-B-32" model from Sentence Transformers.

    • Capture visual features of movie posters for semantic similarity.

  • Vector Index:

    • Utilize the moviePosters vector index.

    • Stores embeddings of movie posters, enabling efficient image-based searches.

  • Multimodal Integration:

    • Combines text-based plot descriptions with image-based poster analysis.

    • Allows retrieval based on both semantic content and visual representation.
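If the moviePosters vector index does not already exist in your database, it can be created with a statement along these lines. The Movie label, the posterEmbedding property name, and the cosine similarity function are assumptions here; adjust them to match your schema (512 dimensions matches the clip-ViT-B-32 model):

```cypher
CREATE VECTOR INDEX moviePosters IF NOT EXISTS
FOR (m:Movie) ON (m.posterEmbedding)
OPTIONS {indexConfig: {
    `vector.dimensions`: 512,
    `vector.similarity_function`: 'cosine'
}}
```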

Continue

When you are ready, you can move on to the next task.

Summary

You’ve learned how to use VectorCypherRetriever in a multimodal context, integrating text and image embeddings to enhance your GraphRAG applications. This method offers a powerful way to perform sophisticated, multimodal searches that leverage both textual and visual information within your Neo4j knowledge graph.