Integrate with a Retriever

You can incorporate data from a knowledge graph into a LangChain application using a Retriever. A retriever accepts unstructured input (a question or query) and returns structured output (a list of documents).

You can learn more about retrievers in the Neo4j & LLM Fundamentals course.

Vector & Graph

Open the llm-knowledge-graph/retriever.py code and review the program.

View retriever.py
python
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'), 
    temperature=0
)

embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv('OPENAI_API_KEY')
    )

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)

chunk_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="chunkVector",
    embedding_node_property="textEmbedding",
    text_node_property="text",
    retrieval_query="""
// get the document
MATCH (node)-[:PART_OF]->(d:Document)
WITH node, score, d

// get the entities and relationships for the document
MATCH (node)-[:HAS_ENTITY]->(e)
MATCH p = (e)-[r]-(e2)
WHERE (node)-[:HAS_ENTITY]->(e2)

// unwind the path, create a string of the entities and relationships
UNWIND relationships(p) as rels
WITH 
    node, 
    score, 
    d, 
    collect(apoc.text.join(
        [labels(startNode(rels))[0], startNode(rels).id, type(rels), labels(endNode(rels))[0], endNode(rels).id]
        ," ")) as kg
RETURN
    node.text as text, score,
    { 
        document: d.id,
        entities: kg
    } AS metadata
"""
)

instructions = (
    "Use the given context to answer the question."
    "Reply with an answer that includes the id of the document and other relevant information from the text."
    "If you don't know the answer, say you don't know."
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("human", "{input}"),
    ]
)

chunk_retriever = chunk_vector.as_retriever()
chunk_chain = create_stuff_documents_chain(llm, prompt)
chunk_retriever = create_retrieval_chain(
    chunk_retriever, 
    chunk_chain
)

def find_chunk(q):
    return chunk_retriever.invoke({"input": q})

while True:
    q = input(">")
    print(find_chunk(q))

The program uses a Neo4j vector index to find similar chunks, and uses the knowledge graph to add the entities and relationships related to each chunk as additional context.

First, the code creates a Neo4jVector from an existing vector index:

python
chunk_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="chunkVector",
    embedding_node_property="textEmbedding",
    text_node_property="text",

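The index must already exist in the database. As a reminder, the sketch below shows one way such an index could have been created; it assumes the chunks are stored as Chunk nodes with 1536-dimension OpenAI embeddings and cosine similarity, so adjust it to match your data.

python
# Minimal sketch (not part of retriever.py) - create the vector index if it
# does not already exist. Assumes :Chunk nodes, 1536-dimension embeddings,
# and cosine similarity.
graph.query("""
CREATE VECTOR INDEX chunkVector IF NOT EXISTS
FOR (c:Chunk)
ON c.textEmbedding
OPTIONS {indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
}}
""")
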
The retrieval_query is used to structure the output of the retriever:

python
    retrieval_query="""
// get the document
MATCH (node)-[:PART_OF]->(d:Document)
WITH node, score, d

// get the entities and relationships for the document
MATCH (node)-[:HAS_ENTITY]->(e)
MATCH p = (e)-[r]-(e2)
WHERE (node)-[:HAS_ENTITY]->(e2)

// unwind the path, create a string of the entities and relationships
UNWIND relationships(p) as rels
WITH 
    node, 
    score, 
    d, 
    collect(apoc.text.join(
        [labels(startNode(rels))[0], startNode(rels).id, type(rels), labels(endNode(rels))[0], endNode(rels).id]
        ," ")) as kg
RETURN
    node.text as text, score,
    { 
        document: d.id,
        entities: kg
    } AS metadata
"""

The query matches the entities and relationships for each chunk and returns the data in the format nodeLabel entityId RELATIONSHIP_TYPE nodeLabel entityId, for example, Technology Neo4j IS_A Technology Graph Database.
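
If you want to see these strings before running the retriever, you can run the same pattern directly against the graph. This is a minimal sketch; it assumes the chunk nodes are labelled Chunk.

python
# Minimal sketch - preview the entity strings the retrieval_query builds.
# Assumes chunk nodes are labelled :Chunk.
preview = graph.query("""
MATCH (c:Chunk)-[:HAS_ENTITY]->(e)
MATCH p = (e)-[r]-(e2)
WHERE (c)-[:HAS_ENTITY]->(e2)
UNWIND relationships(p) AS rels
RETURN apoc.text.join(
    [labels(startNode(rels))[0], startNode(rels).id, type(rels),
    labels(endNode(rels))[0], endNode(rels).id], " ") AS entity
LIMIT 5
""")
print(preview)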

The retrieval chain is created from the Neo4jVector retriever and a chunk_chain that combines the llm and prompt:

python
instructions = (
    "Use the given context to answer the question."
    "Reply with an answer that includes the id of the document and other relevant information from the text."
    "If you don't know the answer, say you don't know."
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("human", "{input}"),
    ]
)

chunk_retriever = chunk_vector.as_retriever()
chunk_chain = create_stuff_documents_chain(llm, prompt)
chunk_retriever = create_retrieval_chain(
    chunk_retriever, 
    chunk_chain
)
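
You can also invoke the retriever on its own to inspect the Document objects, and their knowledge graph metadata, before they are passed to the LLM. A minimal sketch:

python
# Minimal sketch - invoke the retriever directly and inspect the Documents
# before they are stuffed into the prompt as {context}.
docs = chunk_vector.as_retriever().invoke("What is a vector index?")
for doc in docs:
    print(doc.metadata["document"])
    print(doc.metadata["entities"])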

Run the retriever.py program, and enter a query relating to the data in the documents, for example, "What is a vector index?"

The program will return a list of Documents, ordered by the most relevant first, with the associated knowledge graph data as metadata.

View the output
python
{
    'input': 'What is a vector index?',
    'context': [
        Document(
            metadata={
                'document': 'llm-fundamentals_2-vectors-semantic-search_4-improving-semantic-search.pdf',
                'entities': [
                    'Technology Langchain UTILIZES Technology Language Models',
                    'Concept Vector-Based Semantic Search UTILIZES Technology Vector Index',
                    'Technology Vector Index HAS_PROPERTY Concept Vector Properties',
                    'Technology Vector Index HAS_PROPERTY Concept Vector Properties',
                    'Concept Vector-Based Semantic Search UTILIZES Technology Vector Index',
                    'Technology Langchain UTILIZES Technology Language Models'
                    ]
                },
            page_content='You have learned how to create a vector index using `CREATE VECTOR INDEX`,\nset vector properties using the `db.create.setVectorProperty()` procedure,\nand query the vector index using the `db.index.vector.queryNodes()`\nprocedure.\nYou also explored the benefits and potential drawbacks of Vector-based\nSemantic Search.\nIn the next module, you will get hands-on with Langchain, a framework\ndesigned to simplify the creation of applications using large language\nmodels.'
        )
    ...
    ]
}
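
Rather than printing the whole dictionary, you can pick out the generated answer and the supporting chunks. A minimal sketch using the find_chunk function defined above:

python
# Minimal sketch - extract the answer and supporting chunks from the
# retrieval chain's result dictionary.
result = find_chunk("What is a vector index?")
print(result["answer"])
for doc in result["context"]:
    print(doc.metadata["document"])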

Experiment with the retriever’s output by modifying the retrieval_query parameter and observe the results.
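
For example, a simpler retrieval_query that returns only the parent document id as metadata, without the entity strings, might look like this sketch, which reuses the same parameters as above:

python
# Minimal sketch - a simpler retrieval_query that skips the entity strings
# and only returns the parent document id as metadata.
chunk_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="chunkVector",
    embedding_node_property="textEmbedding",
    text_node_property="text",
    retrieval_query="""
MATCH (node)-[:PART_OF]->(d:Document)
RETURN node.text AS text, score, { document: d.id } AS metadata
"""
)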

Check Your Understanding

Neo4j Vector Retrievers

Which of these statements about the retriever are true? Select all that apply.

  • ✓ Retrievers accept unstructured input and return structured output.

  • ✓ You can control the output by supplying a retrieval_query.

  • ✓ The returned data is ordered by the most relevant first.

  • ✓ Document objects are returned by a retriever.

Hint

Neo4j vector retrievers are highly customizable components which return structured documents when passed an unstructured question or query.

Solution

All of the statements are true:

  • Retrievers accept unstructured input and return structured output.

  • You can control the output by supplying a retrieval_query.

  • The returned data is ordered by the most relevant first.

  • Document objects are returned by a retriever.

Lesson Summary

In this lesson, you explored how a Neo4j vector retriever can retrieve data from a knowledge graph.

In the next optional challenge, you can integrate the retriever into a chatbot.