You can incorporate data from a knowledge graph into a LangChain application using a Retriever. A retriever accepts unstructured input, such as a question, and returns structured output.
You can learn more about retrievers in the Neo4j & LLM Fundamentals course.
Vector & Graph
Open the llm-knowledge-graph/retriever.py code and review the program.
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate

# Chat model used to answer the question
llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    temperature=0
)

# Embedding model used to embed the question for the vector search
embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv('OPENAI_API_KEY')
)

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)

# Open the existing vector index and extend each result with graph data
chunk_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="chunkVector",
    embedding_node_property="textEmbedding",
    text_node_property="text",
    retrieval_query="""
// get the document
MATCH (node)-[:PART_OF]->(d:Document)
WITH node, score, d

// get the entities and relationships for the document
MATCH (node)-[:HAS_ENTITY]->(e)
MATCH p = (e)-[r]-(e2)
WHERE (node)-[:HAS_ENTITY]->(e2)

// unwind the path, create a string of the entities and relationships
UNWIND relationships(p) as rels
WITH
    node,
    score,
    d,
    collect(apoc.text.join(
        [labels(startNode(rels))[0], startNode(rels).id, type(rels), labels(endNode(rels))[0], endNode(rels).id]
        ," ")) as kg
RETURN
    node.text as text, score,
    {
        document: d.id,
        entities: kg
    } AS metadata
"""
)

# Adjacent string literals are concatenated, so each one ends with a space
instructions = (
    "Use the given context to answer the question. "
    "Reply with an answer that includes the id of the document and other relevant information from the text. "
    "If you don't know the answer, say you don't know. "
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("human", "{input}"),
    ]
)

chunk_retriever = chunk_vector.as_retriever()
chunk_chain = create_stuff_documents_chain(llm, prompt)
chunk_retriever = create_retrieval_chain(
    chunk_retriever,
    chunk_chain
)

def find_chunk(q):
    return chunk_retriever.invoke({"input": q})

# Prompt for questions until the program is interrupted
while True:
    q = input(">")
    print(find_chunk(q))
The program uses a Neo4j vector index to find similar chunks of text, and uses the knowledge graph to add context.
First, the code opens an existing Neo4j vector index:
chunk_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="chunkVector",
    embedding_node_property="textEmbedding",
    text_node_property="text",
The retrieval_query is used to structure the output of the retriever:
    retrieval_query="""
// get the document
MATCH (node)-[:PART_OF]->(d:Document)
WITH node, score, d

// get the entities and relationships for the document
MATCH (node)-[:HAS_ENTITY]->(e)
MATCH p = (e)-[r]-(e2)
WHERE (node)-[:HAS_ENTITY]->(e2)

// unwind the path, create a string of the entities and relationships
UNWIND relationships(p) as rels
WITH
    node,
    score,
    d,
    collect(apoc.text.join(
        [labels(startNode(rels))[0], startNode(rels).id, type(rels), labels(endNode(rels))[0], endNode(rels).id]
        ," ")) as kg
RETURN
    node.text as text, score,
    {
        document: d.id,
        entities: kg
    } AS metadata
"""
The query matches the entities and relationships for the chunks and returns the data in the format nodeLabel entityId RELATIONSHIP_TYPE nodeLabel entityId, for example Technology Neo4j IS_A Technology Graph Database.
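The string-building step can be mimicked in plain Python to see what a single entry in the entities list looks like. This is only an illustrative sketch: the describe_relationship helper and its dictionary input are hypothetical stand-ins for what the Cypher expression does with labels(), type(), and apoc.text.join.

```python
def describe_relationship(rel):
    """Join start label, start id, relationship type, end label and end id with spaces."""
    parts = [
        rel["start_labels"][0],  # first label of the start node
        rel["start_id"],         # id property of the start node
        rel["type"],             # relationship type
        rel["end_labels"][0],    # first label of the end node
        rel["end_id"],           # id property of the end node
    ]
    return " ".join(parts)


# Hypothetical relationship, matching the example given in the lesson text
example = {
    "start_labels": ["Technology"],
    "start_id": "Neo4j",
    "type": "IS_A",
    "end_labels": ["Technology"],
    "end_id": "Graph Database",
}

print(describe_relationship(example))
# prints: Technology Neo4j IS_A Technology Graph Database
```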
The retriever is created from the prompt, chunk_chain, and Neo4jVector:
instructions = (
    "Use the given context to answer the question. "
    "Reply with an answer that includes the id of the document and other relevant information from the text. "
    "If you don't know the answer, say you don't know. "
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("human", "{input}"),
    ]
)

chunk_retriever = chunk_vector.as_retriever()
chunk_chain = create_stuff_documents_chain(llm, prompt)
chunk_retriever = create_retrieval_chain(
    chunk_retriever,
    chunk_chain
)
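Conceptually, a "stuff documents" chain places the retrieved text into the {context} placeholder of the prompt before the prompt is sent to the LLM. The plain-Python sketch below illustrates that idea only. It is not LangChain's implementation, and the stuff_context helper, the shortened template, and the blank-line separator are assumptions made for the example.

```python
def stuff_context(template, chunks, separator="\n\n"):
    """Fill the {context} placeholder with the retrieved chunk texts."""
    return template.format(context=separator.join(chunks))


# Shortened, hypothetical version of the system instructions
template = "Use the given context to answer the question. Context: {context}"

# Hypothetical chunk texts standing in for the retriever's results
chunks = [
    "Neo4j is a graph database.",
    "A vector index stores embeddings.",
]

print(stuff_context(template, chunks))
```

Printing the filled-in prompt like this can be a quick way to check what information the LLM actually receives.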
Run the retriever.py program, and enter a query relating to the data in the documents, for example, "What is a vector index?"
The program will print the result of the chain. The context entry contains a list of Documents, ordered by the most relevant first, with the associated knowledge graph data as metadata.
{
    'input': 'What is a vector index?',
    'context': [
        Document(
            metadata={
                'document': 'llm-fundamentals_2-vectors-semantic-search_4-improving-semantic-search.pdf',
                'entities': [
                    'Technology Langchain UTILIZES Technology Language Models',
                    'Concept Vector-Based Semantic Search UTILIZES Technology Vector Index',
                    'Technology Vector Index HAS_PROPERTY Concept Vector Properties',
                    'Technology Vector Index HAS_PROPERTY Concept Vector Properties',
                    'Concept Vector-Based Semantic Search UTILIZES Technology Vector Index',
                    'Technology Langchain UTILIZES Technology Language Models'
                ]
            },
            page_content='You have learned how to create a vector index using `CREATE VECTOR INDEX`,\nset vector properties using the `db.create.setVectorProperty()` procedure,\nand query the vector index using the `db.index.vector.queryNodes()`\nprocedure.\nYou also explored the benefits and potential drawbacks of Vector-based\nSemantic Search.\nIn the next module, you will get hands-on with Langchain, a framework\ndesigned to simplify the creation of applications using large language\nmodels.'
        )
        ...
    ]
}
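Each relationship string appears twice in the example output, most likely because the undirected pattern (e)-[r]-(e2) matches every relationship once from each end. A minimal sketch of post-processing the result in Python follows. The Document dataclass here is a hypothetical stand-in for LangChain's Document so that the example runs on its own, and summarize_context and the sample data are illustrative names, not part of the course code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_context(result):
    """Return (document id, unique entity strings) for each retrieved chunk."""
    summary = []
    for doc in result["context"]:
        # dict.fromkeys drops duplicates while keeping first-seen order
        entities = list(dict.fromkeys(doc.metadata.get("entities", [])))
        summary.append((doc.metadata.get("document"), entities))
    return summary


# Sample data shaped like the output above (values shortened for the example)
result = {
    "input": "What is a vector index?",
    "context": [
        Document(
            page_content="...",
            metadata={
                "document": "example.pdf",
                "entities": [
                    "Technology Vector Index HAS_PROPERTY Concept Vector Properties",
                    "Technology Vector Index HAS_PROPERTY Concept Vector Properties",
                ],
            },
        )
    ],
}

for document_id, entities in summarize_context(result):
    print(document_id, entities)
```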
Experiment with the retriever’s output by modifying the retrieval_query parameter and observe the results.
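As one possible starting point, the sketch below defines a simpler retrieval query that returns only the source document id as metadata. The SIMPLE_RETRIEVAL_QUERY name is hypothetical, and the query reuses the PART_OF relationship and the text and id properties from the program above, so it assumes the same graph model. Neo4jVector expects a retrieval query to return text, score, and metadata columns.

```python
# Hypothetical alternative to the retrieval_query used in retriever.py.
# It keeps the three columns Neo4jVector expects: text, score and metadata.
SIMPLE_RETRIEVAL_QUERY = """
// get the document that the chunk is part of
MATCH (node)-[:PART_OF]->(d:Document)
RETURN
    node.text AS text,
    score,
    { document: d.id } AS metadata
"""

print(SIMPLE_RETRIEVAL_QUERY)
```

To try it, pass the string as the retrieval_query argument of Neo4jVector.from_existing_index in place of the original query and run the program again.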
Check Your Understanding
Neo4j Vector Retrievers
Which of these statements about the retriever are true? Select all that apply.
- ✓ Retrievers accept unstructured input and return structured output.
- ✓ You can control the output by supplying a retrieval_query.
- ✓ The returned data is ordered by the most important first.
- ✓ Document objects are returned by a retriever.
Hint
Neo4j vector retrievers are highly customizable components which return structured documents when passed an unstructured question or query.
Solution
All of the statements are true:
- Retrievers accept unstructured input and return structured output.
- You can control the output by supplying a retrieval_query.
- The returned data is ordered by the most important first.
- Document objects are returned by a retriever.
Lesson Summary
In this lesson, you explored how a Neo4j vector retriever can retrieve data from a knowledge graph.
In the next optional challenge, you can integrate the retriever into a chatbot.