In the previous task, you split a piece of text into chunks and created embeddings for those chunks.
In this task you’re going to go a step further and extract nodes and relationships from text in order to build a knowledge graph.
This knowledge graph will capture the entities and relationships within the data.
It will also include the chunks and embeddings as well as their connections to the extracted entities and relationships.
You will use the Neo4j GraphRAG for Python package and the OpenAI API for this.
Getting Started
Open the 1-knowledge-graphs-vectors\build_graph.py file in your code editor.
import asyncio
import logging.config
import os

from dotenv import load_dotenv
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm.openai_llm import OpenAILLM

load_dotenv()

# Set log level to DEBUG for all neo4j_graphrag.* loggers
logging.config.dictConfig(
    {
        "version": 1,
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
            }
        },
        "loggers": {
            "root": {
                "handlers": ["console"],
            },
            "neo4j_graphrag": {
                "level": "DEBUG",
            },
        },
    }
)

# Connect to the Neo4j database
URI = os.getenv("NEO4J_URI")
AUTH = (os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
driver = GraphDatabase.driver(URI, auth=AUTH)

# 1. Chunk the text
# 2. Embed the chunks
# 3. List entities and relationships to extract
# 4. Extract nodes and relationships from the chunks
# 5. Create the pipeline
# 6. Run the pipeline

driver.close()
1. Chunking the Text
The first step in the knowledge graph creation process is to split the text into chunks.
text_splitter = FixedSizeSplitter(chunk_size=150, chunk_overlap=20)
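To see what these parameters mean in practice, here is a minimal sketch of the sliding-window idea behind a fixed-size splitter. This is an illustration, not the FixedSizeSplitter implementation itself: a 150-character window moves over the text, and each chunk repeats the last 20 characters of the previous one so entities are not cut off at chunk boundaries.

```python
# Illustrative sketch of fixed-size splitting with overlap
# (not the neo4j_graphrag FixedSizeSplitter implementation).
def fixed_size_split(text, chunk_size=150, chunk_overlap=20):
    chunks = []
    step = chunk_size - chunk_overlap  # advance 130 characters each time
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 300-character text yields windows starting at 0, 130, and 260,
# so chunks of length 150, 150, and 40.
chunks = fixed_size_split("a" * 300)
```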
2. Embedding the Chunks
The next step is to create an embedding for each chunk.
We need an embedding model to create embeddings from our chunks. We can use the OpenAI text-embedding-3-large model for this.
embedder = OpenAIEmbeddings(model="text-embedding-3-large")
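Each chunk becomes a high-dimensional vector (text-embedding-3-large returns 3072 dimensions), and chunks with similar meaning end up with nearby vectors. As an aside, the standard way to compare such vectors is cosine similarity; the sketch below uses tiny hand-made 3-dimensional vectors purely to illustrate the calculation.

```python
import math

# Illustrative only: cosine similarity is how embedding vectors are
# typically compared. These 3-dimensional vectors stand in for the
# 3072-dimensional vectors the embedding model returns.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors -> 0.0
```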
3. Listing the Entities and Relationships to Extract
To help guide the LLM, we list the types of entities (nodes) and relationships we want to extract from our text.
entities = ["Person", "House", "Planet", "Organization"]
relations = ["SON_OF", "HEIR_OF", "RULES", "MEMBER_OF"]
potential_schema = [
    ("Person", "SON_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
    ("Person", "MEMBER_OF", "Organization"),
]
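To make the schema concrete, here is the kind of graph it should produce from the sample text used later in this lesson. This Cypher is illustrative only; the actual node names and properties the LLM extracts may differ.

```cypher
// Illustrative only: the shape of graph the schema above should
// yield from the sample passage (names taken from that passage).
MERGE (paul:Person {name: 'Paul'})
MERGE (leto:Person {name: 'Leto Atreides'})
MERGE (atreides:House {name: 'House Atreides'})
MERGE (caladan:Planet {name: 'Caladan'})
MERGE (paul)-[:SON_OF]->(leto)
MERGE (paul)-[:HEIR_OF]->(atreides)
MERGE (atreides)-[:RULES]->(caladan)
```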
4. Extracting Nodes and Relationships from the Chunks
Now we add an LLM to extract entities (nodes) and relationships from each chunk.
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
        "temperature": 0.0,
        "seed": 123,
    },
)
5. Creating the Pipeline
Next, we create the knowledge graph pipeline.
The pipeline chains the objects defined above (text splitter, embedder, LLM, etc.) together to build the knowledge graph.
pipeline = SimpleKGPipeline(
    driver=driver,
    text_splitter=text_splitter,
    embedder=embedder,
    entities=entities,
    relations=relations,
    potential_schema=potential_schema,
    llm=llm,
    on_error="IGNORE",
    from_pdf=False,
)
6. Running the Pipeline
Finally, we feed our input text to the pipeline and run it to create our knowledge graph!
asyncio.run(
    pipeline.run_async(
        text=(
            "The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of "
            "House Atreides, an aristocratic family that rules the planet Caladan. Lady "
            "Jessica is a Bene Gesserit and an important key in the Bene Gesserit "
            "breeding program."
        )
    )
)
7. Viewing the Graph
View your graph by running the following Cypher query.
MATCH (c:Chunk)-[]-(n) RETURN *
8. Bonus Challenges
- Create a vector index on the embedding property of your Chunk nodes.
- Use the db.index.vector.queryNodes Cypher procedure to search this property.
- Create a full text index on the text property of your Chunk nodes.
- Use the db.index.fulltext.queryNodes Cypher procedure to search this property.
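One possible approach to these challenges is sketched below. The index names (chunkEmbeddings, chunkText) are assumptions, and the 3072 dimensions match text-embedding-3-large; adjust both if your setup differs.

```cypher
// Vector index on the embedding property of Chunk nodes.
// chunkEmbeddings is an assumed name; 3072 matches text-embedding-3-large.
CREATE VECTOR INDEX chunkEmbeddings IF NOT EXISTS
FOR (c:Chunk) ON c.embedding
OPTIONS {indexConfig: {
    `vector.dimensions`: 3072,
    `vector.similarity_function`: 'cosine'
}};

// Query it with an embedding of your question ($queryVector must be
// generated with the same embedding model as the chunks).
CALL db.index.vector.queryNodes('chunkEmbeddings', 5, $queryVector)
YIELD node, score
RETURN node.text, score;

// Full text index on the text property of Chunk nodes, and a search.
CREATE FULLTEXT INDEX chunkText IF NOT EXISTS
FOR (c:Chunk) ON EACH [c.text];

CALL db.index.fulltext.queryNodes('chunkText', 'Atreides')
YIELD node, score
RETURN node.text, score;
```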
Continue
When you are ready, you can move on to the next task.
Summary
You created a graph of the course content using the Neo4j GraphRAG for Python package and the OpenAI API.