Build Graph

In the previous task, you split a piece of text into chunks and created embeddings for those chunks.

In this task you’re going to go a step further and extract nodes and relationships from text in order to build a knowledge graph.

This knowledge graph will capture the entities and relationships within the data.

It will also include the chunks and embeddings as well as their connections to the extracted entities and relationships.

You will use the Neo4j GraphRAG for Python package and the OpenAI API for this.

Getting Started

Open the 1-knowledge-graphs-vectors\build_graph.py file in your code editor.

python
import asyncio
import logging.config
import os

from dotenv import load_dotenv
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm.openai_llm import OpenAILLM

load_dotenv()

# Set log level to DEBUG for all neo4j_graphrag.* loggers
logging.config.dictConfig(
    {
        "version": 1,
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
            }
        },
        "loggers": {
            "root": {
                "handlers": ["console"],
            },
            "neo4j_graphrag": {
                "level": "DEBUG",
            },
        },
    }
)

# Connect to the Neo4j database
URI = os.getenv("NEO4J_URI")
AUTH = (os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
driver = GraphDatabase.driver(URI, auth=AUTH)


# 1. Chunk the text


# 2. Embed the chunks


# 3. List entities and relationships to extract


# 4. Extract nodes and relationships from the chunks


# 5. Create the pipeline


# 6. Run the pipeline


driver.close()

1. Chunking the Text

The first step in the knowledge graph creation process is to split the text into chunks.

python
text_splitter = FixedSizeSplitter(chunk_size=150, chunk_overlap=20)
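
The FixedSizeSplitter produces fixed-size character chunks, and the 20-character overlap means text that falls on a chunk boundary still appears in full in at least one chunk.

If you want to see what the splitter produces on its own, here is a minimal sketch (not part of build_graph.py). It assumes the splitter's asynchronous run method returns an object whose chunks attribute holds the individual chunks:

python
import asyncio

# Standalone check: split a short string and print each chunk.
result = asyncio.run(
    text_splitter.run(
        "The son of Duke Leto Atreides and the Lady Jessica, "
        "Paul is the heir of House Atreides."
    )
)
for chunk in result.chunks:
    print(chunk.index, repr(chunk.text))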

2. Embedding the Chunks

The next step is to create an embedding for each chunk.

We need an embedding model to create embeddings from the chunks.

We can use the OpenAI text-embedding-3-large model for this.

python
embedder = OpenAIEmbeddings(model="text-embedding-3-large")
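
If you want to confirm the embedder is configured correctly before running the full pipeline, you can embed a single string with its embed_query method. This optional sketch assumes OPENAI_API_KEY is set in your environment:

python
# Embed one string and check the vector size.
# text-embedding-3-large produces 3072-dimensional vectors.
vector = embedder.embed_query("Paul is the heir of House Atreides.")
print(len(vector))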

3. Listing the Entities and Relationships to Extract

To help guide the LLM, we list the types of entities (nodes) and relationships we want it to extract from the text.

python
entities = ["Person", "House", "Planet", "Organization"]
relations = ["SON_OF", "HEIR_OF", "RULES", "MEMBER_OF"]
potential_schema = [
    ("Person", "SON_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
    ("Person", "MEMBER_OF", "Organization"),
]
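
Each tuple in potential_schema describes a valid (source, relationship, target) pattern. For example, ("Person", "HEIR_OF", "House") tells the LLM that a HEIR_OF relationship should connect a Person node to a House node, which discourages it from inventing patterns that don't fit the model.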

4. Extracting Nodes and Relationships from the Chunks

Now we add an LLM to extract entities (nodes) and relationships from each chunk.

python
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
        "temperature": 0.0,
        "seed": 123
    },
)
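
Setting temperature to 0.0 and supplying a seed makes the extraction as repeatable as possible, while the json_object response format asks the model to return structured JSON that the pipeline can parse into nodes and relationships.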

5. Creating the Pipeline

Next, we create the knowledge graph pipeline.

The pipeline chains together the components defined above (the text splitter, embedder, schema, and LLM) to build the knowledge graph.

python
pipeline = SimpleKGPipeline(
    driver=driver,
    text_splitter=text_splitter,
    embedder=embedder,
    entities=entities,
    relations=relations,
    potential_schema=potential_schema,
    llm=llm,
    on_error="IGNORE",
    from_pdf=False,
)
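
The on_error="IGNORE" setting tells the pipeline to skip chunks where extraction fails rather than aborting the whole run, and from_pdf=False indicates that we will pass raw text to the pipeline rather than a path to a PDF file.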

6. Running the Pipeline

Finally, we feed the input text into the pipeline and run it to create the knowledge graph.

python
asyncio.run(
    pipeline.run_async(
        text=(
            "The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of "
            "House Atreides, an aristocratic family that rules the planet Caladan. Lady "
            "Jessica is a Bene Gesserit and an important key in the Bene Gesserit "
            "breeding program."
        )
    )
)
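
run_async is a coroutine, so we wrap it in asyncio.run. When it executes, the pipeline chunks the text, embeds each chunk, extracts entities and relationships with the LLM, and writes the resulting graph to your Neo4j database.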

7. Viewing the Graph

View your graph by running the following Cypher query.

cypher
MATCH (c:Chunk)-[]-(n) RETURN *
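
This query returns each Chunk node together with everything connected to it, including the entities extracted from that chunk.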

An example graph

8. Bonus Challenges

  1. Create a vector index on the embedding property of your Chunk nodes (a starting point is sketched after this list).

  2. Use the db.index.vector.queryNodes Cypher procedure to search this property.

  3. Create a full text index on the text property of your Chunk nodes.

  4. Use the db.index.fulltext.queryNodes Cypher procedure to search this property.
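
If you want a starting point, the following Cypher sketch covers all four challenges. The index names (chunkEmbeddings and chunkText) are placeholders of my choosing, and the vector dimension assumes the text-embedding-3-large model used above:

cypher
// 1. Vector index on the embedding property of Chunk nodes
CREATE VECTOR INDEX chunkEmbeddings IF NOT EXISTS
FOR (c:Chunk) ON c.embedding
OPTIONS {indexConfig: {
    `vector.dimensions`: 3072,
    `vector.similarity_function`: 'cosine'
}};

// 2. Query the vector index - $queryVector is an embedding of your search text
CALL db.index.vector.queryNodes('chunkEmbeddings', 5, $queryVector)
YIELD node, score
RETURN node.text, score;

// 3. Full-text index on the text property of Chunk nodes
CREATE FULLTEXT INDEX chunkText IF NOT EXISTS
FOR (c:Chunk) ON EACH [c.text];

// 4. Query the full-text index with a search term
CALL db.index.fulltext.queryNodes('chunkText', 'Atreides')
YIELD node, score
RETURN node.text, score;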

Continue

When you are ready, you can move on to the next task.

Summary

You created a knowledge graph from unstructured text using the Neo4j GraphRAG for Python package and the OpenAI API.