Overview
In this lesson, you will learn how to create a knowledge graph from unstructured data using the SimpleKGPipeline class.
The SimpleKGPipeline class provides a pipeline which implements a series of steps to create a knowledge graph from unstructured data:
- Load the text
- Split the text into chunks
- Create embeddings for each chunk
- Extract entities from the chunks
- Write the data to a Neo4j database
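To build intuition for the splitting step above, here is a simplified, illustrative stand-in for the pipeline's text splitter written in plain Python. The function name and chunk sizes are assumptions for illustration, not the library's defaults:

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks that overlap slightly.

    Illustrative only -- the real pipeline uses its own splitter component;
    the name and default sizes here are not taken from the library.
    """
    # Each new chunk starts chunk_size - chunk_overlap characters
    # after the previous one, so consecutive chunks share an overlap.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

text = "Generative AI models learn patterns from data. " * 20
chunks = split_text(text)
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighbouring chunk, which helps entity extraction see complete phrases.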
Create the knowledge graph
You are going to create a knowledge graph from the text within the Neo4j & Generative AI Fundamentals course.
Continue with the lesson to create the knowledge graph.
SimpleKGPipeline
Open workshop-genai/kg_builder.py and review the code.
```python
import os
import asyncio

from dotenv import load_dotenv
from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

load_dotenv()

neo4j_driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
)
neo4j_driver.verify_connectivity()

llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "temperature": 0,
        "response_format": {"type": "json_object"},
    }
)

embedder = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
)

pdf_file = "./workshop-genai/data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf"
result = asyncio.run(kg_builder.run_async(file_path=pdf_file))
print(result.result)
```

The code loads a single PDF file, data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf, and runs the pipeline to create a knowledge graph in Neo4j.
The PDF document contains the content from the What is Generative AI? lesson.
Breaking down the code, you can see the following steps:

- Create a connection to Neo4j:

  ```python
  neo4j_driver = GraphDatabase.driver(
      os.getenv("NEO4J_URI"),
      auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
  )
  neo4j_driver.verify_connectivity()
  ```

- Instantiate an LLM model:

  ```python
  llm = OpenAILLM(
      model_name="gpt-4o",
      model_params={
          "temperature": 0,
          "response_format": {"type": "json_object"},
      }
  )
  ```

  The model parameters, `model_params`, lower the temperature to make the model more deterministic and set the response format to `json`.

- Create an embedding model:

  ```python
  embedder = OpenAIEmbeddings(
      model="text-embedding-ada-002"
  )
  ```

- Set up the `SimpleKGPipeline`:

  ```python
  kg_builder = SimpleKGPipeline(
      llm=llm,
      driver=neo4j_driver,
      neo4j_database=os.getenv("NEO4J_DATABASE"),
      embedder=embedder,
      from_pdf=True,
  )
  ```

- Run the pipeline to create the graph from a single PDF file:

  ```python
  pdf_file = "./workshop-genai/data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf"
  result = asyncio.run(kg_builder.run_async(file_path=pdf_file))
  print(result.result)
  ```
When you run the program, the pipeline will process the PDF document and create the graph in Neo4j.
A summary of the results will be returned, for example:
```
{'resolver': {'number_of_nodes_to_resolve': 12, 'number_of_created_nodes': 10}}
```
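The resolver entry reports on the entity-resolution step, which merges duplicate entity nodes. Assuming the counts mean what their names suggest, the difference between the two values is the number of duplicates merged away. A small sketch using the sample output above:

```python
# Sample summary taken from the example output in this lesson.
summary = {"resolver": {"number_of_nodes_to_resolve": 12, "number_of_created_nodes": 10}}

resolver = summary["resolver"]
# 12 candidate entity nodes were resolved down to 10,
# so (assuming the field names are literal) 2 duplicates were merged.
merged = resolver["number_of_nodes_to_resolve"] - resolver["number_of_created_nodes"]
print(f"{merged} duplicate entity nodes merged during resolution")
```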
Explore the Knowledge Graph
The SimpleKGPipeline creates a default graph model consisting of Document, Chunk, and __Entity__ nodes.
The Entity nodes represent the entities extracted from the text chunks. Relevant properties are extracted from the chunk and associated with the entity nodes.
View documents and chunks
You can view the documents and chunks created in the graph using the following Cypher query:
```cypher
MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
RETURN d.path, c.text
```

Note that the default chunk size is greater than the length of this document, so only a single chunk is created.

Entities and relationships
The extracted entities and the relationships between them can be found using a variable length path query:
```cypher
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
```
Lesson Summary
In this lesson, you:
- Learned how to use the `SimpleKGPipeline` class.
- Explored the graph model created by the pipeline.
In the next lesson, you will modify the chunk size used when splitting the text and define a custom schema for the knowledge graph.