The SimpleKGPipeline uses a default graph data model to represent documents, text chunks, and entities extracted from unstructured data.
You can modify this data model to suit your own use case by creating a LexicalGraphConfig.
Create Lexical Graph Configuration
The documents you have been using contain lessons which can be sub-divided into sections.
An alternative graph data model could represent each document as a Lesson node, with Section nodes representing each chunk within the text.
To create this graph data model you can define a custom LexicalGraphConfig:
from neo4j_graphrag.experimental.components.types import LexicalGraphConfig

config = LexicalGraphConfig(
    chunk_node_label="Section",
    document_node_label="Lesson",
    chunk_to_document_relationship_type="IN_LESSON",
    next_chunk_relationship_type="NEXT_SECTION",
    node_to_chunk_relationship_type="IN_SECTION",
    chunk_embedding_property="embeddings",
)

The config object defines the node labels, relationship types, and embedding property that connect the document, chunk, and entity nodes.
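To make the mapping concrete, the sketch below prints the graph patterns implied by the configuration above as Cypher-style strings. It uses a plain dictionary rather than the library class so that it runs without any dependencies. The relationship directions (chunk to document, chunk to next chunk, entity to chunk) and the Entity placeholder label are assumptions, so check them against your own database.

```python
# Plain-dict copy of the custom configuration (not the library class).
custom = {
    "chunk_node_label": "Section",
    "document_node_label": "Lesson",
    "chunk_to_document_relationship_type": "IN_LESSON",
    "next_chunk_relationship_type": "NEXT_SECTION",
    "node_to_chunk_relationship_type": "IN_SECTION",
}


def describe_patterns(cfg, entity_label="Entity"):
    """Return Cypher-style patterns implied by a lexical graph config.

    Directions are an assumption: chunk -> document, chunk -> next chunk,
    entity -> chunk. `entity_label` is a hypothetical placeholder label.
    """
    chunk = cfg["chunk_node_label"]
    doc = cfg["document_node_label"]
    return [
        f"(:{chunk})-[:{cfg['chunk_to_document_relationship_type']}]->(:{doc})",
        f"(:{chunk})-[:{cfg['next_chunk_relationship_type']}]->(:{chunk})",
        f"(:{entity_label})-[:{cfg['node_to_chunk_relationship_type']}]->(:{chunk})",
    ]


patterns = describe_patterns(custom)
for pattern in patterns:
    print(pattern)
```

Reading the three patterns together shows how a Lesson is reached from any extracted entity: entity to Section, then Section to Lesson.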
You can then use the custom configuration in the SimpleKGPipeline by setting the lexical_graph_config parameter:
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
    lexical_graph_config=config,
)
This example code shows how to create and use the LexicalGraphConfig in a SimpleKGPipeline:
import os
from dotenv import load_dotenv
load_dotenv()

import asyncio

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.experimental.components.types import LexicalGraphConfig

# Connect to Neo4j using credentials from the environment
neo4j_driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
)
neo4j_driver.verify_connectivity()

# LLM used to extract entities and relationships
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "temperature": 0,
        "response_format": {"type": "json_object"},
    }
)

# Embedding model used for the chunk embeddings
embedder = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

# Custom lexical graph: Lesson documents split into Section chunks
config = LexicalGraphConfig(
    chunk_node_label="Section",
    document_node_label="Lesson",
    chunk_to_document_relationship_type="IN_LESSON",
    next_chunk_relationship_type="NEXT_SECTION",
    node_to_chunk_relationship_type="IN_SECTION",
    chunk_embedding_property="embeddings",
)

# Build the pipeline with the custom lexical graph configuration
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
    lexical_graph_config=config,
)

# Run the pipeline on a single PDF and print the result
pdf_file = "./genai-graphrag-python/data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf"
result = asyncio.run(kg_builder.run_async(file_path=pdf_file))
print(result.result)

When you’re ready you can continue.
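After the pipeline has run, one way to check that the custom labels were applied is to count the nodes for each label. The following is a minimal sketch, assuming the Lesson and Section labels from the configuration above. The helper name label_count_query is hypothetical, and the code only builds the query text; it does not connect to a database.

```python
def label_count_query(labels):
    """Build a Cypher query that counts the nodes for each given label."""
    parts = [
        f"MATCH (n:`{label}`) RETURN '{label}' AS label, count(n) AS total"
        for label in labels
    ]
    # One sub-query per label, combined into a single result set
    return "\nUNION ALL\n".join(parts)


query = label_count_query(["Lesson", "Section"])
print(query)
```

With the driver from the complete code, the query could be run with neo4j_driver.execute_query(query, database_=os.getenv("NEO4J_DATABASE")). A non-zero count for both labels suggests that the custom configuration was used.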
Lesson Summary
In this lesson, you learned how to create a graph configuration that changes the structure of the lexical graph.
In the next lesson, you will learn about different models and approaches for resolving entities.