Lexical graph configuration

The SimpleKGPipeline uses a default unstructured graph data model to represent documents, text chunks, and entities.

Graph data model showing Document, Chunk, and Entity nodes with relationships between them

You can modify this data model to suit your own use case by creating a LexicalGraphConfig.

Create Lexical Graph Configuration

The documents you have been using contain lessons which can be sub-divided into sections.

An alternative graph data model could represent each document as a Lesson node, with Section nodes representing each chunk within the text.

A graph data model showing Document nodes as Lesson nodes and Chunk nodes as Section nodes, with relationships between them

To create this graph data model you can define a custom LexicalGraphConfig:

python
from neo4j_graphrag.experimental.components.types import LexicalGraphConfig

config = LexicalGraphConfig(
    chunk_node_label="Section",
    document_node_label="Lesson",
    chunk_to_document_relationship_type="IN_LESSON",
    next_chunk_relationship_type="NEXT_SECTION",
    node_to_chunk_relationship_type="IN_SECTION",
    chunk_embedding_property="embeddings",
)

The config object defines the node labels, relationship types, and embedding property name that the pipeline uses for document, chunk, and entity nodes.
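To make the renaming concrete, the sketch below compares the custom values from this lesson with what are assumed to be the library's default values. The `DEFAULTS` dictionary and the `describe_changes` helper are hypothetical and written only for illustration; the default names reflect the library as understood at the time of writing, so check them against your installed version of neo4j_graphrag.

```python
# Assumed library defaults (verify against your installed neo4j_graphrag version).
DEFAULTS = {
    "document_node_label": "Document",
    "chunk_node_label": "Chunk",
    "chunk_to_document_relationship_type": "FROM_DOCUMENT",
    "next_chunk_relationship_type": "NEXT_CHUNK",
    "node_to_chunk_relationship_type": "FROM_CHUNK",
    "chunk_embedding_property": "embedding",
}

# Values passed to LexicalGraphConfig in this lesson.
CUSTOM = {
    "document_node_label": "Lesson",
    "chunk_node_label": "Section",
    "chunk_to_document_relationship_type": "IN_LESSON",
    "next_chunk_relationship_type": "NEXT_SECTION",
    "node_to_chunk_relationship_type": "IN_SECTION",
    "chunk_embedding_property": "embeddings",
}


def describe_changes(defaults, custom):
    """Return one 'field: old -> new' line per field whose value differs."""
    changes = []
    for field, old_value in defaults.items():
        new_value = custom.get(field, old_value)
        if new_value != old_value:
            changes.append(f"{field}: {old_value} -> {new_value}")
    return changes


changes = describe_changes(DEFAULTS, CUSTOM)
for line in changes:
    print(line)
```

Any field you leave out of the configuration keeps its default, so you only need to set the names you want to change.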

You can then use the custom configuration in the SimpleKGPipeline by setting the lexical_graph_config parameter:

python
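# llm, neo4j_driver, and embedder are created as in the complete code
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
    # Apply the custom labels and relationship types
    lexical_graph_config=config,
)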

This example code shows how to create and use the LexicalGraphConfig in a SimpleKGPipeline:

python
import os
from dotenv import load_dotenv
load_dotenv()

import asyncio

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

from neo4j_graphrag.experimental.components.types import LexicalGraphConfig

neo4j_driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
)
neo4j_driver.verify_connectivity()

llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "temperature": 0,
        "response_format": {"type": "json_object"},
    }
)

embedder = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

config = LexicalGraphConfig(
    chunk_node_label="Section",
    document_node_label="Lesson",
    chunk_to_document_relationship_type="IN_LESSON",
    next_chunk_relationship_type="NEXT_SECTION",
    node_to_chunk_relationship_type="IN_SECTION",
    chunk_embedding_property="embeddings",
)

kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver, 
    neo4j_database=os.getenv("NEO4J_DATABASE"), 
    embedder=embedder, 
    from_pdf=True,
    lexical_graph_config=config,
)

# Run the pipeline on a single PDF and print the pipeline result
pdf_file = "./genai-graphrag-python/data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf"
result = asyncio.run(kg_builder.run_async(file_path=pdf_file))
print(result.result)
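After running the pipeline, you may want to check that the new labels were applied. The following is a minimal sketch, not part of the lesson code: `sections_per_lesson_query` and `count_sections` are hypothetical helpers, and the query assumes that the chunk-to-document relationship points from the Section node to the Lesson node.

```python
def sections_per_lesson_query(
    document_label="Lesson",
    chunk_label="Section",
    chunk_to_document="IN_LESSON",
):
    """Build a Cypher query that counts chunk nodes per document node.

    Assumes the relationship direction (chunk)-[...]->(document).
    """
    return (
        f"MATCH (s:`{chunk_label}`)-[:`{chunk_to_document}`]->(l:`{document_label}`) "
        "RETURN elementId(l) AS lesson, count(s) AS sections"
    )


def count_sections(driver, database=None):
    """Run the query with an existing neo4j driver and return plain dicts."""
    records, _summary, _keys = driver.execute_query(
        sections_per_lesson_query(), database_=database
    )
    return [record.data() for record in records]


query = sections_per_lesson_query()
print(query)
```

With the driver from the complete code, calling `count_sections(neo4j_driver, os.getenv("NEO4J_DATABASE"))` should return one row per Lesson node, provided the assumptions above hold for your graph.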


Lesson Summary

In this lesson, you learned how to create a graph configuration that changes the structure of the lexical graph.

In the next lesson, you will learn about different models and approaches for resolving entities.
