Extracting a Schema from Text

The GraphRAG for Python package (neo4j-graphrag) allows you to access Neo4j Generative AI functions.

During this course, you will use the neo4j_graphrag package to build a knowledge graph and to create retrievers that use LLMs to extract information from the graph.


In this lesson you will review how a graph schema can be extracted from text using an LLM.

Using the SchemaFromTextExtractor

Open genai-graphrag-python/extract_schema.py

python
extract_schema.py
from neo4j_graphrag.experimental.components.schema import SchemaFromTextExtractor
from neo4j_graphrag.llm import OpenAILLM
import asyncio

schema_extractor = SchemaFromTextExtractor(
    llm=OpenAILLM(
        model_name="gpt-4",
        model_params={"temperature": 0}
    )
)

text = """
Neo4j is a graph database management system (GDBMS) developed by Neo4j Inc.
"""

# Extract the schema from the text
extracted_schema = asyncio.run(schema_extractor.run(text=text))

print(extracted_schema)

The code uses the SchemaFromTextExtractor class to extract a schema from a given text input.

The extractor:

  1. Creates a prompt instructing the LLM to:

    1. Identify entities and relationships in any given text

    2. Format the output as JSON

  2. Passes the prompt and text to the LLM for processing

  3. Parses the JSON response to create a schema object
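The three steps above can be sketched in plain Python. This is an illustration only, not the package's implementation: `build_prompt`, `fake_llm`, `parse_schema`, and `SimpleSchema` are hypothetical names, the prompt wording is invented, and the LLM is replaced by a stub that returns a canned JSON string so the example runs offline.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt template; the wording used by the real extractor will differ.
PROMPT_TEMPLATE = (
    "Identify the entity types and relationship types in the text below.\n"
    "Return JSON with the keys node_types, relationship_types and patterns.\n\n"
    "Text: {text}"
)


@dataclass(frozen=True)
class SimpleSchema:
    """Minimal stand-in for a schema object (not the library's class)."""
    node_types: tuple
    relationship_types: tuple
    patterns: tuple


def build_prompt(text: str) -> str:
    # Step 1: create a prompt that asks for entities and relationships as JSON
    return PROMPT_TEMPLATE.format(text=text.strip())


def fake_llm(prompt: str) -> str:
    # Step 2 (stubbed): a real LLM call would go here; this returns a canned response
    return json.dumps({
        "node_types": ["GraphDatabase", "Company"],
        "relationship_types": ["DEVELOPED_BY"],
        "patterns": [["GraphDatabase", "DEVELOPED_BY", "Company"]],
    })


def parse_schema(response: str) -> SimpleSchema:
    # Step 3: parse the JSON response into a schema object
    data = json.loads(response)
    return SimpleSchema(
        node_types=tuple(data["node_types"]),
        relationship_types=tuple(data["relationship_types"]),
        patterns=tuple(tuple(p) for p in data["patterns"]),
    )


text = "Neo4j is a graph database management system (GDBMS) developed by Neo4j Inc."
schema = parse_schema(fake_llm(build_prompt(text)))
print(schema)
```

Swapping `fake_llm` for a real model call is the only part that needs network access; the prompt building and JSON parsing stay the same.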

Given the text, "Neo4j is a graph database management system (GDBMS) developed by Neo4j Inc.", a simplified version of the extracted schema would be:

text
Extracted Schema
node_types=(
    NodeType(label='GraphDatabase'),
    NodeType(label='Company')
)
relationship_types=(
    RelationshipType(label='DEVELOPED_BY'),
)
patterns=(
    ('GraphDatabase', 'DEVELOPED_BY', 'Company')
)

Run the program and observe the output. You will see a more detailed schema based on the text provided.

This schema can be used to store the data held within the text.

Figure: A graph schema with a Neo4j GraphDatabase node connected to a Neo4j Inc Company node via a DEVELOPED_BY relationship.
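One way to picture how a schema guides storage is to check candidate triples against its patterns. The sketch below is a plain-Python illustration under stated assumptions: the `PATTERNS` set and the `conforms` helper are hypothetical and are not part of neo4j_graphrag.

```python
# Patterns from the simplified schema, as (source label, relationship type, target label).
# Plain tuples are used here as stand-ins; they are not the library's schema classes.
PATTERNS = {
    ("GraphDatabase", "DEVELOPED_BY", "Company"),
}


def conforms(source_label: str, rel_type: str, target_label: str) -> bool:
    """Return True if the triple matches one of the schema patterns."""
    return (source_label, rel_type, target_label) in PATTERNS


# Neo4j (a GraphDatabase) DEVELOPED_BY Neo4j Inc (a Company) fits the schema
print(conforms("GraphDatabase", "DEVELOPED_BY", "Company"))
# The reverse direction is not a pattern in the schema
print(conforms("Company", "DEVELOPED_BY", "GraphDatabase"))
```

Patterns are directional, so a relationship is only accepted when the source and target labels appear in the order the schema defines.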

Experiment with different text inputs to see how the schema extraction varies based on the content provided, for example:

  • "Python is a programming language created by Guido van Rossum."

  • "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France."

  • "Large Language Models (LLMs) are a type of artificial intelligence model designed to understand and generate human-like text."
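To try all three inputs in one run, a small loop around the extractor may help. The sketch below reuses the SchemaFromTextExtractor construction from the lesson code; `SAMPLE_TEXTS`, `build_extractor`, `extract_all`, and `main` are hypothetical helper names introduced for illustration, and running `main()` assumes the package is installed and an OpenAI API key is configured.

```python
import asyncio

SAMPLE_TEXTS = [
    "Python is a programming language created by Guido van Rossum.",
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
    "Large Language Models (LLMs) are a type of artificial intelligence model "
    "designed to understand and generate human-like text.",
]


def build_extractor():
    # Imports are kept inside the function so the helpers below can be used
    # without the package installed. Same construction as in the lesson code.
    from neo4j_graphrag.experimental.components.schema import SchemaFromTextExtractor
    from neo4j_graphrag.llm import OpenAILLM

    return SchemaFromTextExtractor(
        llm=OpenAILLM(model_name="gpt-4", model_params={"temperature": 0})
    )


async def extract_all(extractor, texts):
    """Run the extractor on each text in turn and return (text, schema) pairs."""
    results = []
    for text in texts:
        schema = await extractor.run(text=text)
        results.append((text, schema))
    return results


def main():
    # Requires neo4j-graphrag and an OpenAI API key to be configured
    for text, schema in asyncio.run(extract_all(build_extractor(), SAMPLE_TEXTS)):
        print(text)
        print(schema)
        print("-" * 40)
```

Call `main()` to print each input followed by its extracted schema. Because `extract_all` only relies on an async `run(text=...)` method, any object that provides one can stand in for the extractor during testing.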

When you have finished experimenting with schema extraction, you can continue.

Lesson Summary

In this lesson, you:

  • Learned how to extract a graph schema from unstructured text using an LLM.

  • Explored how different text inputs can lead to different schema extractions.

In the next lesson, you will create a knowledge graph construction pipeline using the SimpleKGPipeline class.
