Constructing Knowledge Graphs with LLMs

Unstructured data and graphs

Creating knowledge graphs from unstructured data can be complex, involving multiple steps to query, cleanse, and transform the data.

You can use the text analysis capabilities of Large Language Models (LLMs) to automate the extraction of entities and relationships from your unstructured text.

An LLM generated this knowledge graph of Technologies, Concepts, and Skills from a lesson on grounding LLMs.

A knowledge graph showing the relationships between Technologies, Concepts, and Skills

Extend your graph

In this challenge, you will use an LLM to extend your graph with new entities and relationships found in the unstructured text data.

Open the 1-knowledge-graphs-vectors\llm_build_graph.py starter code that creates the graph of lesson content.

Click to view the starter code
llm_build_graph.py
[Code listing not rendered: 1-knowledge-graphs-vectors/llm_build_graph.py]

You will need to:

  1. Create an LLM instance

  2. Create a transformer to extract entities and relationships

  3. Extract entities and relationships from the text

  4. Map the entities to the paragraphs

  5. Add the graph documents to the database

Create an LLM

You need an LLM instance to extract the entities and relationships:

python
Create the llm
[Code listing not rendered: 1-knowledge-graphs-vectors/solutions/llm_build_graph.py, tag=llm]

The model_name parameter defines which OpenAI model will be used. gpt-3.5-turbo offers a reasonable balance of accuracy, speed, and cost for this task.
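
As a minimal sketch of this step (not necessarily the course's solution), the following assumes the `langchain_openai` package is installed and that your key is stored in an `OPENAI_API_KEY` environment variable. The helper names `llm_settings` and `create_llm` are illustrative.

```python
import os


def llm_settings(model_name="gpt-3.5-turbo"):
    """Collect the keyword arguments for the chat model (helper name is illustrative)."""
    return {
        # Read the key from the environment instead of hard-coding it.
        "openai_api_key": os.getenv("OPENAI_API_KEY"),
        # model_name selects which OpenAI model handles the extraction.
        "model_name": model_name,
    }


def create_llm(model_name="gpt-3.5-turbo"):
    """Build the chat model; assumes the langchain_openai package is installed."""
    # Imported lazily so llm_settings works even without the package.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(**llm_settings(model_name))
```

Calling `create_llm()` returns the model instance that the later steps pass to the graph transformer.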

Graph Transformer

To extract the entities and relationships, you will use a graph transformer. The graph transformer takes unstructured text data, passes it to the LLM, and returns the entities and relationships.

python
Create the transformer
[Code listing not rendered: 1-knowledge-graphs-vectors/solutions/llm_build_graph.py, tag=doc_transformer]

The optional allowed_nodes and allowed_relationships parameters allow you to define the types of nodes and relationships you want to extract from the text.

In this example, the nodes are restricted to entities relevant to the content. The relationships are not restricted, allowing the LLM to find any relationships between the entities.
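
A transformer configured this way might look like the sketch below. It assumes the `langchain_experimental` package, whose `LLMGraphTransformer` accepts `allowed_nodes` and `allowed_relationships`. The label list is an assumption based on the Technologies, Concepts, and Skills mentioned earlier, and `create_transformer` is an illustrative helper name.

```python
# Node labels assumed from the entity types named in this lesson;
# adjust the list to suit your own content.
ALLOWED_NODES = ["Technology", "Concept", "Skill"]


def create_transformer(llm, allowed_nodes=ALLOWED_NODES, allowed_relationships=None):
    """Build a graph transformer; assumes langchain_experimental is installed."""
    # Imported lazily so this module loads without the package.
    from langchain_experimental.graph_transformers import LLMGraphTransformer

    return LLMGraphTransformer(
        llm=llm,
        # Only entities with these labels are kept.
        allowed_nodes=list(allowed_nodes),
        # An empty list leaves the relationship types unrestricted.
        allowed_relationships=list(allowed_relationships or []),
    )
```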

Restricting nodes and relationships

Restricting the nodes and relationships will result in a more concise knowledge graph. A more concise graph may support you in answering specific questions, but it could also be missing information.
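
The trade-off can be illustrated in plain Python. The sketch below filters a few made-up triples by allowed node labels. It is an illustration of the effect only, not the transformer's internal implementation, and the `restrict` helper and sample data are hypothetical.

```python
def restrict(triples, allowed_nodes=None, allowed_relationships=None):
    """Keep only triples whose labels and relationship type are allowed.

    Each triple is (source_label, source_id, rel_type, target_label, target_id).
    Passing None for an argument means "no restriction".
    """
    kept = []
    for src_label, src_id, rel_type, tgt_label, tgt_id in triples:
        if allowed_nodes is not None and not {src_label, tgt_label} <= set(allowed_nodes):
            continue
        if allowed_relationships is not None and rel_type not in allowed_relationships:
            continue
        kept.append((src_label, src_id, rel_type, tgt_label, tgt_id))
    return kept


# Made-up triples for illustration only.
triples = [
    ("Technology", "Neo4j", "SUPPORTS", "Concept", "Knowledge Graph"),
    ("Person", "Ada", "USES", "Technology", "Neo4j"),
    ("Skill", "Cypher", "QUERIES", "Technology", "Neo4j"),
]

concise = restrict(triples, allowed_nodes=["Technology", "Concept", "Skill"])
print(len(triples), "->", len(concise))
```

The restricted result is smaller and more focused, but the fact that a Person uses Neo4j is lost, which is the kind of missing information the note above warns about.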

Extract entities and relationships

For each chunk of text, you will use the transformer to convert the text into a graph. The transformer returns a set of graph documents that represent the entities and relationships in the text.

python
Call the transformer
[Code listing not rendered: 1-knowledge-graphs-vectors/solutions/llm_build_graph.py, tag=llm_graph_docs]
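
A per-chunk extraction loop could be sketched as follows. It assumes the transformer exposes `convert_to_graph_documents`, as LangChain's `LLMGraphTransformer` does, and that each chunk is already a document object the transformer accepts. The function name `extract_graph_docs` is illustrative.

```python
def extract_graph_docs(transformer, chunks):
    """Convert each chunk into graph documents, one transformer call per chunk.

    `transformer` is any object with a convert_to_graph_documents(documents)
    method; `chunks` is an iterable of document objects.
    """
    graph_docs = []
    for chunk in chunks:
        # The method takes a list of documents and returns a list of graph documents.
        graph_docs.extend(transformer.convert_to_graph_documents([chunk]))
    return graph_docs
```

Processing one chunk at a time keeps each graph document associated with a single chunk, which makes the later mapping to paragraphs straightforward.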

Map extracted entities to the paragraphs

The graph documents contain the extracted nodes and relationships, but they are not linked to the original paragraphs.

To understand which entities are related to which paragraphs, you will map the extracted nodes to the paragraphs.

You will create a data model with a HAS_ENTITY relationship between the paragraphs and the entities.

A data model showing a HAS_ENTITY relationship between the Paragraph and entity nodes

This code inserts the Paragraph node into the graph document, and creates a HAS_ENTITY relationship between the paragraph and the extracted entities.

python
Map the entities to the paragraphs
[Code listing not rendered: 1-knowledge-graphs-vectors/solutions/llm_build_graph.py, tag=map_entities]
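
The mapping logic can be sketched with small stand-in classes. The `Node`, `Relationship`, and `GraphDocument` dataclasses below are simplified stand-ins defined for this example, not the library's own classes, and the paragraph id is made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


@dataclass
class GraphDocument:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def map_entities_to_paragraph(graph_doc, paragraph_id):
    """Add a Paragraph node and link it to every extracted entity."""
    paragraph = Node(id=paragraph_id, type="Paragraph")
    # Snapshot the entities first so the paragraph is not linked to itself.
    entities = list(graph_doc.nodes)
    graph_doc.nodes.append(paragraph)
    for entity in entities:
        graph_doc.relationships.append(
            Relationship(source=paragraph, target=entity, type="HAS_ENTITY")
        )
    return graph_doc


doc = GraphDocument(nodes=[Node("Neo4j", "Technology"), Node("Grounding", "Concept")])
map_entities_to_paragraph(doc, "paragraph-1")
print(len(doc.nodes), len(doc.relationships))
```

After the call, the graph document holds the two entities plus the Paragraph node, and one HAS_ENTITY relationship per entity.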

Add the graph documents

Finally, you need to add the new graph documents to the Neo4j graph database.

python
Add the graph documents
[Code listing not rendered: 1-knowledge-graphs-vectors/solutions/llm_build_graph.py, tag=llm_add_graph]
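
One way to write the documents is sketched below. It assumes the `langchain_community` package, whose `Neo4jGraph` class provides `add_graph_documents`. The helper names and the connection arguments you pass in are placeholders for your own values.

```python
def connect_graph(url, username, password):
    """Open a connection; assumes the langchain_community package is installed."""
    # Imported lazily so this module loads without the package.
    from langchain_community.graphs import Neo4jGraph

    return Neo4jGraph(url=url, username=username, password=password)


def add_to_graph(graph, graph_docs):
    """Write the graph documents to the database and report how many were sent."""
    graph.add_graph_documents(graph_docs)
    return len(graph_docs)
```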

When you are ready, run the program to extend your graph.

Processing time

Calls to the LLM are relatively slow, so the program will take a few minutes to run.

Querying the knowledge graph

You can view the generated entities using the following Cypher query:

cypher
MATCH (p:Paragraph)-[:HAS_ENTITY]-(e)
RETURN p, e

Entities

The entities in the graph allow you to understand the context of the text.

You can find the most mentioned topics in the graph by counting the number of times a node label (or entity) appears in the graph:

cypher
MATCH ()-[:HAS_ENTITY]->(e)
RETURN labels(e) as labels, count(e) as nodes
ORDER BY nodes DESC
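
To make the aggregation concrete, the following sketch reproduces it in plain Python over a few made-up HAS_ENTITY rows. The paragraph ids and entity names are invented, and each entity is given a single label for simplicity. It shows that count(e) counts one row per matched relationship, so an entity mentioned in two paragraphs contributes twice to its label's total.

```python
from collections import Counter

# Made-up HAS_ENTITY relationships: (paragraph id, entity label, entity id).
has_entity = [
    ("p1", "Technology", "Neo4j"),
    ("p2", "Technology", "Neo4j"),
    ("p2", "Concept", "Grounding"),
    ("p3", "Technology", "LangChain"),
]

# Group by label and count one row per relationship, as count(e) does.
label_counts = Counter(label for _, label, _ in has_entity)

# Equivalent of ORDER BY nodes DESC.
for label, nodes in label_counts.most_common():
    print(label, nodes)
```

If you want the number of distinct entities per label instead, Cypher's `count(DISTINCT e)` would count Neo4j only once.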

You can drill down into the entity id to gain insights into the content. For example, you can find the most mentioned Technology.

cypher
MATCH ()-[r:HAS_ENTITY]->(e:Technology)
RETURN e.id AS entityId, count(r) AS mentions
ORDER BY mentions DESC

The knowledge graph can also show you the connections within the content, for example, which lessons relate to each other.

This Cypher query matches one specific document and uses the entities to find related documents:

cypher
MATCH (l:Lesson {
    name: "1-neo4j-and-genai"
})-[:CONTAINS]->(p:Paragraph)

MATCH (p)-[:HAS_ENTITY]->(entity)<-[:HAS_ENTITY]-(otherParagraph)
MATCH (otherParagraph)<-[:CONTAINS]-(otherLesson)
RETURN DISTINCT entity.id, otherLesson.name
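
If you prefer to run this kind of query from Python, a minimal sketch follows. It assumes the official `neo4j` Python driver (version 5 or later, which provides `execute_query`) and passes the lesson name as a query parameter. The function name and the column aliases are illustrative.

```python
RELATED_LESSONS_QUERY = """
MATCH (l:Lesson {name: $lessonName})-[:CONTAINS]->(p:Paragraph)
MATCH (p)-[:HAS_ENTITY]->(entity)<-[:HAS_ENTITY]-(otherParagraph)
MATCH (otherParagraph)<-[:CONTAINS]-(otherLesson)
RETURN DISTINCT entity.id AS entityId, otherLesson.name AS lessonName
"""


def find_related_lessons(driver, lesson_name):
    """Run the query with the lesson name supplied as a parameter."""
    result = driver.execute_query(
        RELATED_LESSONS_QUERY,
        parameters_={"lessonName": lesson_name},
    )
    # Each record becomes a plain dict keyed by the RETURN aliases.
    return [record.data() for record in result.records]
```

A driver can be created with `GraphDatabase.driver(uri, auth=(user, password))` from the `neo4j` package, using your own connection details. Passing the name as a parameter avoids editing the query text for each lesson.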

Lesson entities

The knowledge graph contains the relationships between entities in all the documents.

This Cypher query restricts the output to a specific chunk or document:

cypher
MATCH (l:Lesson {
    name: "1-neo4j-and-genai"
})-[:CONTAINS]->(p:Paragraph)
MATCH (p)-[:HAS_ENTITY]->(e)

MATCH path = (e)-[r]-(e2)
WHERE (p)-[:HAS_ENTITY]->(e2)
RETURN path

A path is returned representing the knowledge graph for the document.

The graph output from the previous Cypher query

Labels, ids, and relationships

You can return the node labels, ids, and relationship types by unwinding the path’s relationships:

cypher
MATCH (l:Lesson {
    name: "1-neo4j-and-genai"
})-[:CONTAINS]->(p:Paragraph)
MATCH (p)-[:HAS_ENTITY]->(e)

MATCH path = (e)-[r]-(e2)
WHERE (p)-[:HAS_ENTITY]->(e2)

UNWIND relationships(path) as rels
RETURN
    labels(startNode(rels))[0] as eLabel,
    startNode(rels).id as eId,
    type(rels) as relType,
    labels(endNode(rels))[0] as e2Label,
    endNode(rels).id as e2Id
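
Rows in this shape are easy to turn into readable triples. The sketch below formats one made-up row using the column names returned by the query. The `format_triple` helper and the sample values are hypothetical.

```python
def format_triple(row):
    """Render one result row as (label:id)-[TYPE]->(label:id)."""
    return (
        f"({row['eLabel']}:{row['eId']})"
        f"-[{row['relType']}]->"
        f"({row['e2Label']}:{row['e2Id']})"
    )


# A made-up row using the column names returned by the query.
row = {
    "eLabel": "Technology",
    "eId": "Neo4j",
    "relType": "SUPPORTS",
    "e2Label": "Concept",
    "e2Id": "Knowledge Graph",
}
print(format_triple(row))
```

Because the query reads each relationship's start and end node, the arrow in the formatted triple follows the stored direction even though the path was matched without a direction.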

Explore the graph

Take some time to explore the knowledge graph to find relationships between entities and lessons.

Summary

You used an LLM to create a knowledge graph from unstructured text.
