Constructing Knowledge Graphs with LLMs

Unstructured data and graphs

Creating knowledge graphs from unstructured data can be complex, typically involving multiple steps to query, cleanse, and transform the data.

You can use the text analysis capabilities of Large Language Models (LLMs) to automate the extraction of entities and relationships from your unstructured text.

An LLM generated this knowledge graph of Technologies, Concepts, and Skills from a lesson on grounding LLMs.

A knowledge graph showing the relationships between Technology Concepts and Skills

Extend your graph

In this challenge, you will use an LLM to extend your graph with new entities and relationships found in the unstructured text data.

Open the 1-knowledge-graphs-vectors/llm_build_graph.py starter code that creates the graph of lesson content.

llm_build_graph.py
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_neo4j import Neo4jGraph
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs.graph_document import Node, Relationship

COURSES_PATH = "1-knowledge-graphs-vectors/data/asciidoc"

loader = DirectoryLoader(COURSES_PATH, glob="**/lesson.adoc", loader_cls=TextLoader)
docs = loader.load()

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1500,
    chunk_overlap=200,
    add_start_index=True
)

chunks = text_splitter.split_documents(docs)

embedding_provider = OpenAIEmbeddings(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    model="text-embedding-ada-002"
    )

def get_course_data(embedding_provider, chunk):
    filename = chunk.metadata["source"]
    path = filename.split(os.path.sep)

    data = {}
    data['course'] = path[-6]
    data['module'] = path[-4]
    data['lesson'] = path[-2]
    data['url'] = f"https://graphacademy.neo4j.com/courses/{data['course']}/{data['module']}/{data['lesson']}"
    data['id'] = f"{filename}.{chunk.metadata['start_index']}"
    data['text'] = chunk.page_content
    data['embedding'] = embedding_provider.embed_query(chunk.page_content)
    return data

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)

def create_chunk(graph, data):
    graph.query("""
        MERGE (c:Course {name: $course})
        MERGE (c)-[:HAS_MODULE]->(m:Module{name: $module})
        MERGE (m)-[:HAS_LESSON]->(l:Lesson{name: $lesson, url: $url})
        MERGE (l)-[:CONTAINS]->(p:Paragraph{id: $id, text: $text})
        WITH p
        CALL db.create.setNodeVectorProperty(p, "embedding", $embedding)
        """, 
        data
    )

# Create an OpenAI LLM instance
# llm = 

# Create an LLMGraphTransformer instance
# doc_transformer =

for chunk in chunks:
    data = get_course_data(embedding_provider, chunk)
    create_chunk(graph, data)

    # Generate the graph docs
    # graph_docs =
    
    # Map the entities in the graph documents to the paragraph node
    # for graph_doc in graph_docs:
            
    # Add the graph documents to the graph
    # graph.
    
    print("Processed chunk", data['id'])

You will need to:

  1. Create an LLM instance

  2. Create a transformer to extract entities and relationships

  3. Extract entities and relationships from the text

  4. Map the entities to the paragraphs

  5. Add the graph documents to the database

Create an LLM

You need an LLM instance to extract the entities and relationships:

python
Create the llm
# Create an OpenAI LLM instance
llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'), 
    model_name="gpt-3.5-turbo"
)

The model_name parameter defines which OpenAI model will be used. gpt-3.5-turbo is a good choice for this task given its balance of accuracy, speed, and cost.

Graph Transformer

To extract the entities and relationships, you will use a graph transformer. The graph transformer takes unstructured text data, passes it to the LLM, and returns the entities and relationships.

python
Create the transformer
# Create an LLMGraphTransformer instance
doc_transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Technology", "Concept", "Skill", "Event", "Person", "Object"],
    )

The optional allowed_nodes and allowed_relationships parameters allow you to define the types of nodes and relationships you want to extract from the text.

In this example, the nodes are restricted to entities relevant to the content. The relationships are not restricted, allowing the LLM to find any relationships between the entities.

Restricting the nodes and relationships will result in a more concise knowledge graph. A more concise graph may help you answer specific questions, but it may also omit information.
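
For example, you could also restrict the relationship types. This is an optional sketch; the relationship names below are illustrative and not part of the lesson:

python
Restrict the relationships (optional)
# Optionally restrict relationships as well - these types are examples only
doc_transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Technology", "Concept", "Skill", "Event", "Person", "Object"],
    allowed_relationships=["USES", "RELATES_TO", "REQUIRES"],
)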

Extract entities and relationships

For each chunk of text, you will use the transformer to convert the text into a graph. The transformer returns a set of graph documents that represent the entities and relationships in the text.

python
Call the transformer
for chunk in chunks:
    data = get_course_data(embedding_provider, chunk)
    create_chunk(graph, data)

    # Generate the graph docs
    graph_docs = doc_transformer.convert_to_graph_documents([chunk])
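
If you want to see what the transformer extracts, you could print the contents of each graph document. This is an optional debugging sketch, not part of the starter code:

python
Inspect the graph documents (optional)
    # Optional: print the extracted nodes and relationships for each chunk
    for graph_doc in graph_docs:
        for node in graph_doc.nodes:
            print("Node:", node.id, node.type)
        for rel in graph_doc.relationships:
            print("Relationship:", rel.source.id, rel.type, rel.target.id)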

Map extracted entities to the paragraphs

The graph documents contain the extracted nodes and relationships, but they are not linked to the original paragraphs.

To understand which entities are related to which paragraphs, you will map the extracted nodes to the paragraphs.

You will create a data model with a HAS_ENTITY relationship between the paragraphs and the entities.

A data model showing a HAS_ENTITY relationship between the Paragraph and entity nodes


This code creates a Node representing the Paragraph and appends a HAS_ENTITY relationship between it and each extracted entity to the graph document.

python
Map the entities to the paragraphs
    # Map the entities in the graph documents to the paragraph node
    for graph_doc in graph_docs:
        paragraph_node = Node(
            id=data["id"],
            type="Paragraph",
        )

        for node in graph_doc.nodes:
            graph_doc.relationships.append(
                Relationship(
                    source=paragraph_node,
                    target=node,
                    type="HAS_ENTITY"
                )
            )

Add the graph documents

Finally, you need to add the new graph documents to the Neo4j graph database.

python
Add the graph documents
    # Add the graph documents to the graph
    graph.add_graph_documents(graph_docs)
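
Depending on your LangChain version, add_graph_documents also accepts optional parameters. For example, baseEntityLabel adds a shared label to every extracted node and include_source links each entity to a node representing the source text. This is an optional variation, not required for the challenge:

python
Optional parameters (version dependent)
    # Optional: baseEntityLabel adds an __Entity__ label to extracted nodes,
    # include_source links the entities to a node for the source chunk
    graph.add_graph_documents(
        graph_docs,
        baseEntityLabel=True,
        include_source=True
    )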

When you are ready, run the program to extend your graph.

Calls to the LLM are relatively slow, so the program will take a few minutes to run.
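
While you are testing, you could process just the first few chunks to keep the run short. This is a hypothetical tweak, not part of the starter code:

python
Limit the chunks while testing (optional)
# Optional: only send the first few chunks to the LLM while testing
chunks = chunks[:5]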

Querying the knowledge graph

You can view the generated entities using the following Cypher query:

cypher
MATCH (p:Paragraph)-[:HAS_ENTITY]-(e)
RETURN p, e

Entities

The entities in the graph allow you to understand the context of the text.

You can find the most mentioned topics in the graph by counting the number of times a node label (or entity) appears in the graph:

cypher
MATCH ()-[:HAS_ENTITY]->(e)
RETURN labels(e) as labels, count(e) as nodes
ORDER BY nodes DESC

Entities

You can drill down into the entity id to gain insights into the content. For example, you can find the most mentioned Technology.

cypher
MATCH ()-[r:HAS_ENTITY]->(e:Technology)
RETURN e.id AS entityId, count(r) AS mentions
ORDER BY mentions DESC
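
You can also run these queries from Python using the Neo4jGraph instance created in the build script. A minimal sketch, reusing the graph object from earlier:

python
Run a query from Python (optional)
# Optional: run the same query with the existing Neo4jGraph instance
results = graph.query("""
    MATCH ()-[r:HAS_ENTITY]->(e:Technology)
    RETURN e.id AS entityId, count(r) AS mentions
    ORDER BY mentions DESC
""")

for row in results:
    print(row["entityId"], row["mentions"])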

The knowledge graph can also show you the connections within the content, for example, which lessons relate to each other.

This Cypher query matches one specific document and uses the entities to find related documents:

cypher
MATCH (l:Lesson {
    name: "1-neo4j-and-genai"
})-[:CONTAINS]->(p:Paragraph)

MATCH (p)-[:HAS_ENTITY]->(entity)<-[:HAS_ENTITY]-(otherParagraph)
MATCH (otherParagraph)<-[:CONTAINS]-(otherLesson)
RETURN DISTINCT entity.id, otherLesson.name

Lesson entities

The knowledge graph contains the relationships between entities in all the documents.

This Cypher query restricts the output to the entities within a single lesson:

cypher
MATCH (l:Lesson {
    name: "1-neo4j-and-genai"
})-[:CONTAINS]->(p:Paragraph)
MATCH (p)-[:HAS_ENTITY]->(e)

MATCH path = (e)-[r]-(e2)
WHERE (p)-[:HAS_ENTITY]->(e2)
RETURN path

A path is returned representing the knowledge graph for the document.

The graph output from the previous Cypher query

Labels, ids, and relationships

You can return the node labels, ids, and relationship types by unwinding the path's relationships:

cypher
MATCH (l:Lesson {
    name: "1-neo4j-and-genai"
})-[:CONTAINS]->(p:Paragraph)
MATCH (p)-[:HAS_ENTITY]->(e)

MATCH path = (e)-[r]-(e2)
WHERE (p)-[:HAS_ENTITY]->(e2)

UNWIND relationships(path) as rels
RETURN
    labels(startNode(rels))[0] as eLabel,
    startNode(rels).id as eId,
    type(rels) as relType,
    labels(endNode(rels))[0] as e2Label,
    endNode(rels).id as e2Id

Explore the graph

Take some time to explore the knowledge graph to find relationships between entities and lessons.

Summary

You used an LLM to create a knowledge graph from unstructured text.