How you create a knowledge graph depends on the type of data you have and how you want to structure it.
Unstructured data
Unstructured data, such as text documents, web pages, or PDFs, can be a rich source for knowledge graphs.
However, creating knowledge graphs from unstructured data can be complex, involving multiple steps to query, cleanse, and transform the data.
You can use the text analysis capabilities of Large Language Models (LLMs) to help automate knowledge graph creation.
Typically, you would follow these steps to construct a knowledge graph from unstructured text using an LLM:
-
Gather the data
The data could be from multiple sources and in different formats.
-
Chunk the data
Break down the data into manageable parts, or chunks, that the LLM can process effectively.
-
Vectorize the data
Depending on your requirements for querying and searching the data, you may need to create vector embeddings.
-
Pass the data to an LLM
Extract entities (nodes) and relationships from the data. You may provide additional context or constraints for the extraction, such as the type of entities or relationships you are interested in extracting.
-
Generate the graph
Use the output from the LLM to create nodes and relationships in the graph.
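The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: chunk_text, extract_graph, to_cypher, and build_statements are hypothetical helper names, the LLM call is replaced by a naive pattern match so the example runs offline, and the optional vectorization step is omitted.

```python
import json
import re


def chunk_text(text, max_chars=120):
    """Group whole sentences into chunks, starting a new chunk
    once adding a sentence would exceed max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def extract_graph(chunk):
    """Hypothetical stand-in for the LLM step (assumption).

    A real pipeline would send the chunk plus an extraction prompt to an
    LLM and parse its structured reply. Here a naive pattern that matches
    "<X> works at <Y>" plays that role.
    """
    nodes, relationships = [], []
    for person, company in re.findall(r"(\w+) works at (\w+)", chunk):
        nodes.append({"label": "Person", "name": person})
        nodes.append({"label": "Company", "name": company})
        relationships.append(
            {"start": person, "type": "WORKS_AT", "end": company}
        )
    return {"nodes": nodes, "relationships": relationships}


def to_cypher(graph):
    """Turn extracted nodes and relationships into Cypher MERGE statements.

    Values are inlined for readability only; query parameters are the
    safer choice when running statements against a real database.
    """
    statements = []
    for node in graph["nodes"]:
        name = json.dumps(node["name"])  # double-quoted string literal
        statements.append(f"MERGE (:{node['label']} {{name: {name}}})")
    for rel in graph["relationships"]:
        start, end = json.dumps(rel["start"]), json.dumps(rel["end"])
        statements.append(
            f"MATCH (a {{name: {start}}}), (b {{name: {end}}}) "
            f"MERGE (a)-[:{rel['type']}]->(b)"
        )
    return statements


def build_statements(text, max_chars=120):
    """Run the pipeline: chunk, extract, then generate graph statements."""
    statements = []
    for chunk in chunk_text(text, max_chars):
        statements.extend(to_cypher(extract_graph(chunk)))
    return statements


if __name__ == "__main__":
    sample = "Alice works at Acme. Bob works at Acme."  # illustrative text
    for statement in build_statements(sample):
        print(statement)
```

Using MERGE rather than CREATE means an entity mentioned in several chunks, such as the company in the sample text, ends up as a single node in the graph.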
Structured data
Constructing knowledge graphs from structured data is often more straightforward than from unstructured sources. Structured data is already organized, making it easier to map nodes, relationships, and attributes directly into a graph.
To create a knowledge graph from structured data, you typically:
-
Identify the sources
The data sources could be other graphs, relational databases, CSV files, or APIs.
-
Analyze the data
Understand the entities, attributes, and relationships in your data (for example, rows in a table may represent entities, columns may represent attributes, and foreign keys may represent relationships).
-
Define a graph schema
The schema should define how the entities map to nodes and relationships, as well as the organizing principles for the graph.
-
Create the graph
Transform and import the data into the graph database.
This process allows you to leverage existing structured data sources—such as relational databases, CSV files, or APIs—to quickly build a knowledge graph that can be queried and expanded as needed.
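As a rough illustration of the mapping described above, the following Python sketch turns rows into nodes, columns into properties, and a foreign key into a relationship. The two CSV tables, the Person and Company labels, the WORKS_AT type, and the helper names read_rows and rows_to_graph are all assumptions made for this example; a real import would typically use the database's own import tooling.

```python
import csv
import io

# Hypothetical sample tables (assumption); in practice these would be
# CSV files or tables in a relational database.
COMPANIES_CSV = """id,name
1,Acme
2,Globex
"""

PEOPLE_CSV = """id,name,company_id
10,Alice,1
11,Bob,2
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_to_graph(people, companies):
    """Map rows to nodes, columns to properties, foreign keys to relationships."""
    nodes, relationships = [], []
    for row in companies:
        nodes.append({
            "label": "Company",
            "key": ("Company", row["id"]),
            "properties": {"name": row["name"]},
        })
    for row in people:
        nodes.append({
            "label": "Person",
            "key": ("Person", row["id"]),
            "properties": {"name": row["name"]},
        })
        # The company_id foreign key becomes a WORKS_AT relationship.
        relationships.append({
            "start": ("Person", row["id"]),
            "type": "WORKS_AT",
            "end": ("Company", row["company_id"]),
        })
    return nodes, relationships


if __name__ == "__main__":
    nodes, relationships = rows_to_graph(
        read_rows(PEOPLE_CSV), read_rows(COMPANIES_CSV)
    )
    for node in nodes:
        print(node)
    for relationship in relationships:
        print(relationship)
```

Keying each node by its label and source id keeps rows from different tables distinct even when their id values overlap.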
You can learn more about importing data into Neo4j in the GraphAcademy Importing Data Fundamentals course.
Check Your Understanding
Constructing Knowledge Graphs: Structured vs Unstructured Data
What is a key difference between constructing knowledge graphs from structured data versus unstructured data?
-
✓ Structured data can be directly mapped to nodes and relationships, while unstructured data requires extraction of entities and relationships using technologies like LLMs.
-
❏ Unstructured data is always easier to import into a graph database than structured data.
-
❏ Structured data cannot be used to create knowledge graphs.
-
❏ Both structured and unstructured data require the same processing steps.
Hint
The process for constructing knowledge graphs from structured data is often more straightforward than from unstructured sources. Structured data is already organized, making it easier to map nodes, relationships, and attributes directly into a graph.
Solution
The correct answer is:
-
Structured data can be directly mapped to nodes and relationships, while unstructured data requires extraction of entities and relationships using technologies like LLMs.
Structured data is already organized, making it easier to import into a graph, while unstructured data needs to be processed to identify and extract the relevant entities and relationships.
Lesson Summary
In this lesson, you learned about the process for constructing knowledge graphs.
In the next module, you will use Python and the neo4j-graphrag
package to explore GraphRAG techniques with Neo4j.