Creating Knowledge Graphs

How you create a knowledge graph depends on the type of data you have and how you want to structure it.

Unstructured data

Unstructured data, such as text documents, web pages, or PDFs, can be a rich source for knowledge graphs.

However, creating knowledge graphs from unstructured data can be complex, involving multiple steps to query, cleanse, and transform the data.

You can use the text analysis capabilities of Large Language Models (LLMs) to help automate knowledge graph creation.

Typically, you would follow these steps to construct a knowledge graph from unstructured text using an LLM:

  1. Gather the data

    The data could be from multiple sources and in different formats.

  2. Chunk the data

    Break down the data into manageable parts, or chunks, that the LLM can process effectively.

  3. Vectorize the data

    Depending on your requirements for querying and searching the data, you may need to create vector embeddings.

  4. Pass the data to an LLM

    Extract entities (nodes) and relationships from the data. You may provide additional context or constraints for the extraction, such as the type of entities or relationships you are interested in extracting.

  5. Generate the graph

    Use the output from the LLM to create nodes and relationships in the graph.
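
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: `fake_llm_extract` is a hypothetical stand-in for a real LLM call (it only matches the pattern "X works at Y"), the sample text is invented, the vectorization step is omitted, and the "graph" is held in plain Python sets rather than written to a database.

```python
import re
from typing import Callable

# A relationship is modelled as (subject, RELATIONSHIP_TYPE, object).
Triple = tuple[str, str, str]


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Step 2: group whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def fake_llm_extract(chunk: str) -> list[Triple]:
    """Step 4 (hypothetical): a regex standing in for an LLM extraction call."""
    return [
        (match.group(1), "WORKS_AT", match.group(2))
        for match in re.finditer(r"(\w+) works at (\w+)", chunk)
    ]


def build_graph(
    text: str,
    extract: Callable[[str], list[Triple]],
    max_chars: int = 200,
) -> tuple[set[str], set[Triple]]:
    """Step 5: collect the extracted entities and relationships."""
    nodes: set[str] = set()
    relationships: set[Triple] = set()
    for chunk in chunk_text(text, max_chars):
        for subject, rel_type, obj in extract(chunk):
            nodes.update((subject, obj))
            relationships.add((subject, rel_type, obj))
    return nodes, relationships


# Step 1: invented sample text standing in for gathered documents.
TEXT = "Alice works at Acme. Bob works at Acme. Acme is based in London."

nodes, relationships = build_graph(TEXT, fake_llm_extract, max_chars=40)
print(sorted(nodes))
print(sorted(relationships))
```

Using sets means an entity mentioned in several chunks is only added once, which is one simple way to avoid duplicate nodes when the same name appears across the source text.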

You can learn more about how to construct knowledge graphs from unstructured data in the GraphAcademy Building Knowledge Graphs with LLMs course.

Structured data

Constructing knowledge graphs from structured data is often more straightforward than from unstructured sources. Structured data is already organized, making it easier to map nodes, relationships, and attributes directly into a graph.

To create a knowledge graph from structured data, you typically:

  1. Identify the sources

    The data sources could be other graphs, relational databases, CSV files, or APIs.

  2. Analyze the data

    Understand the entities, attributes, and relationships in your data (for example, rows in a table may represent entities, columns may represent attributes, and foreign keys may represent relationships).

  3. Define a graph schema

    The schema should define the nodes that represent entities, the relationships between them, and the organizing principles for the graph.

  4. Create the graph

    Transform and import the data into the graph database.
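
As a rough sketch of the mapping described in these steps, the following Python treats each CSV row as a node, each column as a property, and a foreign key column as a relationship. The CSV contents, the `Person` and `Company` labels, the `WORKS_AT` relationship type, and the `company_id` column are all invented for illustration, and the result is kept in memory rather than imported into a graph database.

```python
import csv
import io

# Hypothetical sample data standing in for two relational tables.
COMPANIES_CSV = """id,name
c1,Acme
c2,Globex
"""

PEOPLE_CSV = """id,name,company_id
p1,Alice,c1
p2,Bob,c1
p3,Carol,c2
"""


def read_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dictionary per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_graph(people_rows, company_rows):
    """Map rows to nodes and the company_id foreign key to relationships."""
    nodes: dict[str, dict[str, str]] = {}
    relationships: list[tuple[str, str, str]] = []

    # Each company row becomes a Company node; columns become properties.
    for row in company_rows:
        nodes[row["id"]] = {"label": "Company", "name": row["name"]}
    company_ids = {row["id"] for row in company_rows}

    # Each person row becomes a Person node, and its foreign key becomes
    # a WORKS_AT relationship to the referenced company.
    for row in people_rows:
        nodes[row["id"]] = {"label": "Person", "name": row["name"]}
        if row["company_id"] in company_ids:  # skip dangling foreign keys
            relationships.append((row["id"], "WORKS_AT", row["company_id"]))

    return nodes, relationships


nodes, relationships = build_graph(
    read_rows(PEOPLE_CSV), read_rows(COMPANIES_CSV)
)
print(nodes)
print(relationships)
```

In a real project the final step would write these nodes and relationships to the database, for example with Cypher's `LOAD CSV` clause or a driver, instead of building Python structures.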

This process allows you to leverage existing structured data sources, such as relational databases, CSV files, or APIs, to quickly build a knowledge graph that can be queried and expanded as needed.

You can learn more about importing data into Neo4j in the GraphAcademy Importing Data Fundamentals course.

Check Your Understanding

Constructing Knowledge Graphs: Structured vs Unstructured Data

What is a key difference between constructing knowledge graphs from structured data versus unstructured data?

  • ✓ Structured data can be directly mapped to nodes and relationships, while unstructured data requires extraction of entities and relationships using technologies like LLMs.

  • ❏ Unstructured data is always easier to import into a graph database than structured data.

  • ❏ Structured data cannot be used to create knowledge graphs.

  • ❏ Both structured and unstructured data require the same processing steps.

Hint

The process for constructing knowledge graphs from structured data is often more straightforward than from unstructured sources. Structured data is already organized, making it easier to map nodes, relationships, and attributes directly into a graph.

Solution

The correct answer is:

  • Structured data can be directly mapped to nodes and relationships, while unstructured data requires extraction of entities and relationships using technologies like LLMs.

Structured data is already organized, making it easier to import into a graph, while unstructured data needs to be processed to identify and extract the relevant entities and relationships.

Lesson Summary

In this lesson, you learned about the process for constructing knowledge graphs.

In the next module, you will use Python and the neo4j-graphrag package to explore GraphRAG techniques with Neo4j.