Creating a graph

In the previous task, you used the Neo4jVector class to create Chunk nodes in the graph. Using Neo4jVector is an efficient and easy way to get started.

To create a graph that also captures the relationships within the data, you must incorporate the metadata into the data model.

In this lesson, you will create a graph of the course content.

Data Model

You will create a graph of the course content containing the following nodes, properties, and relationships:

  • Course, Module, and Lesson nodes with a name property

  • A url property on Lesson nodes will hold the GraphAcademy URL for the lesson

  • Paragraph nodes will have id, text, and embedding properties

  • The HAS_MODULE, HAS_LESSON, and CONTAINS relationships will connect the nodes

[Figure: Data model]

You can extract the name properties and url metadata from the directory structure of the lesson files.

For example, the first lesson of the Neo4j & LLM Fundamentals course has the following path:

courses\llm-fundamentals\modules\1-introduction\lessons\1-neo4j-and-genai\lesson.adoc

The following metadata is in the path:

  • Course.name - llm-fundamentals

  • Module.name - 1-introduction

  • Lesson.name - 1-neo4j-and-genai

  • Lesson.url - graphacademy.neo4j.com/courses/{Course.name}/{Module.name}/{Lesson.name}
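
As a minimal sketch of this extraction, the snippet below splits the example path and rebuilds the URL from the pattern above. The function name `parse_lesson_path` is hypothetical, and the code assumes every lesson file follows the same directory layout as the example. `PureWindowsPath` is used because the example path uses backslashes; it also accepts forward slashes.

```python
from pathlib import PureWindowsPath


def parse_lesson_path(source: str) -> dict:
    """Extract course, module, and lesson names from a lesson file path.

    Assumes the layout shown in the example path:
    courses/<course>/modules/<module>/lessons/<lesson>/lesson.adoc
    """
    # PureWindowsPath splits on both "\" and "/" and works on any OS.
    parts = PureWindowsPath(source).parts

    # Count back from the file name so leading directories are ignored.
    course, module, lesson = parts[-6], parts[-4], parts[-2]

    return {
        "course": course,
        "module": module,
        "lesson": lesson,
        "url": f"graphacademy.neo4j.com/courses/{course}/{module}/{lesson}",
    }


example = r"courses\llm-fundamentals\modules\1-introduction\lessons\1-neo4j-and-genai\lesson.adoc"
metadata = parse_lesson_path(example)
print(metadata["url"])
```

Indexing from the end of the path keeps the sketch independent of where the courses directory sits on disk.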

Building the graph

Open the 1-knowledge-graphs-vectors\build_graph.py starter code in your code editor.

The starter code loads and chunks the course content.

[Listing: Load and chunk the content (1-knowledge-graphs-vectors\build_graph.py)]
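
To illustrate what the chunking step produces, here is a plain-Python stand-in rather than the starter code's own implementation. It splits text on blank lines and packs paragraphs into chunks of at most a chosen size, recording where each chunk starts. The names `Chunk` and `chunk_text`, the `max_chars` default, and the `source` and `start_index` metadata keys are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunked document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, max_chars: int = 1500) -> list:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    if not text.strip():
        return []

    chunks = []
    current = []   # paragraphs collected for the chunk being built
    start = 0      # character offset where the current chunk begins
    position = 0   # character offset of the paragraph being read

    for paragraph in text.split("\n\n"):
        candidate = "\n\n".join(current + [paragraph])
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(Chunk("\n\n".join(current), {"source": source, "start_index": start}))
            current, start = [], position
        current.append(paragraph)
        position += len(paragraph) + 2  # +2 for the "\n\n" separator

    # Whatever is left over forms the final chunk.
    chunks.append(Chunk("\n\n".join(current), {"source": source, "start_index": start}))
    return chunks


text = "aaaa\n\nbbbb\n\ncccc"
chunks = chunk_text(text, "lesson.adoc", max_chars=10)
print([c.metadata["start_index"] for c in chunks])
```

Keeping the source path and start position with each chunk is what later makes it possible to derive the metadata and a per-chunk id.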

For each chunk, you will have to:

  1. Create an embedding of the text.

  2. Extract the metadata.

Extracting the data

Create an OpenAI embedding provider instance to generate the embeddings:

[Listing: Create embedding_provider (1-knowledge-graphs-vectors\solutions\build_graph.py, tag embedding)]

Create a function to extract the metadata from the chunk:

[Listing: Get course data (1-knowledge-graphs-vectors\solutions\build_graph.py, tag get_course_data)]

The get_course_data function:

  1. Splits the document source path to extract the course, module, and lesson names

  2. Constructs the url using the extracted names

  3. Creates a unique id for the paragraph from the file name and the chunk position

  4. Extracts the text from the chunk

  5. Creates an embedding using the embedding_provider instance

  6. Returns a dictionary containing the extracted data
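
The six steps can be sketched as follows. This is a sketch under stated assumptions, not the course solution: the chunk is assumed to expose `page_content` and a `metadata` dictionary with `source` and `start_index` keys, the path is assumed to follow the layout of the example path, and `embed` is any callable that maps text to a list of floats (for instance, `embed_query` on a LangChain embeddings object). `FakeChunk` exists only to make the example runnable offline.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import Callable


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a chunk produced by the loader."""
    page_content: str
    metadata: dict


def get_course_data(embed: Callable[[str], list], chunk: FakeChunk) -> dict:
    source = chunk.metadata["source"]

    # 1. Split the source path into its components.
    parts = PureWindowsPath(source).parts
    course, module, lesson = parts[-6], parts[-4], parts[-2]

    data = {"course": course, "module": module, "lesson": lesson}
    # 2. Build the lesson URL from the extracted names.
    data["url"] = f"graphacademy.neo4j.com/courses/{course}/{module}/{lesson}"
    # 3. Combine the file path and chunk position into an id. The full path is
    #    used (an assumption) because every lesson file is named lesson.adoc.
    data["id"] = f"{source}.{chunk.metadata['start_index']}"
    # 4. Take the chunk text.
    data["text"] = chunk.page_content
    # 5. Embed the text with the supplied callable.
    data["embedding"] = embed(data["text"])
    # 6. Return everything as one dictionary.
    return data


chunk = FakeChunk(
    page_content="Neo4j is a graph database.",
    metadata={
        "source": r"courses\llm-fundamentals\modules\1-introduction\lessons\1-neo4j-and-genai\lesson.adoc",
        "start_index": 0,
    },
)
# A fake embedding function keeps the example offline.
data = get_course_data(lambda text: [float(len(text))], chunk)
print(data["id"])
```

Passing the embedding function in as an argument makes the function easy to exercise without calling the OpenAI API.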

Creating the graph

To create the graph, you will need to:

  1. Connect to the Neo4j database

  2. Iterate through the chunks

  3. Extract the course data from each chunk

  4. Create the nodes and relationships in the graph

Connect

Connect to the Neo4j sandbox:

[Listing: 1-knowledge-graphs-vectors\solutions\build_graph.py, tag neo4j]

Test the connection

You can run your code now to check that it connects to both the OpenAI API and the Neo4j sandbox.

Create data

To create the data in the graph, you will need a function that incorporates the course data into a Cypher statement and runs it:

[Listing: Create chunk function (1-knowledge-graphs-vectors\solutions\build_graph.py, tag create_chunk)]

The create_chunk function accepts the data dictionary created by the get_course_data function.

You should be able to identify the following parameters in the Cypher statement:

  • $course

  • $module

  • $lesson

  • $url

  • $id

  • $text

  • $embedding
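
A Cypher statement that uses these parameters might look like the sketch below. It is an assumption about the shape of such a statement, not the course solution: it uses `MERGE` so that re-running the script reuses existing nodes, and a plain `SET` to store the text and embedding. `run_query` is a hypothetical callable that takes a query string and a parameter dictionary, which keeps the example runnable without a database.

```python
import re
from typing import Any, Callable

# Illustrative statement; node labels and relationship types follow the data model.
CREATE_CHUNK_CYPHER = """
MERGE (c:Course {name: $course})
MERGE (c)-[:HAS_MODULE]->(m:Module {name: $module})
MERGE (m)-[:HAS_LESSON]->(l:Lesson {name: $lesson, url: $url})
MERGE (l)-[:CONTAINS]->(p:Paragraph {id: $id})
SET p.text = $text, p.embedding = $embedding
"""


def create_chunk(run_query: Callable[[str, dict], Any], data: dict) -> None:
    """Run the statement with the dictionary keys bound to the $parameters."""
    run_query(CREATE_CHUNK_CYPHER, data)


def parameters_in(query: str) -> set:
    """Return the names of the $parameters that appear in a Cypher string."""
    return set(re.findall(r"\$(\w+)", query))


# Record the call instead of contacting a database.
calls = []
sample = {
    "course": "llm-fundamentals",
    "module": "1-introduction",
    "lesson": "1-neo4j-and-genai",
    "url": "graphacademy.neo4j.com/courses/llm-fundamentals/1-introduction/1-neo4j-and-genai",
    "id": "lesson.adoc.0",
    "text": "Example text.",
    "embedding": [0.1, 0.2],
}
create_chunk(lambda query, params: calls.append((query, params)), sample)

# Every parameter in the statement has a matching key in the dictionary.
assert parameters_in(CREATE_CHUNK_CYPHER) == set(sample)
```

With LangChain's Neo4jGraph, the `query` method could plausibly be passed as `run_query`, since it also takes a query string and a parameter dictionary.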

Create chunk

Iterate through the chunks and execute the create_chunk function:

[Listing: 1-knowledge-graphs-vectors\solutions\build_graph.py, tag create]

For each chunk, the code extracts the course data and uses it to create a Paragraph node in the graph, connected to its lesson.

The complete code is in 1-knowledge-graphs-vectors\solutions\build_graph.py.

Run the code to create the graph.

Processing time

The program will take a minute or two to complete as it creates the embeddings for each paragraph.

Explore the graph

View the graph by running the following Cypher:

cypher
MATCH (c:Course)-[:HAS_MODULE]->(m:Module)-[:HAS_LESSON]->(l:Lesson)-[:CONTAINS]->(p:Paragraph)
RETURN *

[Figure: Result from the Cypher]

Create vector index

You will need to create a vector index to query the paragraph embeddings.

cypher
Create Vector Index
CREATE VECTOR INDEX paragraphs IF NOT EXISTS
FOR (p:Paragraph)
ON p.embedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}

Query the vector index

You can use the vector index and the graph to find a lesson to help with specific questions:

cypher
Find a lesson
WITH genai.vector.encode(
    "How does RAG help ground an LLM?",
    "OpenAI",
    { token: $openAiApiKey }) AS userEmbedding
CALL db.index.vector.queryNodes('paragraphs', 6, userEmbedding)
YIELD node, score
MATCH (l:Lesson)-[:CONTAINS]->(node)
RETURN l.name, l.url, score
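
Conceptually, the index call returns the stored embeddings closest to the question embedding. The brute-force sketch below shows that idea in plain Python with tiny made-up vectors. A real vector index performs an approximate search over the 1536-dimensional embeddings, and the score Neo4j reports may be scaled differently from the raw cosine value computed here. The names `cosine_similarity` and `top_k` and the paragraph ids are illustrative.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list, paragraphs: dict, k: int) -> list:
    """Return the k (paragraph id, score) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in paragraphs.items()]
    # Highest similarity first, then keep the first k results.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 2-dimensional embeddings keyed by hypothetical paragraph ids.
paragraphs = {
    "p1": [1.0, 0.0],
    "p2": [0.0, 1.0],
    "p3": [1.0, 1.0],
}
results = top_k([1.0, 0.1], paragraphs, k=2)
print([pid for pid, _ in results])
```

In the Cypher query, the second argument to `db.index.vector.queryNodes` plays the role of `k`, and the `MATCH` clause then walks from each returned paragraph to its lesson.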

Summary

You created a graph of the course content using Neo4j and LangChain.
