Split Text Into Chunks and Create Embeddings

You next task is to split a piece of text into chunks then create embeddings for each chunk. You will use the GraphRAG Python package and OpenAI to do this.

Getting Started

Open the 1-knowledge-graphs-vectors\create_and_embed_chunks.py file in your code editor.

python
Unresolved directive in lesson.adoc - include::{repository-raw}/main/1-knowledge-graphs-vectors/create_and_embed_chunks.py[]

Creating Chunks

We’re going to use the first paragraph of the Wikipedia article for London for this challenge. Feel free to change this to something else though if you’d like.

In order to split the text we need to import a text splitter from the GraphRAG Python package.

Here we’ll use the FixedSizeSplitter, which splits text into fixed size chunks of chunk_size characters, with an overlap of chunk_overlap between chunks.

This is a very simple splitter, however our package supports more advanced splitters from LangChain and LlamaIndex.

Add the following to your script and experiment with the chunk_size and chunk_overlap parameters. How does this change the text chunks that are produced?

python
Unresolved directive in lesson.adoc - include::{repository-raw}/main/1-knowledge-graphs-vectors/solutions/create_and_embed_chunks.py[tag=create_chunks]

Creating Embedding

Next we’ll create embeddings for our chunks.

In order to do this we need an embedding model.

We can use the text-embedding-3-large from OpenAI as our embedding model.

Add the following to your script and run it to view the embedding created for the first chunk.

python
Unresolved directive in lesson.adoc - include::{repository-raw}/main/1-knowledge-graphs-vectors/solutions/create_and_embed_chunks.py[tag=embed_chunks]

Continue

When you are ready, you can move on to the next task.

Summary

You learned to use Python and the GraphRAG Python package to split text into chunks and create embeddings for them.

Chatbot

How can I help you today?