You next task is to split a piece of text into chunks then create embeddings for each chunk. You will use the GraphRAG Python package and OpenAI to do this.
Getting Started
Open the 1-knowledge-graphs-vectors\create_and_embed_chunks.py file in your code editor.
Unresolved directive in lesson.adoc - include::{repository-raw}/main/1-knowledge-graphs-vectors/create_and_embed_chunks.py[]Creating Chunks
We’re going to use the first paragraph of the Wikipedia article for London for this challenge. Feel free to change this to something else though if you’d like.
In order to split the text we need to import a text splitter from the GraphRAG Python package.
Here we’ll use the FixedSizeSplitter, which splits text into fixed size chunks of chunk_size characters, with an overlap of chunk_overlap between chunks.
This is a very simple splitter, however our package supports more advanced splitters from LangChain and LlamaIndex.
Add the following to your script and experiment with the chunk_size and chunk_overlap parameters.
How does this change the text chunks that are produced?
Unresolved directive in lesson.adoc - include::{repository-raw}/main/1-knowledge-graphs-vectors/solutions/create_and_embed_chunks.py[tag=create_chunks]Creating Embedding
Next we’ll create embeddings for our chunks.
In order to do this we need an embedding model.
We can use the text-embedding-3-large from OpenAI as our embedding model.
Add the following to your script and run it to view the embedding created for the first chunk.
Unresolved directive in lesson.adoc - include::{repository-raw}/main/1-knowledge-graphs-vectors/solutions/create_and_embed_chunks.py[tag=embed_chunks]Continue
When you are ready, you can move on to the next task.
Summary
You learned to use Python and the GraphRAG Python package to split text into chunks and create embeddings for them.