Creating embeddings

In the last lesson, embeddings were automatically created for you by the Neo4jVector class.

In this lesson, you will create embeddings directly and use them to query Neo4j from Python.

Providers of publicly available Large Language Models (LLMs) typically offer an API that you can use to create embeddings for text.

For example, OpenAI provides an embeddings API.

The llm-vectors-unstructured/create_embeddings.py program uses the OpenAI Python library to call this API and create an embedding for a piece of text.

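```python
import os
from dotenv import load_dotenv
load_dotenv()  # load variables such as OPENAI_API_KEY from a .env file

from openai import OpenAI

# Create an OpenAI client using the API key
llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

# Request an embedding for a single piece of text
response = llm.embeddings.create(
    input="Text to create embeddings for",
    model="text-embedding-ada-002"
)

# The embedding is a list of floats
print(response.data[0].embedding)
```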

You should be able to identify:

  • The OpenAI class is created with an API key, read here from the OPENAI_API_KEY environment variable.

  • The llm.embeddings.create method creates an embedding for a piece of text.

  • The text-embedding-ada-002 model is used to create the embedding.

  • The embedding, a list of floating-point numbers, is accessed through response.data[0].embedding.
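The `input` parameter of `embeddings.create` also accepts a list of strings, so you can embed several pieces of text in one request. The sketch below is not part of the course code. It shows a hypothetical helper, `create_embeddings`, which assumes a client like the `llm` object above. It relies on each item in `response.data` carrying an `index` that points back to its position in the input list.

```python
def create_embeddings(client, texts, model="text-embedding-ada-002"):
    """Hypothetical helper: embed several texts in a single API call.

    `client` is assumed to be an `openai.OpenAI` instance. The vectors are
    returned in the same order as `texts`.
    """
    response = client.embeddings.create(input=texts, model=model)
    # Each item in response.data has an `index` matching its input position;
    # sorting on it guarantees the output order matches `texts`.
    items = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in items]


# Example usage (requires a valid OPENAI_API_KEY and network access):
# llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
# vectors = create_embeddings(llm, ["First text", "Second text"])
# print(len(vectors))  # one embedding per input text
```

Batching like this reduces the number of round trips when you have many chunks to embed.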

Run the program, and you should see the embedding printed as a list of numbers:

[-0.02844466269016266, 0.009961248375475407, 0.0017426918493583798, -0.01016482524573803, 0.019080106168985367, 0.02178979106247425, -0.01836407743394375, -0.005099962465465069, -0.014285510405898094, ... ]

Experiment by changing the input text; you will see the embedding change.
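Cosine similarity gives you a number for how close two embeddings are. The sketch below is a plain-Python illustration, not course code. It uses tiny hand-written vectors so it runs without an API key. With real data you would pass in two `response.data[0].embedding` lists. If the chunkVector index uses cosine similarity (the Neo4jVector default), this measure is behind the index's ranking of chunks.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; text-embedding-ada-002 vectors have 1536 dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors, approximately 1.0
```

Texts with similar meaning should produce embeddings with a higher cosine similarity than unrelated texts.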

Query Neo4j with an embedding

Next, you are going to use the embedding to query the Neo4j chunkVector vector index you created in the last lesson.

Open the llm-vectors-unstructured/query_neo4j.py program:

```python
import os
from dotenv import load_dotenv
load_dotenv()

from openai import OpenAI

llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

response = llm.embeddings.create(
    input="What does Hallucination mean?",
    model="text-embedding-ada-002"
)

embedding = response.data[0].embedding

# Connect to Neo4j
# graph = 

# Run query
# result = 

# Display results
# for row ... 
```

The program includes the code to create an embedding.

You will need to add the code to query Neo4j using the embedding.

First, import the LangChain Neo4jGraph class and create an object which will connect to the Neo4j sandbox:

```python
from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)
```

The Neo4jGraph class provides a simple way to interact with Neo4j from LangChain. It is not a full-featured Neo4j client.
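`os.getenv` returns `None` when a variable is not set. A missing or misspelled entry in `.env` therefore tends to show up later as a confusing connection or authentication error. A small check before connecting can report the problem directly. The helper below is hypothetical and not part of the course code. The variable names are the ones this lesson's programs read.

```python
import os

# Environment variables read by the programs in this lesson.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(names=REQUIRED_VARS):
    """Return the names of variables that are unset or empty."""
    return [name for name in names if not os.getenv(name)]


# Example usage, after load_dotenv() has run:
# missing = missing_env_vars()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```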

Use the query method to run a Cypher statement that searches the chunkVector index with the embedding:

```python
result = graph.query("""
CALL db.index.vector.queryNodes('chunkVector', 6, $embedding)
YIELD node, score
RETURN node.text, score
""", {"embedding": embedding})
```

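The embedding is passed to the query method in a dictionary of parameters. The key embedding supplies the value of $embedding in the Cypher statement. The second argument to db.index.vector.queryNodes, 6, is the number of nearest neighbours to return.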
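Passing the embedding as a parameter keeps the query text fixed. Formatting the vector into the Cypher string would make the query long and would stop Neo4j from reusing the query plan. Procedure arguments can also be parameters, so the index name and the number of results could be supplied the same way. The sketch below assumes this. `VECTOR_QUERY` and `vector_query_params` are illustrative names, not part of the course code. The `AS text` alias is an optional change that makes each row's key `text` instead of `node.text`.

```python
# Cypher with every variable part supplied as a parameter.
VECTOR_QUERY = """
CALL db.index.vector.queryNodes($indexName, $k, $embedding)
YIELD node, score
RETURN node.text AS text, score
"""


def vector_query_params(embedding, index_name="chunkVector", k=6):
    """Build the parameter dictionary; each key matches a $name in VECTOR_QUERY."""
    return {"indexName": index_name, "k": k, "embedding": embedding}


# Example usage with the graph object and embedding from this lesson:
# result = graph.query(VECTOR_QUERY, vector_query_params(embedding))
# for row in result:
#     print(row["text"], row["score"])
```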

Finally, iterate through the result and print the node.text and score values. Each row is a dictionary keyed by the column names in the RETURN clause.

```python
for row in result:
    print(row['node.text'], row['score'])
```

The complete llm-vectors-unstructured/query_neo4j.py program:
```python
import os
from dotenv import load_dotenv
load_dotenv()

from openai import OpenAI

from langchain_neo4j import Neo4jGraph

llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

# Create an embedding for the question
response = llm.embeddings.create(
    input="What does Hallucination mean?",
    model="text-embedding-ada-002"
)

embedding = response.data[0].embedding

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)

# Find the 6 chunks whose embeddings are most similar to the question
result = graph.query("""
CALL db.index.vector.queryNodes('chunkVector', 6, $embedding)
YIELD node, score
RETURN node.text, score
""", {"embedding": embedding})

# Display results
for row in result:
    print(row['node.text'], row['score'])
```

When you run the program, you should see the text of each matching chunk followed by its similarity score.
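How you read the score depends on the index's similarity function. This sketch assumes the chunkVector index uses cosine similarity (the Neo4jVector default). In that case, the Neo4j vector index documentation describes the returned score as cosine similarity rescaled to the range 0 to 1, computed as (1 + cosine) / 2, so a higher score means more similar. The helpers below are hypothetical and not part of the course code. One converts a score back to a raw cosine value. The other formats rows shaped like the ones this program prints, one line per chunk.

```python
def raw_cosine(score):
    """Assumes Neo4j's normalisation score = (1 + cosine) / 2 and inverts it."""
    return 2.0 * score - 1.0


def format_rows(rows, max_chars=60):
    """Format result rows (dicts with 'node.text' and 'score') as one line each."""
    lines = []
    for row in rows:
        # Collapse newlines and repeated spaces so each chunk fits on one line.
        text = " ".join(row["node.text"].split())
        if len(text) > max_chars:
            text = text[: max_chars - 3] + "..."
        lines.append(f"{row['score']:.4f}  {text}")
    return lines


# Example usage with the `result` from the query above:
# for line in format_rows(result):
#     print(line)
```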

Try modifying the input text and see how the results change.
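You can avoid editing the input text and re-running the program each time by wrapping the steps in a function and calling it with several questions. The sketch below shows one way to do that; it is not the course's code. `search` is a hypothetical name. It takes an `embed` function and any object with a `query(cypher, params)` method, such as the `Neo4jGraph` instance, so you can also try it without a database.

```python
def search(question, embed, graph, k=6):
    """Embed `question`, query the chunkVector index, return (text, score) pairs."""
    embedding = embed(question)
    rows = graph.query(
        """
        CALL db.index.vector.queryNodes('chunkVector', $k, $embedding)
        YIELD node, score
        RETURN node.text, score
        """,
        {"k": k, "embedding": embedding},
    )
    return [(row["node.text"], row["score"]) for row in rows]


# Example usage with the llm and graph objects created earlier in this lesson:
# def embed(text):
#     response = llm.embeddings.create(input=text, model="text-embedding-ada-002")
#     return response.data[0].embedding
#
# for question in ["What does Hallucination mean?", "What is an embedding?"]:
#     print(question)
#     for text, score in search(question, embed, graph):
#         print(f"  {score:.4f}  {text[:60]}")
```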

When you have successfully queried Neo4j using the embedding, you can move on to the next lesson.

Lesson Summary

In this lesson, you used the OpenAI API to create an embedding and queried Neo4j using Python.

In the next lesson, you will create a graph of the course content.