In the last lesson, the Neo4jVector class created embeddings for you automatically.
In this lesson, you will learn how to create embeddings directly and use them to query Neo4j from Python.
Publicly available Large Language Models (LLMs) typically provide an API that you can use to create embeddings for text. OpenAI's embeddings API is one example.
The llm-vectors-unstructured/create_embeddings.py program uses the OpenAI API and Python library to create embeddings for text:
import os
from dotenv import load_dotenv
load_dotenv()
from openai import OpenAI
llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
response = llm.embeddings.create(
    input="Text to create embeddings for",
    model="text-embedding-ada-002"
)
print(response.data[0].embedding)
You should be able to identify:
- The OpenAI class requires an API key to be passed to it.
- The llm.embeddings.create method is used to create an embedding for a piece of text.
- The text-embedding-ada-002 model is used to create the embedding.
- The response.data[0].embedding attribute is used to access the embedding.
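The data[0] index reflects that the response holds a list of results, one per input. As a minimal sketch, assuming you want embeddings for several texts in a single request, the helper below passes a list as input and orders the results by each item's index field. The name embed_texts is hypothetical and is not part of the course repository.

```python
from typing import Any, List, Sequence


def embed_texts(client: Any, texts: Sequence[str],
                model: str = "text-embedding-ada-002") -> List[List[float]]:
    """Return one embedding per input text, in the same order as `texts`.

    `client` is expected to be an OpenAI client object such as `llm` above.
    """
    response = client.embeddings.create(input=list(texts), model=model)
    # Each item in response.data carries the index of the input it belongs to,
    # so sorting by index keeps embeddings aligned with the input order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With the llm object from the program above, a call might look like embed_texts(llm, ["first text", "second text"]).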
Run the program and you should see the embedding returned:
[-0.02844466269016266, 0.009961248375475407, 0.0017426918493583798, -0.01016482524573803, 0.019080106168985367, 0.02178979106247425, -0.01836407743394375, -0.005099962465465069, -0.014285510405898094, ... ]
Experiment by changing the input text - you will see the embeddings change.
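One way to put a number on how much two embeddings differ is cosine similarity. The sketch below is an illustration only: the function name cosine_similarity and the toy 3-dimensional vectors are assumptions standing in for real embeddings, which are much longer.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real model output).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 and unrelated (orthogonal) vectors score 0, so you could compare the embeddings of two input texts by passing both response.data[0].embedding lists to this function.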
Query Neo4j with an embedding
Next, you are going to use the embedding to query the Neo4j chunkVector vector index you created in the last lesson.
Open the llm-vectors-unstructured/query_neo4j.py program:
import os
from dotenv import load_dotenv
load_dotenv()
from openai import OpenAI
llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
response = llm.embeddings.create(
    input="What does Hallucination mean?",
    model="text-embedding-ada-002"
)
embedding = response.data[0].embedding
# Connect to Neo4j
# graph =
# Run query
# result =
# Display results
# for row ...
The program includes the code to create an embedding.
You will need to add the code to query Neo4j using the embedding.
First, import the LangChain Neo4jGraph class and create an object which will connect to the Neo4j sandbox:
from langchain_community.graphs import Neo4jGraph
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)
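Because os.getenv returns None for a variable that is not set, a missing entry in the .env file tends to surface later as a less obvious connection error. The following is a minimal sketch of an early check, assuming you want to fail fast; require_env is a hypothetical helper, not part of LangChain or the course repository.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def require_env(names: Iterable[str],
                env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the named variables, raising if any is missing or empty."""
    # Read from the process environment unless a mapping is supplied (handy for tests).
    source = os.environ if env is None else env
    values = {name: source.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

One possible use is settings = require_env(["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]) after load_dotenv(), then passing settings["NEO4J_URI"] and the other values to Neo4jGraph.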
The Neo4jGraph class provides a simple mechanism for interacting with Neo4j from LangChain. It is not a full-featured Neo4j client.
Use the query method to run the Cypher that queries the chunkVector index using the embedding:
result = graph.query("""
CALL db.index.vector.queryNodes('chunkVector', 6, $embedding)
YIELD node, score
RETURN node.text, score
""", {"embedding": embedding})
The embedding is passed to the query method as a key/value pair in a dictionary, which supplies the value of the $embedding parameter in the Cypher.
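The same dictionary can carry more than one parameter. As a sketch, assuming you also want to vary the number of results and the index name without editing the Cypher string, the query below replaces the literals with $index and $k. The names VECTOR_QUERY and vector_query_params are hypothetical.

```python
from typing import Any, Dict, List

# Same query as in the lesson, with the index name and result count as parameters.
VECTOR_QUERY = """
CALL db.index.vector.queryNodes($index, $k, $embedding)
YIELD node, score
RETURN node.text, score
"""


def vector_query_params(embedding: List[float], k: int = 6,
                        index: str = "chunkVector") -> Dict[str, Any]:
    """Build the parameter dictionary for VECTOR_QUERY."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"index": index, "k": k, "embedding": embedding}
```

With the graph object from the lesson, this could be run as graph.query(VECTOR_QUERY, vector_query_params(embedding, k=3)).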
Finally, iterate through the result and print the node.text and score values:
for row in result:
    print(row['node.text'], row['score'])
The complete code:
import os
from dotenv import load_dotenv
load_dotenv()
from openai import OpenAI
from langchain_community.graphs import Neo4jGraph
llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
response = llm.embeddings.create(
    input="What does Hallucination mean?",
    model="text-embedding-ada-002"
)
embedding = response.data[0].embedding
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD')
)
result = graph.query("""
CALL db.index.vector.queryNodes('chunkVector', 6, $embedding)
YIELD node, score
RETURN node.text, score
""", {"embedding": embedding})
for row in result:
    print(row['node.text'], row['score'])
When running the program, you should see the chunk text printed followed by the score.
Try modifying the input text and see how the results change.
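When comparing results across different input texts, long chunks can make the output hard to scan. The sketch below is one possible way to tidy it, assuming each row is a dictionary with the node.text and score keys returned by the query. The helper format_rows and the sample rows are hypothetical and are not real query output.

```python
from typing import Any, Dict, Iterable, List


def format_rows(rows: Iterable[Dict[str, Any]], min_score: float = 0.0,
                width: int = 80) -> List[str]:
    """Format rows as 'score  text', skipping rows scoring below min_score."""
    lines = []
    for row in rows:
        if row["score"] < min_score:
            continue
        # Collapse newlines and runs of spaces so each chunk fits on one line.
        text = " ".join(row["node.text"].split())
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{row['score']:.3f}  {text}")
    return lines


# Hypothetical rows in the shape produced by RETURN node.text, score.
sample = [
    {"node.text": "Hallucination is when a model\nstates something false.", "score": 0.93},
    {"node.text": "Unrelated chunk.", "score": 0.71},
]
for line in format_rows(sample, min_score=0.8):
    print(line)
```

In the lesson's program, the result list from graph.query could be passed in place of sample.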
When you have successfully queried Neo4j using the embedding, you can move on to the next lesson.
Lesson Summary
In this lesson, you used the OpenAI API to create an embedding and queried Neo4j using Python.
In the next lesson, you will create a graph of the course content.