To query embeddings, you need to create a vector index. A vector index significantly increases the speed of similarity searches by organizing the vectors in a structure that lets the database find the nearest neighbors of a query vector without comparing it against every stored vector.
In this lesson, you will create vector indexes on the embedding property of the Question and Answer nodes.
Create the Question Index
You will use the CREATE VECTOR INDEX Cypher statement to create the index:
CREATE VECTOR INDEX [index_name] [IF NOT EXISTS]
FOR (n:LabelName)
ON (n.propertyName)
OPTIONS "{" option: value[, ...] "}"
CREATE VECTOR INDEX expects the following parameters:
- index_name - The name of the index
- LabelName - The node label on which to index
- propertyName - The property on which to index
- OPTIONS - The options for the index, where you can specify:
  - vector.dimensions - The dimension of the embedding, e.g. OpenAI embeddings consist of 1536 dimensions.
  - vector.similarity_function - The similarity function to use when comparing values in this index. This can be euclidean or cosine.
Review and run the following Cypher to create the vector index:
CREATE VECTOR INDEX questions IF NOT EXISTS
FOR (q:Question)
ON q.embedding
OPTIONS {indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}}
Note that the index is called questions, is against the Question label, and is on the .embedding property. The vector.dimensions is 1536 (as used by OpenAI) and the vector.similarity_function is cosine. The IF NOT EXISTS clause ensures that the statement only creates the index if it does not already exist.
Run the statement to create the index.
Choosing a Similarity Function
Generally, cosine will perform best for text embeddings, but you may want to experiment with other functions.
You can read more about similarity functions in the documentation.
Typically, you will choose a similarity function closest to the loss function used when training the embedding model. You should refer to the model’s documentation for more information.
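To get a feel for how the two functions differ, you can score the same pair of vectors with both. The following is a minimal sketch, assuming your Neo4j version provides the vector.similarity.cosine and vector.similarity.euclidean Cypher functions; the two short lists are made-up values for illustration, not real embeddings.

```cypher
// Hypothetical toy vectors: same direction, different magnitude
WITH [1.0, 0.0] AS a, [2.0, 0.0] AS b
RETURN
  vector.similarity.cosine(a, b) AS cosineScore,       // compares direction only
  vector.similarity.euclidean(a, b) AS euclideanScore  // also sensitive to distance
```

Because both vectors point in the same direction, the cosine score is at its maximum, while the euclidean score is lower because the two points are some distance apart.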
Check the index creation status
The index is populated asynchronously. You can check that the index was created, and follow the status of its population, using the SHOW INDEXES statement:
SHOW INDEXES WHERE type = "VECTOR"
You should see a result similar to the following:
id | name | state | populationPercent | type |
1 | "questions" | "ONLINE" | | "VECTOR" |
Once the state is listed as ONLINE, the index will be ready to query.
The populationPercent field indicates the percentage of matching node and property pairs that have been indexed so far.
When the populationPercent is 100.0, all the question embeddings have been indexed.
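If you would rather wait for the index than re-run SHOW INDEXES, the following sketch blocks until the index is online and then runs a test query against it. It assumes the db.awaitIndex and db.index.vector.queryNodes procedures are available in your Neo4j version, and it uses the embedding of an arbitrary existing Question node as the query vector purely for illustration.

```cypher
// Wait up to 300 seconds for the questions index to come online
CALL db.awaitIndex('questions', 300);

// Use one stored embedding as the query vector (illustrative choice)
// and return the 5 most similar questions with their similarity scores
MATCH (q:Question)
WHERE q.embedding IS NOT NULL
WITH q LIMIT 1
CALL db.index.vector.queryNodes('questions', 5, q.embedding)
YIELD node, score
RETURN elementId(node) AS questionId, score
```

Run the two statements separately if your client does not support multiple statements. Since the query vector comes from a node that is already indexed, you would typically expect that node to appear among the top results.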
Check Your Understanding
Create Vector Index
Your task is to create a vector index on authors' biographies.
The database contains Author nodes that have name, biography, and biographyEmbedding properties.
The biographyEmbedding property is a vector representation of the biography.
Select the correct syntax to create the vector index.
CREATE VECTOR INDEX authors IF NOT EXISTS
/* select the correct option from the list below */
OPTIONS {indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}}
- ❏ FOR (a:Author) ON a.biography
- ❏ FOR (a:Author) ON a.embedding
- ✓ FOR (a:Author) ON a.biographyEmbedding
Hint
Embeddings are vectors that represent the data. You create the vector index on the embedding of the biography.
Solution
You create the vector index on the biographyEmbedding property of the Author nodes.
CREATE VECTOR INDEX authors IF NOT EXISTS
FOR (a:Author) ON a.biographyEmbedding
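For reference, placing the selected clause into the statement from the question gives the complete command. The OPTIONS values are taken unchanged from the question and assume 1536-dimensional embeddings, as elsewhere in this lesson.

```cypher
// Complete statement assembled from the question template and the correct clause
CREATE VECTOR INDEX authors IF NOT EXISTS
FOR (a:Author) ON a.biographyEmbedding
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
```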
Lesson Summary
In this lesson, you learned how to create a vector index using the CREATE VECTOR INDEX Cypher statement.
In the next lesson, you will use what you have learned to create a vector index for the Answer nodes.