Create a Vector Index

To query embeddings, you need to create a vector index. A vector index significantly increases the speed of similarity searches by pre-computing the similarity between vectors and storing them in the index.

In this lesson, you will create vector indexes on the embedding property of the Question and Answer nodes.

Create the Question Index

You will use the CREATE VECTOR INDEX Cypher statement to create the index:

cypher
CREATE VECTOR INDEX Syntax
CREATE VECTOR INDEX [index_name] [IF NOT EXISTS]
FOR (n:LabelName)
ON (n.propertyName)
OPTIONS "{" option: value[, ...] "}"

CREATE VECTOR INDEX expects the following parameters:

  • index_name - The name of the index

  • LabelName - The node label on which to index

  • propertyName - The property on which to index

  • OPTIONS - The options for the index, where you can specify:

    • vector.dimensions - The dimension of the embedding e.g. OpenAI embeddings consist of 1536 dimensions.

    • vector.similarity_function - The similarity function to use when comparing values in this index - this can be euclidean or cosine.

Review and run the following Cypher to create the vector index:

cypher
Create the vector index
CREATE VECTOR INDEX questions IF NOT EXISTS
FOR (q:Question)
ON q.embedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}

Note that the index is called questions, is against the Question label, and is on the .embedding property. The vector.dimensions is 1536 (as used by OpenAI) and the vector.similarity_function is cosine. The IF NOT EXISTS clause ensures that the statement only creates the index if it does not already exist.

Run the statement to create the index.

Choosing a Similarity Function

Generally, cosine will perform best for text embeddings, but you may want to experiment with other functions.

You can read more about similarity functions in the documentation.

Typically, you will choose a similarity function closest to the loss function used when training the embedding model. You should refer to the model’s documentation for more information.

Check the index creation status

The index will be updated asynchronously. You can check the status of the index population using the SHOW INDEXES statement:

Check that you created the index successfully using the SHOW INDEXES command.

cypher
Show Indexes
SHOW INDEXES WHERE type = "VECTOR"

You should see a result similar to the following:

Understand and search unstructured data using vector indexesShow Indexes Result

id

name

state

populationPercent

type

1

"questions"

"ONLINE"

100.0

"VECTOR"

Once the state is listed as online, the index will be ready to query.

The populationPercentage field indicates the proportion of node and property pairing.

When the populationPercentage is 100.0, all the question embeddings have been indexed.

Check Your Understanding

Create Vector Index

Your task is to create a vector index on authors' biographies.

The database contains Author nodes that have name, biography, and biographyEmbedding properties.

The biographyEmbedding property is a vector representation of the biography.

Select the correct syntax to create the vector index.

cypher
CREATE VECTOR INDEX authors IF NOT EXISTS
/*select:FOR (a:Author) ON a.biographyEmbedding*/
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
  • FOR (a:Author) ON a.biography

  • FOR (a:Author) ON a.embedding

  • FOR (a:Author) ON a.biographyEmbedding

Hint

Embeddings are vectors that represent the data. You create the vector index on the embedding of the biography.

Solution

You create the vector index on the biographyEmbedding property of the Author nodes.

cypher
CREATE VECTOR INDEX authors IF NOT EXISTS
FOR (a:Author) ON a.biographyEmbedding

Lesson Summary

In this lesson, you learned how to create a vector index using the CREATE VECTOR INDEX Cypher statement.

In the next lesson, you will use what you have learned to create a vector index for the Answer nodes.