Introduction
Indexes are a powerful tool for improving query performance.
Indexes allow Neo4j to quickly locate nodes based on their property values, which can significantly reduce the time it takes to find the anchor nodes from which relationships are traversed.
Use of indexes
The following query finds the orders for a specific customer, "Ernst Handel".
MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
RETURN c.companyName, o.orderID, o.requiredDate
ORDER BY o.requiredDate
Profile the query
You can use PROFILE to see the execution plan.
PROFILE MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
RETURN c.companyName, o.orderID, o.requiredDate
ORDER BY o.requiredDate
Run the query, review the plan, and try to identify the operations being performed.
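PROFILE executes the query and reports, for each operator, the rows it produced and the database hits it made. If you only want to see the plan the planner would choose, without running the query, you can prefix the same query with EXPLAIN instead. The plan then shows estimated rows rather than actual ones:

```cypher
// EXPLAIN returns the planned operators without executing the query,
// so no actual rows or db hits are reported.
EXPLAIN MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
RETURN c.companyName, o.orderID, o.requiredDate
ORDER BY o.requiredDate
```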
Scanning all the customers
The profile of this query shows that:
- Neo4j performs a NodeByLabelScan operation on Customer.
- It then applies a Filter operation to find the specific customer.
Neo4j has to scan all Customer nodes to find the one with the matching companyName.
Create an index
You can improve the performance of this query by creating an index on the companyName property of the Customer label:
CREATE INDEX companyName_Customer
IF NOT EXISTS
FOR (c:Customer) ON (c.companyName)
Profile with an index
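A new index is populated in the background, and the planner only uses it once its state is ONLINE. A minimal check before profiling again, assuming Neo4j 5 syntax, is to wait for index population and then list the new index. If your client does not accept multiple statements, run the two statements separately:

```cypher
// Wait until all indexes have finished populating (default timeout: 300 seconds)
CALL db.awaitIndexes();

// Confirm the index exists and that its state is ONLINE
SHOW INDEXES
YIELD name, type, labelsOrTypes, properties, state
WHERE name = "companyName_Customer"
RETURN name, type, labelsOrTypes, properties, state;
```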
Running the same query again after creating the index shows that Neo4j is now using a NodeIndexSeek operator to find the specific customer:
PROFILE MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
RETURN c.companyName, o.orderID, o.requiredDate
ORDER BY o.requiredDate
Creating an index on the companyName property allows Neo4j to quickly locate the specific customer node, which significantly improves the performance of the query.
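When comparing plans, you can also ask the planner explicitly to start from the index with a USING INDEX hint. This is a sketch for experimentation only; in most cases the planner chooses the index on its own once it is ONLINE:

```cypher
// USING INDEX asks the planner to anchor on the companyName index of Customer
PROFILE MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
USING INDEX c:Customer(companyName)
RETURN c.companyName, o.orderID, o.requiredDate
ORDER BY o.requiredDate
```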
Text indexes
You can use a text index to improve the performance of queries that involve partial matching of string properties using CONTAINS, STARTS WITH, or ENDS WITH.
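A minimal sketch, assuming Neo4j 5 syntax and a hypothetical index name productName_text_Product: create a text index on Product.productName and profile a CONTAINS query. With the index ONLINE, the plan is expected to use an index-backed operator (such as NodeIndexContainsScan) instead of a NodeByLabelScan followed by a Filter. The index is dropped again at the end so that the challenge below starts from a label scan:

```cypher
// Illustrative text index on Product.productName
CREATE TEXT INDEX productName_text_Product IF NOT EXISTS
FOR (p:Product) ON (p.productName);

// Partial match on the product name; check which operator the plan uses
PROFILE MATCH (p:Product)
WHERE p.productName CONTAINS "Tofu"
RETURN p.productName;

// Remove the illustrative index before starting the challenge
DROP INDEX productName_text_Product IF EXISTS;
```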
Create an index for Product names
The following query finds the supplier for a specific product, "Tofu".
MATCH (p:Product {productName: "Tofu"})<-[:SUPPLIES]-(s:Supplier)
RETURN p.productName, s.companyName
Your challenge is to:
- Profile the query and identify the operations being performed.
- Create an index on the productName property of the Product label.
- Review the new query plan to see how it has changed.
Solution
- Use PROFILE to analyze the query and identify that it is performing a NodeByLabelScan on Product to find the node with the matching productName:
  PROFILE MATCH (p:Product {productName: "Tofu"})<-[:SUPPLIES]-(s:Supplier)
  RETURN p.productName, s.companyName
- Create an index on the productName property of the Product label:
  CREATE INDEX productName_Product IF NOT EXISTS
  FOR (p:Product) ON (p.productName)
- Run the query again and review the new query plan to see that it is now using a NodeIndexSeek to find the specific product node:
  PROFILE MATCH (p:Product {productName: "Tofu"})<-[:SUPPLIES]-(s:Supplier)
  RETURN p.productName, s.companyName
- Because the new plan uses a NodeIndexSeek to find the specific product node, the query performs significantly better.
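The range index created by CREATE INDEX also supports prefix searches. As a sketch, the following query filters with STARTS WITH; with productName_Product ONLINE, the plan is expected to use an index seek over a range of values (reported as NodeIndexSeekByRange) rather than a label scan:

```cypher
// Prefix match on productName; a range index can serve STARTS WITH predicates
PROFILE MATCH (p:Product)<-[:SUPPLIES]-(s:Supplier)
WHERE p.productName STARTS WITH "To"
RETURN p.productName, s.companyName
```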
Anchor nodes
Anchor nodes are the starting points for graph pattern matching in Cypher queries. They represent the initial nodes that Neo4j locates before traversing relationships to find connected data.
An anchor node is typically:
- A node (or nodes) with specific property values that can be efficiently located using labels or indexes
- The first node matched in a query pattern before following relationships
- A node that provides a focused entry point into the graph structure
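An anchor can be specified with an inline property map, as in the queries in this lesson, or with an equality predicate in a WHERE clause. The two forms below are equivalent, and with the companyName_Customer index in place both can be planned as a NodeIndexSeek on Customer:

```cypher
// Anchor given as an inline property map
MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
RETURN o.orderID;

// The same anchor expressed as a WHERE predicate
MATCH (c:Customer)-[:PURCHASED]->(o:Order)
WHERE c.companyName = "Ernst Handel"
RETURN o.orderID;
```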
Why anchor nodes matter
Anchor nodes are crucial for query performance because they determine how Neo4j begins executing a query:
- Efficient Starting Points - Properly indexed anchor nodes allow Neo4j to quickly locate specific nodes using NodeIndexSeek operations instead of scanning all nodes with a label (NodeByLabelScan)
- Reduced Search Space - By identifying the correct anchor node first, Neo4j limits the scope of relationship traversals, avoiding unnecessary exploration of the graph
- Query Optimization - The Neo4j query planner can create more efficient execution plans when anchor nodes are clearly defined and indexed
- Scalability - As graphs grow larger, efficient anchor node identification becomes increasingly important to maintain fast query response times
- Resource Conservation - Well-chosen anchor nodes reduce CPU usage and memory consumption during query execution
Customer anchor node
In this query:
MATCH (c:Customer {companyName: "Ernst Handel"})-[:PURCHASED]->(o:Order)
RETURN c.companyName, o.orderID, o.requiredDate
The Customer node with companyName: "Ernst Handel" serves as the anchor node. With an index on companyName, Neo4j can quickly locate this specific customer before traversing the PURCHASED relationships to find related orders.
Best practices for anchor nodes
- Create indexes on properties used to identify anchor nodes
- Choose anchor nodes with selective property values (avoid properties with many duplicate values)
- Position anchor nodes early in your MATCH patterns
- Use PROFILE to verify that your queries are using efficient anchor node operations
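One way to judge how selective a property is before relying on it for an anchor is to compare the number of nodes with the number of distinct values of that property. A sketch for Customer.companyName:

```cypher
// If distinctNames is close to total, each companyName value identifies
// (nearly) a single customer, which makes it a selective anchor property.
// count(DISTINCT ...) ignores nodes where companyName is null.
MATCH (c:Customer)
RETURN count(c) AS total,
       count(DISTINCT c.companyName) AS distinctNames
```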
Lesson Summary
In this lesson, you learned about the importance of indexes for query performance and how to create them in Neo4j.
In the next lesson, you will learn how to use the count store to optimize queries that count or check for the existence of nodes and relationships.