Controlling Index Usage

Single index used by default

A MATCH clause will use a single index by default.

To illustrate this, execute this code multiple times and observe the lowest elapsed time.

cypher
PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name,  m.title

It should return 17 rows and have an elapsed time of ~7 ms in the Plan view. Notice that the TEXT index is used to anchor the query on the p2 end of the path. The planner chooses this end because fewer Person nodes in the index have a name containing 'George', so starting there reduces the number of rows the rest of the query must process.
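To see which TEXT indexes the planner can choose from, it can help to list them first. The following is a minimal sketch, assuming a Neo4j version that supports SHOW INDEXES with a YIELD clause; the index names returned depend on how your graph was set up.

```cypher
// List the TEXT indexes defined in the current database.
// Assumes SHOW INDEXES with YIELD is available in your Neo4j version.
SHOW INDEXES
YIELD name, type, entityType, labelsOrTypes, properties
WHERE type = 'TEXT'
RETURN name, entityType, labelsOrTypes, properties
```

If a TEXT index on Person(name) appears in the result, it is the one the planner used to anchor the query above.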

Consult the Manual

Before you create the indexes for your application, please read the section in the Cypher Reference Manual that has many examples of how indexes are used.

Specifying a query hint

In general, the query planner does a good job of determining which index to use to anchor a query. You can force how an index will be used by specifying USING INDEX, which is called a query hint.

Execute this code (multiple times) that tells the query planner to use p as the anchor of the query and use the index for that end of the path:

cypher
PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name,  m.title

Notice that with this query, the anchor is the "p" end of the query path and the TEXT index is used. Notice also that this query does not perform as well as the default index usage. It has the same elapsed time, but requires more db hits.

Verify Query Performance

If you add USING INDEX clauses to your Cypher code, you must verify that they make your query perform better. A hint with a specified index type is only possible when the planner knows that using an index of the specified type does not change the results. Provide query planner hints carefully in your code.
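A hint can also name the index type. The sketch below assumes a Neo4j version that accepts typed index hints (USING TEXT INDEX) and that a TEXT index exists on Person(name), as in the earlier queries. If the planner cannot satisfy a typed hint, expect it to report an error instead of silently ignoring the hint.

```cypher
// Same query as before, but the hint names the index type explicitly.
// Assumes typed index hints are supported by your Neo4j version.
PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING TEXT INDEX p:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name, m.title
```

Compare the db hits and elapsed time against the untyped hint to confirm that the typed hint is worth keeping.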

Using multiple indexes

Depending upon your use cases and graph, it may be better to use more than one index for a query.

Execute this code (multiple times) that will enable index usage for both ends of the query path.

cypher
PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
USING INDEX p2:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name,  m.title

This query runs in about the same elapsed time, ~4 ms, but with fewer db hits.

This query uses the TEXT index to find the Person nodes that contain John and it uses the TEXT index to find the Person nodes that contain George. Then it does a join to return the movie titles these nodes share. Not all queries will benefit from query hints so you should be careful to fully test your queries/indexes.

Consistently check query performance

As you develop your application, it will be very important to profile the most important queries of your application, add the appropriate indexes, and tune your queries as an iterative process.
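One way to shorten this loop is to inspect the plan before running the query. The sketch below uses EXPLAIN, which returns the planned operators without executing the query, so it reports no db hits or elapsed time. It reuses the query from this lesson with a single hint.

```cypher
// Show the execution plan without running the query.
// Use this to check which index operators the planner picks,
// then switch to PROFILE to measure db hits and elapsed time.
EXPLAIN MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name, m.title
```

Once the plan looks right, run the same query with PROFILE several times to compare the actual numbers.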

Query hints for relationships

We already have an index on the RATED.rating property.
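For reference, an index like this could have been created as shown below. This is a sketch only: the index name RATED_rating is hypothetical, and the course graph may use a different name or index type.

```cypher
// Hypothetical index name; the index in the course graph may be named differently.
CREATE INDEX RATED_rating IF NOT EXISTS
FOR ()-[r:RATED]-()
ON (r.rating);

// List relationship indexes to confirm that one exists on RATED(rating).
SHOW INDEXES
YIELD name, type, entityType, labelsOrTypes, properties
WHERE entityType = 'RELATIONSHIP'
RETURN name, type, labelsOrTypes, properties;
```

Because of IF NOT EXISTS, the CREATE statement does nothing when an index with the same name or the same schema already exists.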

Run this query multiple times. It does not use any index, even the one defined for the RATED.rating property:

cypher
PROFILE MATCH
(u:User)-[r:RATED]->(m:Movie)
WHERE
u.name CONTAINS 'Johnson'
AND
r.rating = 5
RETURN u.name, r.rating, m.title

Suppose we want to see if using the index on the relationship might help. Here we specify that we want to use this index.

Execute this code multiple times:

cypher
PROFILE MATCH
(u:User)-[r:RATED]->(m:Movie)
USING INDEX r:RATED(rating)
WHERE
u.name CONTAINS 'Johnson'
AND
r.rating = 5
RETURN u.name, r.rating, m.title

This execution uses the index on the RATED relationship, but as you can see, it is not better. That is why the query planner chose not to use the index.

You must use caution and test any query hints you intend to use in your application.

Check your understanding

1. Providing a query hint

Suppose we have a graph that contains Company nodes. One of the properties of a Company node is name and we have a TEXT index on that property. The graph also has Employee nodes that have a WORKS_AT relationship to Company nodes. An Employee node has a name property and we have a TEXT index on that property.

Based upon our query testing, we have decided that we want to ensure that the TEXT index for the name property of a company is always used in this query. How do you specify this query hint?

cypher
PROFILE MATCH
(e:Employee)-[:WORKS_AT]->(c:Company)
// The query hint goes here
WHERE
e.name CONTAINS 'John'
AND
c.name CONTAINS 'Processing'
RETURN e.name, c.name

  • USING INDEX c:Company_name_text

  • USING INDEX Company_name_text

  • USING TEXT INDEX Company.name

  • USING INDEX c:Company(name)

Hint

You must specify the variable used in the query path.

Solution

The correct code for providing this query hint is:

USING INDEX c:Company(name)

2. How many indexes?

Suppose we have TEXT indexes on Scientist(name), Science(name), and Pioneer(name).

For this query:

cypher
MATCH (s:Scientist)-[:RESEARCHED]->(sc:Science)<-[:INVENTED_BY]-(p:Pioneer)
WHERE s.name CONTAINS 'William'
AND sc.name CONTAINS 'Neuro'
AND p.name CONTAINS 'John'
RETURN s.name, sc.name, p.name

How many indexes are used?

  • ❏ 0

  • ✓ 1

  • ❏ 2

  • ❏ 3

Hint

This query contains no query hints, and the query planner must choose one end of the path to anchor the traversal.

Solution

The correct answer is 1. By default, the query planner will use a single index.

Summary

In this lesson, you learned how to provide query hints for index usage. In the next Challenge, you will add a query hint to a query.