Full-text indexes in Neo4j
A full-text index is designed for queries that need to search string properties using possibly complex search predicates. It can also be created to index multiple properties in multiple node labels or relationship types.
In many cases, a full-text index will help improve the performance of queries where the predicates test complex patterns in string properties. For example:
-
All movie titles that contain "Night" followed by "Sky" later in the title.
-
All movie plots that have "murder" but not "drugs" in them.
-
All movie titles that contain "Night" and "Dead", with a plot that contains "French".
Some of these types of queries can be very expensive using standard RANGE or TEXT indexes. A full-text index may perform better. You create a full-text index on node properties or on relationship properties.
You use the CREATE
clause to create a full-text index, but what makes full-text indexes different from other index types (RANGE, TEXT), is that they are not automatically used in your Cypher queries.
You must call a special procedure, db.index.fulltext.queryNodes()
to query node properties using the index.
And you call db.index.fulltext.queryRelationships()
to query relationship properties using the index.
Syntax for creating a full-text index for a node property
Here is the syntax for creating a full-text index for a property of a node:
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON EACH [x.<property_key>]
You specify the name of the index, the node label it will be associated with, and the name of the property.
-
If an index already exists in the graph with the same name, no index is created.
-
If an index does not exist in the graph with the same name:
-
No full-text index is created if there already is a full-text index for that node label and property key.
-
Otherwise, the full-text index is created.
-
Creating the full-text index for the plot property of a Movie node
Execute this Cypher to create your first full-text index in the graph:
CREATE FULLTEXT INDEX Movie_plot_ft IF NOT EXISTS
FOR (x:Movie)
ON EACH [x.plot]
You can use SHOW INDEXES
to view your indexes in the graph.
Querying with the full-text index
Unlike RANGE and TEXT indexes that are automatically used at runtime, you must explicitly call the procedure that uses the full-text index. When you call the procedure to query using the full-text index, you use the Lucene query to retrieve the nodes. Execute this Cypher to retrieve all Movie nodes that have both "murder" and "drugs" in them.
CALL db.index.fulltext.queryNodes
("Movie_plot_ft", "murder AND drugs")
YIELD node
RETURN node.title, node.plot
When you profile this query, you will observe the elapsed time of ~4 ms (2nd execution).
The total db hits statistics do not reflect the true db hits because the PROFILE
will never expose details of a procedure.
All that you can measure for queries that call procedures is elapsed time.
Compare TEXT index with full-text index
You should always profile your queries to determine if they might benefit from a TEXT index versus a full-text index. Let’s compare the performance of each type of index.
Create a TEXT index on the plot property:
CREATE TEXT INDEX Movie_plot_text IF NOT EXISTS
FOR (x:Movie)
ON (x.plot)
Now profile the query using the TEXT index (at least twice):
PROFILE MATCH (m:Movie)
WHERE m.plot CONTAINS "murder"
AND m.plot CONTAINS "drugs"
RETURN m.title,m.plot
The query using the TEXT index performs similarly to the full-text index for this particular query and data in the graph.
Syntax for creating a full-text index for a relationship property
Here is the syntax for creating a full-text index for a property of a relationship:
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON EACH [x.<property_key>]
You specify the name of the index, the relationship type it will be associated with, and the name of the property.
-
If an index already exists in the graph with the same name, no index is created.
-
If an index does not exist in the graph with the same name:
-
No full-text index is created if there already is a full-text index for that relationship type and property key.
-
Otherwise, the full-text index is created.
-
Syntax for querying relationships with the full-text index
CALL db.index.fulltext.queryRelationships
("<index_name>", "<lucene_query>")
YIELD relationship
RETURN relationship.<property>
In the next Challenge, you will have an opportunity to create and use a full-text index on a relationship property.
Syntax for creating a full-text index for multiple node labels and properties
The advantage of a full-text index is that it can span multiple node labels and multiple properties to support complex string queries.
Here is the syntax for creating a full-text index for multiple properties of multiple node labels:
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
(x:<node_label1> | <node_label2> | ...)
ON EACH [x.<property_key1>, x.<property_key2>,...]
Creating a full-text index for multiple properties and node labels
Suppose we want to optimize this query where we are testing properties in two different node labels, Actor and Movie. We want to return the actors or Movies in the graph where:
-
The movie has "british" in its plot and "death" in its title
-
The actor has "british" and "actress" in its bio
To optimize these CONTAINS
queries, create these TEXT indexes:
CREATE TEXT INDEX Movie_plot_text
IF NOT EXISTS
FOR (x:Movie)
ON (x.plot);
CREATE TEXT INDEX Movie_title_text IF NOT EXISTS
FOR (x:Movie)
ON (x.text);
CREATE TEXT INDEX Actor_bio_text IF NOT EXISTS
FOR (x:Actor)
ON (x.bio)
Next, execute this query:
PROFILE MATCH (m:Movie)
WHERE (m.plot CONTAINS "british"
OR m.plot CONTAINS "British")
AND (m.title CONTAINS "death"
OR m.title CONTAINS "Death")
RETURN m.name AS Name, m.bio AS Bio,
m.title AS Title, m.plot AS Plot
UNION ALL
MATCH (a:Actor)
WHERE (a.bio CONTAINS "British"
OR a.bio CONTAINS "british")
AND (a.bio CONTAINS "actress"
OR a.bio contains "Actress")
RETURN a.name AS Name, a.bio AS Bio,
a.title AS Title, a.plot AS Plot
For this query, we specify both the lower and upper case beginnings of the test strings so that the TEXT index can be used. If we were to use toLower() or to toUpper() in our predicates, the TEXT index would not be used.
The best elapsed time for this query is ~8 ms and it returns 166 rows.
Let’s see if we can do better with a full-text index. Execute this code to create the full-text index:
CREATE FULLTEXT INDEX Actor_bio_Movie_plot_title_ft IF NOT EXISTS
FOR (x:Actor | Movie)
ON EACH [x.bio, x.plot, x.title]
This index is used for Actor and Movie nodes where the properties indexed are bio, plot, and title.
Now let’s do the query using the full-text index:
PROFILE CALL db.index.fulltext.queryNodes
("Actor_bio_Movie_plot_title_ft",
"(plot: british AND title: death)
OR (bio: british AND bio: actress)")
YIELD node
RETURN node.name AS Name, node.bio AS Bio,
node.title AS Title, node.plot AS Plot
This query returns both movie and actor nodes that match the properties specified in the second argument to queryNodes().
This query returns in ~13 ms, better than the previous query that used the TEXT indexes.
Retrieving the score for a full-text search
When a full-text index is used, it calculates a "hit score" that represents the closeness of the values in the graph to the query string.
Here is an example where our search is for any node that contains matrix or reloaded in its title:
CALL db.index.fulltext.queryNodes(
'Actor_bio_Movie_plot_title_ft',
'title: matrix reloaded')
YIELD node, score
WITH score, node
WHERE node:Movie
RETURN node.title, score
The nodes returned have a Lucene score based upon how much of matrix and reloaded was in the title.
You can explore some ways that full-text indexes can be used in the Cypher Reference Manual.
To get the most out of queries that use full-text indexes, you should also refer to the Lucene Query Language
Dropping full-text indexes
Since you create a full-text index with a name, execute this code to drop the full-text index:
DROP INDEX Actor_bio_Movie_plot_title_ft
You can use SHOW INDEXES
to view your indexes in the graph.
Check your understanding
1. Creating a full-text index
Suppose we have a graph that contains Company nodes. One of the properties of a Company node is description. We want to be able to optimize queries that test whether the Company description contains a string pattern that has a combination of phrases. The Cypher to perform this query would not be efficient so we want to create a full-text index that will be used for our queries.
-
✓
CREATE FULLTEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON EACH [x.description]
-
❏
CREATE FULLTEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON (x.description)
-
❏
CREATE FULL-TEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON (x.description)
-
❏
CREATE FULL-TEXT INDEX Company_description_ft IF NOT EXISTS ON (Company.description)
Hint
You are creating a FULLTEXT index on every description property of each Company node.
Solution
The correct code for creating the full-text index is:
CREATE FULLTEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON EACH [x.description]
2. What can you index with full-text indexes
What can you index with a full-text index? (Select all that apply.)
-
✓ A string property in each node with a given label.
-
✓ Multiple string properties across multiple node labels.
-
✓ A string property in each relationship with a given type.
-
✓ Multiple string properties across multiple relationship types.
-
❏ Multiple string properties across multiple node labels and relationship types.
Hint
A full-text indexes are created either for nodes or relationships, but not both.
Solution
The correct answers for what you can index with a full-text index:
-
A string property in each node with a given label.
-
Multiple string properties across multiple node labels.
-
A string property in each relationship with a given type.
-
Multiple string properties across multiple relationship types.
Summary
In this lesson, you learned how to create and use a full-text index. In the next Challenge, you will create and use full-text index on a relationship property.