Full-text indexes
Full-text indexes are useful in applications that must analyze the contents of string properties, for example to find individual words or phrases inside a value, rather than compare whole values. Full-text indexes are implemented with Apache Lucene, which makes their text-matching capabilities very powerful.
With a full-text index, you can use Lucene’s full-text query language to express how values are matched in a query. A single full-text index can be defined for multiple labels and/or properties, or for multiple relationship types and/or properties.
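As a sketch of an index that spans several relationship types and properties, the statements below define one full-text index and then query it. The relationship types `REVIEWED` and `COMMENTED_ON`, their `summary` and `text` properties, and the index name `Review_text_ft` are hypothetical, chosen only to illustrate the syntax (assuming Neo4j 5).

```cypher
// Hypothetical relationship types and properties, for illustration only.
// One full-text index covers both types and both properties.
CREATE FULLTEXT INDEX Review_text_ft IF NOT EXISTS
FOR ()-[r:REVIEWED|COMMENTED_ON]-() ON EACH [r.summary, r.text];

// Relationship indexes are queried with queryRelationships,
// which yields each matching relationship and its relevance score.
CALL db.index.fulltext.queryRelationships('Review_text_ft', 'suspenseful')
YIELD relationship, score
RETURN type(relationship) AS relType, relationship.summary AS summary, score;
```

The query returns results only after the new index has finished populating.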
Unlike RANGE and TEXT indexes, a full-text index is used at runtime only when you call a procedure. The query planner never chooses a full-text index automatically; you must name the index explicitly in your Cypher code.
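Because the procedure call needs the index name, it helps to list the full-text indexes that exist. A minimal sketch, assuming Neo4j 5 `SHOW INDEXES` syntax:

```cypher
// List only full-text indexes. The name column is the value you pass
// to db.index.fulltext.queryNodes or db.index.fulltext.queryRelationships.
SHOW FULLTEXT INDEXES
YIELD name, entityType, labelsOrTypes, properties, state
RETURN name, entityType, labelsOrTypes, properties, state;
```

An index whose `state` is not yet `ONLINE` is still being populated.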
Why use a full-text index?
Suppose you want to find all Movie nodes whose plots contain certain words.
And suppose you have created a TEXT index on the plot property:
CREATE TEXT INDEX Movie_plot_text IF NOT EXISTS FOR (x:Movie) ON (x.plot)
Performing this type of retrieval with RANGE or TEXT indexes can be very expensive. For example:
PROFILE MATCH (m:Movie)
WHERE m.plot CONTAINS "murder"
AND m.plot CONTAINS "drugs"
RETURN m.title, m.plot
For a single node variable such as m, the query planner uses at most one index to find the candidate nodes, and any remaining predicates are applied as a filter. (A subquery can use an additional index.) For this query, the planner must decide which predicate is more efficient to serve from the index: the first predicate returns all "murder" rows, and the second returns all "drugs" rows. For our dataset, the planner estimates that fewer plots contain "drugs" than "murder", so it uses the index to select the rows that contain "drugs" and then tests each of those rows for "murder". In other words, the index can serve only one of the two predicates in this query.
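To see which term is rarer in your own data, you can count the matching plots directly. This is only a rough check: the planner works from its own estimates rather than running a query like this one, so its choice can differ.

```cypher
// Count Movie plots containing each term. A CASE without ELSE yields null,
// and count() ignores nulls, so each column counts only matching plots.
MATCH (m:Movie)
RETURN count(CASE WHEN m.plot CONTAINS 'murder' THEN 1 END) AS murderCount,
       count(CASE WHEN m.plot CONTAINS 'drugs' THEN 1 END) AS drugsCount
```

The term with the smaller count is the more selective predicate, which is the one a planner would prefer to serve from the index.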
With a full-text index, you can write a single expression that finds all nodes whose property contains both terms, anywhere in the text. For example, if we had a full-text index named Movie_plot_ft on the Movie plot property, we could return the nodes whose plots contain both "murder" and "drugs" with this code:
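// Query the full-text index by name, using a Lucene query string
CALL db.index.fulltext.queryNodes('Movie_plot_ft', 'murder AND drugs')
YIELD node, score
RETURN node.title, node.plot, score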
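The query above assumes that a full-text index named Movie_plot_ft already exists. Creating full-text indexes is covered in the next lesson; as a preview, and assuming Neo4j 5 syntax, a definition like this one would produce such an index:

```cypher
// Full-text index on the plot property of Movie nodes.
// The name Movie_plot_ft is the name passed to db.index.fulltext.queryNodes.
CREATE FULLTEXT INDEX Movie_plot_ft IF NOT EXISTS
FOR (m:Movie) ON EACH [m.plot];
```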
This query uses Lucene’s full-text query language to retrieve the nodes.
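A few other Lucene query strings you could pass to the same procedure are sketched below. They assume the Movie_plot_ft index on Movie plots; results depend on your data. Note that Lucene's classic query parser combines bare terms written without an operator using OR.

```cypher
// Either term
CALL db.index.fulltext.queryNodes('Movie_plot_ft', 'murder OR drugs')
YIELD node, score RETURN node.title, score LIMIT 10;

// One term but not the other
CALL db.index.fulltext.queryNodes('Movie_plot_ft', 'murder AND NOT drugs')
YIELD node, score RETURN node.title, score LIMIT 10;

// Exact phrase: the words must appear next to each other, in this order
CALL db.index.fulltext.queryNodes('Movie_plot_ft', '"drug dealer"')
YIELD node, score RETURN node.title, score LIMIT 10;

// Wildcard: any term starting with "murder" (murdered, murderer, ...)
CALL db.index.fulltext.queryNodes('Movie_plot_ft', 'murder*')
YIELD node, score RETURN node.title, score LIMIT 10;

// Fuzzy match: tolerates a small misspelling of "murder"
CALL db.index.fulltext.queryNodes('Movie_plot_ft', 'murdr~')
YIELD node, score RETURN node.title, score LIMIT 10;

// Regular expression, matched against individual indexed terms
CALL db.index.fulltext.queryNodes('Movie_plot_ft', '/murder(ed|er|s)?/')
YIELD node, score RETURN node.title, score LIMIT 10;
```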
Another benefit of a full-text index is that a single index can cover multiple properties across multiple labels.
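For example, one index could cover the text properties of both Movie and Person nodes. The index name `Movie_Person_text_ft` and the Person properties `name` and `bio` are assumptions about the data model, used only to illustrate the syntax (assuming Neo4j 5):

```cypher
// One full-text index across two labels and several properties.
// A node is indexed if it has one of the labels and at least one of the properties.
CREATE FULLTEXT INDEX Movie_Person_text_ft IF NOT EXISTS
FOR (n:Movie|Person) ON EACH [n.title, n.plot, n.name, n.bio];

// Results can mix Movie and Person nodes, so return the labels
// and whichever display property is present.
CALL db.index.fulltext.queryNodes('Movie_Person_text_ft', 'detective')
YIELD node, score
RETURN labels(node) AS labels, coalesce(node.title, node.name) AS displayName, score
LIMIT 10;
```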
Check your understanding
1. What are the advantages of full-text indexes? (Select all that apply.)
- ✓ You can define an index on multiple node labels and multiple properties for those labels.
- ✓ You can define an index on multiple relationship types and multiple properties for those types.
- ❏ The data pointed to by the index can reside outside the graph.
- ✓ You can write quite complex query predicates, including regular expressions.
Hint
These three advantages make full-text indexes very useful for an application.
Solution
The advantages of using full-text indexes are:
- You can define an index on multiple node labels and multiple properties for those labels.
- You can define an index on multiple relationship types and multiple properties for those types.
- You can write quite complex query predicates, including regular expressions.
2. Full-text index implementation
What is the default underlying implementation of a full-text index in Neo4j?
- ❏ Apache Solr
- ❏ Elasticsearch
- ❏ Typesense
- ✓ Apache Lucene
Hint
This open source Apache library is the search engine that both Solr and Elasticsearch are built on.
Solution
The correct answer is Apache Lucene.
Summary
In this lesson, you learned what a full-text index is in Neo4j. In the next lesson, you will learn how to create a full-text index.