Filtering queries
Earlier, you learned that the WHERE clause is used to tell the query engine to filter what nodes are retrieved from the graph.
In this lesson you will learn about some of the ways that you can filter your queries.
You have already learned how you can test equality for properties of a node and how you can use logical expressions to further filter what you want to retrieve.
For example, this query retrieves the Person nodes and Movie nodes where the person acted in a movie that was released in 2008 or 2009:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008 OR m.released = 2009
RETURN p, m
Filtering by node labels
You have already seen this type of query. It returns the names of all people who acted in the movie The Matrix:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title='The Matrix'
RETURN p.name
An alternative to this query is the following, where we test the node labels in the WHERE clause:
MATCH (p)-[:ACTED_IN]->(m)
WHERE p:Person AND m:Movie AND m.title='The Matrix'
RETURN p.name
Both queries execute the same way, but you may want to use one style of filtering over another in your code.
Filtering using ranges
You can specify a range for filtering a query. Here we want to retrieve Person nodes of people who acted in movies released from 2000 through 2003, inclusive:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2000 <= m.released <= 2003
RETURN p.name, m.title, m.released
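As a sketch of an equivalent form, the same inclusive range can be written as two comparisons joined with AND. This assumes the released property holds an integer year, as in the query above:
// Same inclusive range, spelled out as two separate comparisons
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released >= 2000 AND m.released <= 2003
RETURN p.name, m.title, m.released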
Filtering by existence of a property
Recall that by default, there is no requirement that a node or relationship has a given property. Here is a query that returns only the movies in which Jack Nicholson acted and that have a tagline property:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND m.tagline IS NOT NULL
RETURN m.title, m.tagline
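The reverse test works the same way. As a minimal sketch, the following variation should return the movies Jack Nicholson acted in that have no tagline property, since a missing property evaluates to null:
// Variation of the query above: keep only movies without a tagline
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND m.tagline IS NULL
RETURN m.title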
Filtering by partial strings
Cypher has a set of string-related keywords that you can use in your WHERE clauses to test string property values.
You can specify STARTS WITH, ENDS WITH, and CONTAINS.
For example, to find all actors in the graph whose first name is Michael, you would write:
MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name
String tests are case-sensitive, so you may need to use the toLower() or toUpper() functions to ensure the test yields the correct results.
For example:
MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name
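ENDS WITH and CONTAINS follow the same form as STARTS WITH. The two queries below are illustrative sketches only: the search strings 'son' and 'matrix' are arbitrary choices for this example, and what they return depends on the data in your graph. Run them one at a time.
// People whose name ends with 'son' (case-sensitive)
MATCH (p:Person)
WHERE p.name ENDS WITH 'son'
RETURN p.name
The second query lowercases the title first so that the test ignores letter case:
// Movies whose title contains 'matrix' in any letter case
MATCH (m:Movie)
WHERE toLower(m.title) CONTAINS 'matrix'
RETURN m.title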
Filtering by patterns in the graph
Suppose you wanted to find all people who wrote a movie but did not direct that same movie. Here is how you would perform the query:
MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title
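The same test can also be written as an existential subquery, which uses the EXISTS keyword followed by curly braces. This is a sketch of the alternative form, and it assumes a Neo4j version that supports EXISTS subqueries (4.0 or later):
// Alternative form: existential subquery instead of the exists() function
MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT EXISTS { MATCH (p)-[:DIRECTED]->(m) }
RETURN p.name, m.title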
Filtering using lists
If you have a set of values you want to test with, you can place them in a list or you can test with an existing list in the graph. A Cypher list is a comma-separated set of values within square brackets.
You can define the list in the WHERE clause.
During the query, the graph engine will compare each property with the values IN the list.
You can place either numeric or string values in the list, but typically, elements of the list are of the same type of data.
If you are testing with a property of a string type, then all the elements of the list will be strings.
In this example, we only want to retrieve Person nodes of people born in 1965, 1970, or 1975:
MATCH (p:Person)
WHERE p.born IN [1965, 1970, 1975]
RETURN p.name, p.born
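A list of strings works the same way. In this sketch, the three titles are assumptions about what the example graph contains; substitute titles from your own data:
// The titles in this list are assumed to exist in the graph
MATCH (m:Movie)
WHERE m.title IN ['The Matrix', 'The Matrix Reloaded', 'The Matrix Revolutions']
RETURN m.title, m.released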
You can also compare a value to an existing list in the graph.
We know that the :ACTED_IN relationship has a property, roles, that contains the list of roles an actor had in a particular movie they acted in. Here is the query we write to return the name of the actor who played Neo in the movie The Matrix:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE 'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name, r.roles
What properties does a node or relationship have?
The properties for a node with a given label need not be the same.
One way you can discover the properties for a node is to use the keys() function.
This function returns a list of all property keys for a node.
Discover the keys for the Person nodes in the graph by running this code:
MATCH (p:Person)
RETURN p.name, keys(p)
The results returned for each row include the name of the person, followed by the list of property keys for that node. If you scroll down in the result pane, you will notice that some Person nodes do not have a born property.
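To list only those nodes rather than scrolling to find them, you could combine keys() with the null test shown earlier. A minimal sketch:
// Person nodes that have no born property
MATCH (p:Person)
WHERE p.born IS NULL
RETURN p.name, keys(p)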
What properties exist in the graph?
More generally, you can run this code to return all the property keys defined in the graph.
CALL db.propertyKeys()
Note that once a property key has been defined, it remains in the graph even if no nodes or relationships currently use it.
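If you want the keys in a predictable order, one option is to yield the procedure's output column and sort it. This sketch assumes the output column is named propertyKey, so check the procedure signature in your Neo4j version:
// Assumes the procedure's output column is called propertyKey
CALL db.propertyKeys() YIELD propertyKey
RETURN propertyKey
ORDER BY propertyKey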
Check your understanding
1. Filtering a value in a list
Suppose you want to retrieve all movies that have a released property value that is 2000, 2002, 2004, 2006, or 2008. Here is an incomplete Cypher example to return the title property values of all movies released in these years.
What keyword do you specify in the WHERE clause?
MATCH (m:Movie)
WHERE m.released _____ [2000, 2002, 2004, 2006, 2008]
RETURN m.title
- ❏ FROM
- ✓ IN
- ❏ CONTAINS
- ❏ IS
Hint
You are testing if the property value is in the list
Solution
To check that a value is contained within a list, you use the IN predicate.
2. Finding people born in the seventies
We want to write a MATCH clause to retrieve all Person nodes for people born in the seventies.
Select the WHERE clauses below that will filter this query properly:
MATCH (a:Person) RETURN a.name, a.born
- ✓ WHERE a.born >= 1970 AND a.born < 1980
- ✓ WHERE 1970 <= a.born < 1980
- ❏ WHERE 1970 < a.born <= 1980
- ✓ WHERE a.born IN [1970,1971,1972,1973,1974,1975,1976,1977,1978,1979]
Hint
You can use a range test or a list test to filter the nodes.
Solution
The following answers are the recommended ways of finding a number between two values:
WHERE a.born >= 1970 AND a.born < 1980
WHERE 1970 <= a.born < 1980
This answer is technically correct, but the above methods are more efficient:
WHERE a.born IN [1970,1971,1972,1973,1974,1975,1976,1977,1978,1979]
This answer is incorrect because the predicate matches values greater than 1970 (1971 onward) rather than greater than or equal to 1970, and it also includes 1980:
WHERE 1970 < a.born <= 1980
Summary
In this lesson, you learned some of the ways you can filter what is retrieved from the graph. In the next challenge, you will demonstrate your skills at filtering nodes retrieved.