Filtering with WHERE
In the course, Cypher Fundamentals, you learned some basic querying where you use the WHERE
clause to filter queries where you test equality.
Testing Equality
Execute this query:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
AND m.year = 2013
RETURN m.title
In this query, we:
-
Find all
:Person
nodes with the name, Tom Hanks. -
We then traverse the
:ACTED_IN
relationships to all:Movie
nodes and filter for movies with ayear
property equal to 2013. -
Return the movie titles.
We are specifying the pattern to traverse through the graph, and then filtering on what data is retrieved within that pattern.
We then return the title
property for the three movie titles that satisfy the query.
Your queries will execute faster if the graph has indexes on property values. This course does not cover creating indexes. You typically create indexes for property values that cover your application’s most important queries.
Testing Inequality
You can also test inequality of a property using the <>
predicate.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name <> 'Tom Hanks'
AND m.title = 'Captain Phillips'
RETURN p.name
This query returns the names of all actors that acted in the movie Captain Phillips where Tom Hanks is excluded. It returns the names of the three actors that satisfy the filtering criteria.
Testing less than or greater than
You can test both numbers and strings for values less than (<
) or greater than (>
) a value.
Adding the equals sign will include the specified number within the predicate.
MATCH (m:Movie) WHERE m.title = 'Toy Story'
RETURN
m.year < 1995 AS lessThan, // Less than (false)
m.year <= 1995 AS lessThanOrEqual, // Less than or equal(true)
m.year > 1995 AS moreThan, // More than (false)
m.year >= 1995 AS moreThanOrEqual // More than or equal (true)
Testing Ranges
To test for property values within a range, you can use a combination of less than and greater than.
Here we test a range of values:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
AND 2005 <= m.year <= 2010
RETURN m.title, m.released
This query returns the four movies that Tom Hanks acted in between 2005 and 2010, inclusive.
We can also use OR
to expand the filtering to return more data as follows:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
OR m.title = 'Captain Phillips'
RETURN p.name, m.title
This query returns rows containing the name and title. Each row represents an actor who acted in Captain Phillips or has the name of Tom Hanks. In this result you will see multiple rows for Tom Hanks and each movie he acted in. You will also see multiple rows for each of the actors who acted in the movie, Captain Phillips.
Later in this course you will learn how to group data that is returned so that there are not multiple rows for Tom Hanks or multiple rows for the movie, Captain Phillips.
Testing null
property values
A property for a node or relationship is null if it does not exist.
You can test the existence of a property for a node using the IS NOT NULL
predicate.
MATCH (p:Person)
WHERE p.died IS NOT NULL
AND p.born.year >= 1985
RETURN p.name, p.born, p.died
This query returns the names, born, and died properties for all people who have a value for their died property and who were born after 1985. In this graph, it returns 6 rows.
And we can test if a property exists using the IS NULL
predicate:
MATCH (p:Person)
WHERE p.died IS NULL
AND p.born.year <= 1922
RETURN p.name, p.born, p.died
This query returns all people born before 1923 who do not have a died property value. It returns 21 rows for our dataset.
Testing labels or patterns?
Depending on your data model, it may be useful to test that a node has a label. This is particularly useful when a node may have multiple labels.
You can test for a label’s existence on a node using the {alias}:{label}
syntax.
MATCH (p:Person)
WHERE p.born.year > 1960
AND p:Actor
AND p:Director
RETURN p.name, p.born, labels(p)
This query will retrieve all Person nodes with the label Actor and Director that were born after 1960.
The labels()
function returns the list of labels for a node.
It returns 163 rows.
Here is a variation of the previous query.
Rather than using the Actor or Director labels, it uses the relationship types :ACTED_IN
and :DIRECTED
to imply that the node at the other end of the relationship has the correct label:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
WHERE p.born.year > 1960
RETURN p.name, p.born, labels(p), m.title
This query retrieves all people born after 1960 (assigned to the alias p
), who also acted in a movie and directed the same movie.
This query is more specific in that the same person both directed and acted in the movie.
In the previous query, we were only looking at labels and not relationships to movies.
This query returns 134 rows, but notice that some people directed and acted in multiple movies, for example, Jodie Foster.
Discovering relationship types
A query with a pattern need not specify the relationship type in the query:
MATCH (p:Person)-[r]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title AS movie, type(r) AS relationshipType
This query retrieves all Movie nodes that are related to Tom Hanks.
Each row returned is a movie title and the type of relationship that Tom Hanks has to that movie.
We use the type()
function to return the type of the relationship, r.
Notice that for this query, Tom Hanks has both an ACTED_IN and DIRECTED relationship to the movie, Larry Crowne.
Testing list inclusion
You can test if a value is in a list property:
MATCH (m:Movie)
WHERE "Israel" IN m.countries
RETURN m.title, m.languages, m.countries
This query returns the titles, languages, countries of all movies that have Israel in their list of countries.
Check your understanding
1. Directors with a role
We want to find all people who directed a movie in 2015 where the role
property exists on the relationship.
What code do you use?
Once you have selected your option, click the Check Results query button to continue.
MATCH (p:Person)-[r:DIRECTED]->(m:Movie)
/*select:WHERE r.role IS NOT NULL*/
AND m.year = 2015
RETURN p.name, r.role, m.title
-
❏
WHERE p.role IS NOT NULL
-
❏
WHERE p.role = ?
-
✓
WHERE r.role IS NOT NULL
-
❏
WHERE r.role = ?
Hint
If a property does not exist on a node or relationship, it will be treated as a null value.
You can check for non-null values using the IS NOT NULL
predicate.
Solution
WHERE r.role IS NOT NULL
is the correct answer. We are making sure the role property exists for the relationship.
"?" is not a valid test for existence of a property.
The p reference is for a Person node which does not have a role property.
2. List inclusion
What Cypher keyword is used to test if a value is in a list property?
-
❏ ELEMENT OF
-
❏ INCLUDES
-
✓ IN
-
❏ IS CONTAINED IN
Hint
In Cypher, you can test if a value is in a list using a single keyword.
Solution
IN
is the correct answer.
Summary
In this lesson, reviewed and learned about basic query filtering using the WHERE
clause.
In the next challenge, you will write queries to filter data.