Filtering Queries

Filtering queries

Earlier, you learned that the WHERE clause is used to tell the query engine to filter what nodes are retrieved from the graph. In this lesson you will learn about some of the ways that you can filter your queries.

You have already learned how you can test equality for properties of a node and how you can use logical expressions to further filter what you want to retrieve.

For example, this query retrieves the Person nodes and Movie nodes where the person acted in a movie that was released in 2008 or 2009:

cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008 OR m.released = 2009
RETURN p, m

Filtering by node labels

You have already seen this type of query. It returns the names of all people who acted in the movie, The Matrix.

cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title='The Matrix'
RETURN p.name

An alternative to this query is the following where we test the node labels in the WHERE clause:

cypher
MATCH (p)-[:ACTED_IN]->(m)
WHERE p:Person AND m:Movie AND m.title='The Matrix'
RETURN p.name

Both queries execute the same way, but you may want to use one style of filtering over another in your code.

Filtering using ranges

You can specify a range for filtering a query. Here we want to retrieve Person nodes of people who acted in movies released between 2000 and 2003:

cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2000 <= m.released <= 2003
RETURN p.name, m.title, m.released

Filtering by existence of a property

Recall that by default, there is no requirement that a node or relationship has a given property. Here is an example of a query where we only want to return Movie nodes where Jack Nicholson acted in the movie, and the movie has the tagline property.

cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND m.tagline IS NOT NULL
RETURN m.title, m.tagline

Filtering by partial strings

Cypher has a set of string-related keywords that you can use in your WHERE clauses to test string property values. You can specify STARTS WITH, ENDS WITH, and CONTAINS.

For example, to find all actors in the graph whose first name is Michael, you would write:

cypher
MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name

String tests are case-sensitive so you may need to use the toLower() or toUpper() functions to ensure the test yields the correct results. For example:

cypher
MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name

Filtering by patterns in the graph

Suppose you wanted to find all people who wrote a movie but did not direct that same movie. Here is how you would perform the query:

cypher
MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title

Filtering using lists

If you have a set of values you want to test with, you can place them in a list or you can test with an existing list in the graph. A Cypher list is a comma-separated set of values within square brackets.

You can define the list in the WHERE clause. During the query, the graph engine will compare each property with the values IN the list. You can place either numeric or string values in the list, but typically, elements of the list are of the same type of data. If you are testing with a property of a string type, then all the elements of the list will be strings.

In this example, we only want to retrieve Person nodes of people born in 1965, 1970, or 1975:

cypher
MATCH (p:Person)
WHERE p.born IN [1965, 1970, 1975]
RETURN p.name, p.born

You can also compare a value to an existing list in the graph.

We know that the :ACTED_IN relationship has a property, roles that contains the list of roles an actor had in a particular movie they acted in. Here is the query we write to return the name of the actor who played Neo in the movie The Matrix:

cypher
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE  'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name, r.roles

What properties does a node or relationship have?

The properties for a node with a given label need not be the same. One way you can discover the properties for a node is to use the keys() function. This function returns a list of all property keys for a node.

Discover the keys for the Person nodes in the graph by running this code:

cypher
MATCH (p:Person)
RETURN p.name, keys(p)

The results returned for each row include the name of the person, followed by the list of property keys for that node. If you scroll down in the result pane, you will notice that some Person nodes do not have a born property.

What properties exist in the graph?

More generally, you can run this code to return all the property keys defined in the graph.

cypher
CALL db.propertyKeys()

Note that a property key remains in the graph, once it has been defined, even if there are currently no nodes or relationships that use that property key.

Check your understanding

1. Filtering a value in a list

Suppose you want to retrieve all movies that have a released property value that is 2000, 2002, 2004, 2006, or 2008. Here is an incomplete Cypher example to return the title property values of all movies released in these years. What keyword do you specify in the WHERE clause?

Once you have selected your option, click the Check Results query button to continue.

cypher
MATCH (m:Movie)
WHERE m.released /*select:IN*/ [2000, 2002, 2004, 2006, 2008]
RETURN m.title
  • FROM

  • IN

  • CONTAINS

  • IS

Hint

You are testing if the property value is in the list

Solution

To check that a value is contained within a list, you use the IN predicate.

2. Finding people born in the seventies.

We want to write a MATCH clause to retrieve all Person nodes for people born in the seventies.

Select the WHERE clauses below that will filter this query properly:

MATCH (a:Person) RETURN a.name, a.born

  • WHERE a.born >= 1970 AND a.born < 1980

  • WHERE 1970 <= a.born < 1980

  • WHERE 1970 < a.born <= 1980

  • WHERE a.born IN [1970,1971,1972,1973,1974,1975,1976,1977,1978,1979]

Hint

You can use a range test or a list test to filter the nodes.

Solution

The following answers are the recommended ways of finding a number between two values:

WHERE a.born >= 1970 AND a.born < 1980

WHERE 1970 <= a.born < 1980

This answer is technically correct, but the above methods are more efficient:

WHERE a.born IN [1970,1971,1972,1973,1974,1975,1976,1977,1978,1979]

This answer is incorrect, as the predicate is looking for numbers greater than 1970 (eg 1971 and onwards) rather than greater than or equal to 1970 inclusive.

WHERE 1970 < a.born <= 1980

Summary

In this lesson, you learned some of the ways you can filter what is retrieved from the graph. In the next challenge, you will demonstrate your skills at filtering nodes retrieved.

Chatbot

Hi, I am an Educational Learning Assistant for Intelligent Network Exploration. You can call me E.L.A.I.N.E.

How can I help you today?