Using count()
to aggregate data
With the Cypher count()
function, you can perform a count of properties, nodes, relationships, paths, or rows during query processing.
When you aggregate in a Cypher statement, the query must process all patterns in the MATCH
clause to
complete the aggregation to either return results or perform the next part of the query.
Here is an example:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.name = 'Tom Hanks'
RETURN a.name AS ActorName,
count(*) AS NumMovies
This query returns the number of movies Tom Hanks acted in. count(*)
returns the number of rows retrieved in the query.
What we return to the client is the name of the actor and the count of the rows.
Here is another example:
PROFILE MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name AS ActorName,
d.name AS DirectorName,
count(*) AS NumMovies
ORDER BY NumMovies DESC
Here we are returning the actor name and director name pairs and aggregating on the number of rows for each pair. That is, the number of movies for each actor/director pair.
And if we wanted to collect the titles of the movies, it would use the same type of eager aggregation:
PROFILE MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name AS ActorName,
d.name AS DirectorName,
count(*) AS NumMovies, collect(m.title) AS Movies
ORDER BY NumMovies DESC
Using count()
on property values
You can count the number of properties to see if there are nodes that do not have a value. Here is an example:
MATCH (p:Person)
RETURN count(p) , count(p.born)
In this query, we return the number of Person nodes and the number of born properties in these nodes. We see that these numbers differ so we know that there are Person nodes that do not have the born property in the grapn.
Counting for the next part of the query
Here is an example where we start the query processing by retrieving all actors and their movies.
During the query processing, we want to only return actors that have 2 movies.
We use count()
to further filter the query.
PROFILE MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(*) AS NumMovies
// can add Movies to also create the list of movies
//, collect(m.title) AS Movies
WHERE NumMovies = 2
RETURN a.name AS Actor
//, Movies
Notice that when this query executes, is does an EagerAggregation for the count()
before the filter step.
Counting specific nodes
In addition to counting the number of rows returned, you can also count specific nodes retrieved in the query. In this query:
MATCH (p:Person {name: 'Elvis Presley'})-[]-(m:Movie)-[]-(a:Actor)
RETURN count(*), count(m), count (a)
All count values are the same. The total number of rows is 34. The total number of Actor nodes and Movie nodes is also 34 because the nodes are repeated in each row.
You specify DISTINCT
so that the exact number of movies and actors are counted with no duplicates:
MATCH (p:Person {name: 'Elvis Presley'})-[]-(m:Movie)-[]-(a:Actor)
RETURN count(*), count(DISTINCT m), count (DISTINCT a)
This query returns the total count of rows returned (34), the number of distinct movies (11) and the number of distinct actors (33).
Check your understanding
1. Counting rows
Suppose you have a query that retrieves Movie and Person nodes:
MATCH (p:Person)-[]-(m:Movie)
Which Cypher clause do you use to return the number of rows retrieved?
-
❏
RETURN total(*)
-
✓
RETURN count(*)
-
❏
RETURN totalRows(*)
-
❏
RETURN numRows(*)
Hint
This Cypher function counts rows retrieved in the query step.
Solution
You use the count(*)
function to return the number of rows retrieved in this step of the query.
2. Movies that are rated
We want a count of the movie titles in the graph that have been rated.
Use the dropdown below to select the correct RETURN
clause.
Once you have selected your option, click the Check Results query button to continue.
MATCH (u:User)-[:RATED]->(m:Movie)
/*select:RETURN count (DISTINCT m.title)*/
-
❏
RETURN count (*)
-
❏
RETURN count (DISTINCT *)
-
❏
RETURN count (m.title)
-
✓
RETURN count (DISTINCT m.title)
Hint
The query returns all user/movie node pairs with the RATED
relationship. Users have rated multiple movies.
How can you just count the number of unique movie titles that have been rated?
Solution
The correct answer is RETURN count (DISTINCT m.title)
Summary
In this lesson, you reviewed and learned more about aggregating data using count()
in your Cypher queries.
In the next Challenge, you will create a query to aggregate using count()
.