Agregating Data
You can aggregate data in Neo4j by using a function as part of the RETURN
clause.
This allows you to perform calculations on your data, such as counting, summing, or averaging values.
Count
With the Cypher count()
function, you can perform a count of properties, nodes, relationships, paths, or rows during query processing.
This query counts the number of movies in the database:
MATCH (m:Movie)
RETURN count(m)
Count
Returning multiple values that contain a count will return a single row with the count of each value.
This query returns the number of movies in each genre:
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
RETURN g.name AS genre, count(m) AS numMovies
Count
You can order the results on aggregated values.
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
RETURN g.name AS genre, count(m) AS numMovies
ORDER BY numMovies DESC
Sum
You can sum values using the sum()
function.
What is the total revenue of all movies in the database?
MATCH (m:Movie)
RETURN sum(m.revenue)
Sum
This query finds the most profitable actors by grouping movie revenue by actor:
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
RETURN
p.name AS actor,
sum(m.revenue) AS totalRevenue
ORDER BY totalRevenue DESC
LIMIT 10
Average
You can use avg()
to find the average value of a property.
Find the average rating of all movies in the database:
MATCH (m:Movie)<-[r:RATED]-()
RETURN
m.title AS movieTitle,
avg(r.rating) AS avgRating
Average
Adding an ORDER BY
and LIMIT
clause will return the top 10 movies:
MATCH (m:Movie)<-[r:RATED]-()
RETURN
m.title AS movieTitle,
avg(r.rating) AS avgRating
ORDER BY avgRating DESC
LIMIT 10
Challenges
Complete the queries to answer the following questions:
-
What movie has the most actors?
cypherMATCH (m:Movie)-[:ACTED_IN]-(p:Person) RETURN m.title AS movieTitle, ?????(??????) as actorCount ORDER BY ?????? ????
-
What is the highest rated movie staring "Tom Hanks".
cypherMATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]-(m:Movie) MATCH (m)<-[r:??????]-(u:User) RETURN m.title AS movieTitle, ???(r.??????) AS avgRating ORDER BY ?????? DESC
Click to reveal the answers
-
What movie has the most actors?
cypherMATCH (m:Movie)-[:ACTED_IN]-(p:Person) RETURN m.title AS movieTitle, count(p) as actorCount ORDER by actorCount DESC
-
Find the highest rated movie staring "Tom Hanks".
cypherMATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]-(m:Movie) MATCH (m)<-[r:RATED]-(u:User) RETURN m.title AS movieTitle, avg(r.rating) AS avgRating ORDER BY avgRating DESC
Next
Summary
In this lesson, you learned how to aggregate data using the count()
, sum()
, and avg()
functions.