Aggregation

Agregating Data

You can aggregate data in Neo4j by using a function as part of the RETURN clause.

This allows you to perform calculations on your data, such as counting, summing, or averaging values.

Count

With the Cypher count() function, you can perform a count of properties, nodes, relationships, paths, or rows during query processing.

This query counts the number of movies in the database:

cypher
MATCH (m:Movie)
RETURN count(m)

Count

Returning multiple values that contain a count will return a single row with the count of each value.

This query returns the number of movies in each genre:

cypher
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
RETURN g.name AS genre, count(m) AS numMovies

Count

You can order the results on aggregated values.

cypher
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
RETURN g.name AS genre, count(m) AS numMovies
ORDER BY numMovies DESC

Sum

You can sum values using the sum() function.

What is the total revenue of all movies in the database?

cypher
MATCH (m:Movie)
RETURN sum(m.revenue)

Sum

This query finds the most profitable actors by grouping movie revenue by actor:

cypher
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
RETURN
    p.name AS actor,
    sum(m.revenue) AS totalRevenue
ORDER BY totalRevenue DESC
LIMIT 10

Average

You can use avg() to find the average value of a property.

Find the average rating of all movies in the database:

cypher
MATCH (m:Movie)<-[r:RATED]-()
RETURN
    m.title AS movieTitle,
    avg(r.rating) AS avgRating

Average

Adding an ORDER BY and LIMIT clause will return the top 10 movies:

cypher
MATCH (m:Movie)<-[r:RATED]-()
RETURN
    m.title AS movieTitle,
    avg(r.rating) AS avgRating
ORDER BY avgRating DESC
LIMIT 10

Challenges

Complete the queries to answer the following questions:

  1. What movie has the most actors?

    cypher
    MATCH (m:Movie)-[:ACTED_IN]-(p:Person)
    RETURN
        m.title AS movieTitle,
        ?????(??????) as actorCount
    ORDER BY ?????? ????
  2. What is the highest rated movie staring "Tom Hanks".

    cypher
    MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]-(m:Movie)
    MATCH (m)<-[r:??????]-(u:User)
    RETURN
        m.title AS movieTitle,
        ???(r.??????) AS avgRating
    ORDER BY ?????? DESC
Click to reveal the answers
  1. What movie has the most actors?

    cypher
    MATCH (m:Movie)-[:ACTED_IN]-(p:Person)
    RETURN
        m.title AS movieTitle,
        count(p) as actorCount
    ORDER by actorCount DESC
  2. Find the highest rated movie staring "Tom Hanks".

    cypher
    MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]-(m:Movie)
    MATCH (m)<-[r:RATED]-(u:User)
    RETURN
        m.title AS movieTitle,
        avg(r.rating) AS avgRating
    ORDER BY avgRating DESC

Summary

In this lesson, you learned how to aggregate data using the count(), sum(), and avg() functions.