Pipelining Queries

What does pipelining mean?

You have learned that you can use the WITH clause to redefine the scope for the query:

cypher
WITH  'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
WITH m  LIMIT 5
// possibly do more with the five m nodes
RETURN m.title AS movies

This query limits the query to retrieve five nodes.

You can do something with the five nodes by adding another MATCH clause:

cypher
WITH  'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
WITH m  LIMIT 5
MATCH (d:Person)-[:DIRECTED]->(m)
RETURN d.name AS director,
m.title AS movies

The five Movie nodes are used for the second MATCH clause and five rows are returned. The WITH clause enables us to pipeline the results of the first query into the second query. With this simple query, m need not be redefined or scoped.

Next you will learn how pipelining is really useful.

Using WITH for aggregation

A very common use of WITH is to aggregate data so that intermediate results can be used to create the final returned data or to pass the intermediate results to the next part of the query.

Here is an example:

cypher
MATCH (:Movie {title: 'Toy Story'})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(m)
WHERE m.imdbRating IS NOT NULL
WITH g.name AS genre,
count(m) AS moviesInCommon,
sum(m.imdbRating) AS total
RETURN genre, moviesInCommon,
total/moviesInCommon AS score
ORDER By score DESC

In this query we are counting the number of movies that share the same Genre node. We use count() to count the number of rows and sum() to total the imdbRating for each movie for the Genre. These values are calculated as part of the aggregation and then used to return the data. In the WITH clause we pass on only the values we need to return a row for the Genre.

Here is another example that shows aggregation and pipelining:

cypher
MATCH (u:User {name: "Misty Williams"})-[r:RATED]->(:Movie)
WITH u, avg(r.rating) AS average
MATCH (u)-[r:RATED]->(m:Movie)
WHERE r.rating > average
RETURN average , m.title AS movie,
r.rating as rating
ORDER BY rating DESC

For this query, we first calculate the average rating for all movies that Misty Williams rated. Then we use this calculated value, average as a test for the second MATCH.

Using WITH for collecting

Another common use for the WITH clause is to collect results into a list that will be returned:

cypher
MATCH (m:Movie)--(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH m, collect (a.name) AS actors,
count(*) AS numActors
RETURN m.title AS movieTitle, actors
ORDER BY numActors DESC

This query collects the names of actors who acted in the movies containing the string 'New York'. This aggregation collects the names and totals the number of actors.

Here is another example where we perform a 2-step aggregation for collecting a list of maps:

cypher
MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH m, collect (a.name) AS actors,
count(*) AS numActors
ORDER BY numActors DESC
RETURN collect(m { .title, actors, numActors }) AS movies

Using LIMIT early

A best practice is to execute queries that minimize the number of rows processed in the query. One way to do that is to limit early in the query. This also helps in reducing the number of properties loaded from the database too early.

For example, this query limits during the aggregation:

cypher
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.born.year = 1980
WITH p,
collect(m.title) AS movies LIMIT 3
RETURN p.name AS actor, movies

A better way to do this query is to limit early. This query will perform slightly better because LIMIT is moved up in the query:

cypher
PROFILE MATCH (p:Actor)
WHERE p.born.year = 1980
WITH p  LIMIT 3
MATCH (p)-[:ACTED_IN]->(m:Movie)
WITH p, collect(m.title) AS movies
RETURN p.name AS actor,  movies

Use DISTINCT when necessary

Here is an example of a query that retrieves three actors, then collects the names of the genres for the movies that actor acted in.

cypher
MATCH (p:Actor)
WHERE p.born.year = 1980
WITH p  LIMIT 3
MATCH (p)-[:ACTED_IN]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WITH p, collect(g.name) AS genres
RETURN p.name AS actor, genres

Notice that for this query, the collected genre names are repeated. You would want to ensure that they are not duplicated as follows:

cypher
MATCH (p:Actor)
WHERE p.born.year = 1980
WITH p  LIMIT 3
MATCH (p)-[:ACTED_IN]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WITH p, collect(DISTINCT g.name) AS genres
RETURN p.name AS actor, genres

Check your understanding

Limiting results

Here is a query to return the names of Directors who directed movies that Keanu Reeves acted in. How do you specify that you want to limit the number of rows returned to 3?

Once you have selected your option, click the Check Results query button to continue.

cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Keanu Reeves'
/*select:WITH m  LIMIT 3*/
MATCH (d:Person)-[:DIRECTED]->(m)
RETURN collect(d.name) AS directors,
m.title AS movies
  • WITH LIMIT 3

  • WITH m LIMIT 3

  • AND count(m) = 3

  • LIMIT 3

Hint

The WITH clause must carry forward the m variable and limit how many movie nodes are carried forward.

Solution

The correct answer is: WITH m LIMIT 3

The variable, m must be carried forward so it can be used in the RETURN clause.

2. Cypher for aggregating

What Cypher functions/keywords are used to aggregate intermediate results?

  • LIMIT

  • count()

  • collect()

  • DISTINCT

Hint

These two functions aggregate data in the query.

Solution

The correct answers are: count() and collect()

LIMIT is used to limit the number of rows processed, but it does not aggregate data.

DISTINCT is used to ensure that a row or property value is not repeated in the result.

Summary

In this lesson, you learned how WITH is used to create intermediate results that can be used for pipelining queries.

In the next challenge, you will write a query that uses WITH to provide intermediate results for further query processing.