Subqueries

What is a subquery?

When you execute a series of MATCH clauses, all nodes and relationships retrieved are in memory. If the memory requirements for a set of MATCH clauses exceed the VM configured, the query will fail.

A subquery is a set of Cypher statements that execute within their own scope. A subquery is typically called from an outer enclosing query. Using a subquery, you can limit the number of rows that need to be processed.

Here are some important things to know about a subquery:

  • A subquery returns values referred to by the variables in the RETURN clause.

  • A subquery cannot return variables with the same name used in the enclosing query.

  • You must explicitly pass in variables from the enclosing query to a subquery.

Performing subqueries with CALL

In a CALL clause, you specify a query that can return data from the graph or derived from the graph. A set of nodes returned in the CALL clause can be used by the enclosing query.

Here is a simple example:

cypher
CALL {
   MATCH (m:Movie) WHERE m.year = 2000
   RETURN m ORDER BY m.imdbRating DESC LIMIT 10
}
MATCH  (:User)-[r:RATED]->(m)
RETURN m.title, avg(r.rating)

The subquery is demarcated by the {}s here. The subquery returns 10 Movie nodes of movies released in 2000 with the highest imdbRating. Then with these 10 movies, it determines the average rating for each of these 10 movies.

Passing variables into a subquery

Here is an example where the subquery is executed after the initial query and the enclosing query passes a variable, m into the subquery.

cypher
MATCH (m:Movie)
CALL {
    WITH m
    MATCH (m)<-[r:RATED]-(u:User)
     WHERE r.rating = 5
    RETURN count(u) AS numReviews
}
RETURN m.title, numReviews
ORDER BY numReviews DESC

In this query:

  1. The first MATCH returns a row for every movie, m in the graph.

  2. It passes the Movie node, m to the subquery.

  3. Then within the subquery, the query executes to find all users who gave that movie a rating of 5 and counts them.

  4. The subquery returns the count.

  5. Back in the enclosing query, the title is returned, and the count of the number of rows returned from the subquery.

Using subqueries enables you to reduce the number of rows processed in a query.

Combining query results with UNION

As your queries become more complex, you may need to combine the results of multiple queries. You can do so with UNION. With UNION, the queries you are combining must return the same number of properties or data

Here is a simple example:

cypher
MATCH (m:Movie) WHERE m.year = 2000
RETURN {type:"movies", theMovies: collect(m.title)} AS data
UNION ALL
MATCH (a:Actor) WHERE a.born.year > 2000
RETURN { type:"actors", theActors: collect(DISTINCT a.name)} AS data

The first query returns an object with a type property of "movies" and a theMovies property that is a list of movies. It returns this object as a variable named Data

The second query returns an object with a type property of "actors" and a theActors property that is a list of actor names. It returns this object as a variable named Data

Because both queries return a variable named Data, we can combine the results using UNION ALL.

UNION ALL returns all results which is more efficient on memory but can lead to duplicates. UNION returns distinct results.

Using UNION with subqueries

Results of a UNION cannot be directly post-processed. But if you wrap a UNION in a subquery, you can then further process the results.

Here is an example that uses UNION within a subquery:

cypher
MATCH (p:Person)
WITH p LIMIT 100
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
  RETURN m.title + ": " + "Actor" AS work
UNION
  WITH p
  OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
  RETURN m.title+ ": " +  "Director" AS work
}
RETURN p.name, collect(work)

This query:

  1. 100 Person nodes are retrieved and passed to the subquery.

  2. If that Person acted in the movie, its title with the Actor suffix is returned.

  3. The second part of the subquery does the same for the DIRECTED relationships.

  4. The work results are combined and collected.

  5. The result is the name of the person and their Actor or Director titles.

Check your understanding

1. Using a subquery

Here is a query that has a subquery. The enclosing query finds all User nodes. The subquery finds all movies that this user rated with 5 and return them. How do you complete this query so it can return the desired results?

Once you have selected your option, click the Check Results query button to continue.

cypher
MATCH (u:User)
CALL {
/*select:WITH u*/
    MATCH (m:Movie)<-[r:RATED]-(u)
     WHERE r.rating = 5
    RETURN m
}
RETURN m.title, count(m) AS numReviews
ORDER BY numReviews DESC
  • WITH u

  • USE u

  • INPUT u

  • // nothing

Hint

Subqueries can use variables from the enclosing queries, but the variables must be explicitly specified in the scope of the subquery.

Solution

The correct answer is: WITH u. You must pass the variables in if they are to be used in the subquery.

There are no Cypher clauses, USE or INPUT.

2. Combining query results

What Cypher keyword is used to combine the results of multiple MATCH clauses into a single return stream?

  • ❏ AND

  • ✓ UNION ALL

  • ❏ COMBINE

  • ❏ COLLECT

Hint

This keyword is used to separate multiple queries that return data.

Solution

The correct answer is: UNION ALL.

AND is used in WHERE clauses to filter data.

COMBINE and COLLECT are not valid Cypher keywords.

Summary

In this lesson, you learned how to call a subquery in Cypher that will help reduce rows required for your queries, as well as UNION.

In the next challenge, you will write a subquery.