Limiting results
Although you can filter queries to reduce the number of results returned, you may also want to limit the number of results returned.
This is useful if you have very large result sets and you only need to see the beginning or end of a set of ordered results.
You can use the LIMIT
keyword to specify the number of results returned.
MATCH (m:Movie)
WHERE m.released IS NOT NULL
RETURN m.title AS title,
m.released AS releaseDate
ORDER BY m.released DESC LIMIT 100
In this query we filter out movie nodes that have no value for the released property. Then we return the 100 most-recently released movies in our graph.
Or, we may want to determine the youngest person in the graph:
MATCH (p:Person) WHERE
p.born IS NOT NULL
RETURN p.name as name,
p.born AS birthDate
ORDER BY p.born DESC LIMIT 1
Skipping some results
In an ordered result set, you may want to control what results are returned. This is useful in an application where pagination is required.
In this query we are returning the names of people born in 1980 ordered by their birth date.
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name as name,
p.born AS birthDate
ORDER BY p.born
You can add a SKIP
and LIMIT
keyword to control what page of results are returned.
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name as name,
p.born AS birthDate
ORDER BY p.born SKIP 40 LIMIT 10
In this query, we return 10 rows representing page 5, where each page contains 10 rows.
Eliminating duplicate records
You have seen a number of query results where there is duplication in the results returned.
In some cases, you may want to eliminate duplicated results.
You do so by using the DISTINCT
keyword.
Here is a simple example where duplicate data is returned.
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title, m.released
ORDER BY m.title
Tom Hanks both acted in and directed the movie, Larry Crowne, so the movie is returned twice in the result stream.
We can eliminate the duplication by specifying the DISTINCT
keyword as follows:
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released
ORDER BY m.title
Using DISTINCT
in the RETURN
clause here means that rows with identical values will not be returned.
Uses of DISTINCT
You can use DISTINCT
to eliminate duplication of:
-
rows returned (you have just learned this)
-
property values
-
nodes
Here is an example where we eliminate duplicate property values:
MATCH (m:Movie)
RETURN DISTINCT m.year
ORDER BY m.year
In the above query, only a single value will be returned for each Movie year.
If you were to not use DISTINCT
, all Movie year values would be returned.
And here is an example where we eliminate duplicate nodes:
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m
If we do not specify DISTINCT
in the above query, the query returns a duplicate movie node.
Check your understanding
1. What movies have reviews?
We want to return the movies that have been reviewed.
How would you complete this query so that duplicate movie titles are not returned?
Once you have selected your option, click the Check Results query button to continue.
MATCH (m:Movie)<-[:RATED]-()
/*select:RETURN DISTINCT m.title*/
-
❏
RETURN UNIQUE m.title
-
✓
RETURN DISTINCT m.title
-
❏
RETURN WITH DISTINCT m.title
-
❏
RETURN WITH UNIQUE m.title
Hint
Another way of describing a list without duplicates is a distinct list.
Solution
Including RETURN DISTINCT m.title
will return a distinct list of title
property for the m
node.
2. Reducing data returned
Why would you want to use LIMIT
in a RETURN
clause?
-
✓ To reduce the amount of data returned to the client.
-
❏ To return only some properties of nodes.
-
[] To determine to highest value for a property in a query.
-
[] To determine the lowest value for a property in a query.
Hint
You would use LIMIT
and RETURN
to control the data returned from your Neo4j database.
Solution
You could use limit to *reduce the amount of data returned to the client.
Summary
In this lesson, you learned how you can limit results returned, count results returned, and eliminate duplication im the rows returned.
In the next challenges, you will write queries to limit, count, or reduce duplication.