Shortest Paths with Cypher

Introduction

Imagine that you are asked to build a web application that will allow users to find the shortest, or cheapest route between two airports. The user selects an origin and destination airport, and it is the responsibility of the application to suggest routes between them.

As you might imagine, finding the best connections between various airports is a graph problem in a nutshell. You could easily represent airports as nodes, and use relationships to model the possible relationships between them.

The Cypher query language was designed to efficiently match patterns in a graph using a declarative syntax. Writing a Cypher query to identify possible routes between two airports becomes a trivial task compared to other database query languages.

Unweighted Shortest Path

Cypher supports the calculation of the shortest unweighted path between a pair of nodes with the shortestPath() function.

In an unweighted path, the traversal of each relationship has an identical cost, so the shortest path between two nodes will always be the sum of the total relationships in a path between them.

How it works

The shortestPath() function expects a definition of a Cypher path between an origin and destination node. The calculation is then made by traversing through the graph from both nodes, finding where the two paths meet in the middle. The traversal is breadth first, meaning that all relationships from a node will be followed before traversing further through the graph.

The state of the traversal is held in memory, pruning or removing paths that are longer than the current shortest path as it goes. The overall shortest path is calculated as the shortest path length between the two nodes, or in other words, the smallest number of relationships in the path.

The output of the function is the single shortest path between the two nodes.

An example

Say we would like to find the shortest route between Baltimore and Frankfurt, we could run the following query.

Cypher
Finding the Shortest Path
MATCH (source:Airport {city:"Baltimore"}),
      (target:Airport {city:"Frankfurt"}) //(1)
MATCH p = shortestPath((source)-[:HAS_ROUTE*1..10]->(target)) //<2>, (3)
RETURN p

Notes:

  1. We start off by using the MATCH clause to find the source and destination (:Airport) nodes by the city property.

  2. The shortestPath() function calculates and returns a Path object.

  3. The *1..10 portion of the statement specifies that the path should contain between 1 and 10 relationships.

a graph showing a path between Baltimore and Paris nodes via Toronto with HAS_ROUTE relationships between the nodes

Limiting the number of relationships

It is good practice to specify a range of relationships (i.e. [:RELATIONSHIP*1..10]). This ensures that excessively long relationships are not queried or returned - improving query performance and constraining the result.

You can include any number of relationships by omitting the range and using [:RELATIONSHIP*].

Multiple Relationship Types

The pattern defined in the query above instructs the algorithm to follow any [:HAS_ROUTE] relationship type in an outgoing direction from the source node until it reaches the target.

You can further extend the types of relationships by using the pipe operator (|) to add additional relationship types. For example, the below statement will follow either [:HAS_ROUTE], [:CAN_WALK_TO] or [:CAN_SWIM_TO] relationships until it reaches the destination.

Cypher
More Complex Patterns
MATCH (source:Airport {city:"Baltimore"}),
      (target:Airport {city:"Frankfurt"})
MATCH p=shortestPath((source)-[:HAS_ROUTE|CAN_WALK_TO|CAN_SWIM_TO*1..10]->(target))
RETURN p

Finding All Shortest Paths

The allShortestPaths() function works in the same way as the shortestPath() function, but instead of returning the first shortest path, the function will return all paths that contain the shortest number of relationships.

Cypher
All Shortest Paths
MATCH (source:Airport {city:"Baltimore"}),
      (target:Airport {city:"Frankfurt"})
MATCH p = allShortestPaths((source)-[:HAS_ROUTE*1..10]->(target))
RETURN p

There are a lot of paths between Baltimore and Frankfurt!

a graph showing all the shortest paths between Baltimore and Frankfurt. There are so many relationships its difficult to see any detail.

Check your understanding

1. Which function can be used to return a single shortest unweighted path between two nodes?

  • shortestPath()

  • sum()

  • allShortestPaths()

  • reduce()

Hint

Your objective is to find the shortest path between two nodes.

Solution

The answer is shortestPath().

2. What type of object does the shortestPath() function return?

  • Node

  • Relationship

  • Scalar

  • Path

Hint

The shortestPath() function finds the shortest path between two nodes and represents them as a set of relationships.

Solution

The answer is Path.

3. Will the allShortestPaths() function return paths of varying lengths?

  • ❏ Yes

  • ✓ No

Hint

The allShortestPath() function returns as any path with the same length as the shortest path.

Solution

The answer is No, the allShortestPath() function returns as any path with the same length as the shortest path.

Summary

In this lesson you learned how to find the shortest unweighted paths using the built-in Cypher functions.

In the next challenge, you will use the knowledge obtained in this lesson to find the length of the shortest paths between two airports.