In the Neo4j Fundamentals course, we covered the Seven Bridges problem, and how Swiss Mathematician Leonhard Euler devised graph theory to (dis)prove whether a particular path exists through the bridges of Königsberg.
Euler’s takeaway was that, using Graph Theory, he was able to prove that there were no possible routes through the city that crossed each bridge only once.
Finding shortest paths
Essentially, graph theory was invented to deal with finding various paths through a network of vertexes and edges.
Imagine you are building a navigation system and want to recommend the quickest route between cities to users.
The image above visualizes an example transportation network between cities in Italy, Austria, and Slovenia. The cities are represented as nodes, while the transportation modes are represented as relationships between cities.
The image demonstrates that you could bike from Salzburg to Munich in about 440 minutes or take the train from Ljubljana to Bologna for 320 minutes.
A graph-based representation of data is perfect fit for calculating the most optimal route based on your specifications. The time property attributed to each relationship in this example could be used to calculate the shortest path from origin to destination in terms of time, but additional properties could be added to enable the route to be optimized by distance travelled or cost.
Graphs vs Relational Databases
This problem would cause a headache for any relational database developer. To solve this problem in a relational database, you would need to hypothesize the order in which you would join the tables and build a complex SQL query or stored procedures to calculate the fastest route.
Another problem you might face with traditional databases is that you don’t know beforehand how many relationships you must traverse to get from node A to node B. Not knowing beforehand precisely which and how many relationships you must traverse could lead to potentially complex and computationally expensive queries.
On the other hand, graph databases are designed to solve and find complex shortest paths with a single line or two of code.
Performance Considerations
Not only will your queries be simpler and easier to maintain, but they will almost certainly be more performant.
The Neo4j Fundamentals course also covers the Graph Native Advantage and the concept of Index-free Adjacency. This concept is pertinent when calculating shortest paths.
When querying a relational database, any joins are calculated at runtime by scanning indexes. Not only is this a computationally expensive, but the speed of the query is also proportional to the total size of the data. The more information added to the database, the slower your queries will be.
In Neo4j, the data is stored in such a way that every node is aware of every incoming and outgoing relationship, and traversals through the network are instead performed by pointer chasing in memory.
So if you trying to find the optimal route through a network, it makes sense to run these calculations on top of a graph. At the same time, more straightforward queries will help you develop and maintain the platform more efficiently.
Potential Use Cases
Finding optimal routes can be applied on the following scenarios:
-
Logistics and routing
-
Infrastructure management
-
Finding optimal paths to make new contacts
-
Web link analysis
-
And more…
Check your understanding
1. Which databases are designed to solve the shortest path problem?
-
❏ key-value stores
-
❏ relational databases
-
✓ graph databases
-
❏ web application
Hint
Neo4j is a graph database.
Solution
The answer is graph databases.
2. Which of the following problems can be solved using a shortest path algorithm?
-
✓ Logistics and routing
-
❏ Finding average values
-
✓ Infrastructure management
-
✓ Web link analysis
Hint
These three use cases are well-suited to shortest-path analysis.
Solution
The correct answers are Logistics and routing, Infrastructure management and Web link analysis.
Summary
In this lesson we covered high-level overview of advantages of using a graph database to solve the shortest path problem.
In the next module we will cover how to find shortest paths using the Cypher query language.