Thinking in Graphs

Why Graph Databases?

Relationships in a graph are treated with the same importance as nodes that connect them.

This is not the same as other database technologies, as a result you may have experienced issues such as:

  • Poorly performing queries when traversing complex schema of foreign keys and many-to-many relationships.

  • Dealing with hierarchical data or trees, where the answer may lie at an unknown or varying depth.

  • Finding a path through an ever-changing dataset with complex requirements.

  • Simply finding it difficult to express what you are looking for.

These issues are often as a result of storing data in tables when the requirement is to understand relationships not rows.

The O(n) problem

Storing data in rows and columns is one of the oldest storage mechanisms and existed long before the computer.

Tabular data is a tried-and-tested methodology that works well for many use cases and an ecosystem of tooling exists to help you work efficiently.

However, as the amount of data grows or the application or use case becomes more complex, you may encounter challenges dealing with relationships.

When querying across tables, the joins are computed at read-time, using an index to find the corresponding rows in the target table. The more data added to the database, the larger the index grows, the slower the response time.

This problem is known as the “Big O” or O(n) notation.

NoSQL Databases

Over the years, many No-SQL databases have sprung up to handle different scenarios, for example:

  • Document stores offer flexibility.

  • Wide-column stores offer scalability for large datasets.

  • Key-value stores provide simplicity and high performance.

  • Graph databases enable efficient modeling and querying of complex relationships.

Graphs

When you create a relationship between two nodes, the database stores a pointer to the relationship with each node. When reading data, the database will follow pointers in memory rather than relying on an underlying index.

This means that the query time remains constant to the size of the relationships expanded regardless of the overall size of the data.

A graph database yields much faster results for queries across entities and is a great fit when you need to:

  • Understand the relationships between entities - for example, how two people are connected

  • Self referenced data of the same type - for example, a hierarchy of employees within a company

  • Explore relationships of varying or unknown depth - for example, the use of parts within a factory

  • Calculate a route between two points in a network - for example, finding the most efficient route on public transport

Graph databases are a great fit for scenarios when you need to handle relationships efficiently.

Summary

In this lesson, you explored when graph databases are a good solution.

In the next lesson, you will learn how to query a graph database using Cypher.