Thinking in Graphs

You have learned that Graph databases store data in a structure of nodes and relationships.

In this lesson, you will learn why a graph database can be the right solution.

Why Graph Databases?

Relationships in a graph are treated with the same importance as nodes that connect them.

This is not the same as other database technologies, as a result you may have experienced issues such as:

  • Poorly performing queries when traversing complex schema of foreign keys and many-to-many relationships.

  • Dealing with hierarchical data or trees, where the answer may lie at an unknown or varying depth.

  • Finding a path through an ever-changing dataset with complex requirements.

  • Simply finding it difficult to express what you are looking for.

These issues are often as a result of storing data in tables when the requirement is to understand relationships not rows.

The O(n) problem

Storing data in rows and columns is one of the oldest storage mechanisms and existed long before the computer.

Tabular data is a tried-and-tested methodology that works well for many use cases and an ecosystem of tooling exists to help you work efficiently.

However, as the amount of data grows or the application or use case becomes more complex, you may encounter challenges dealing with relationships.

When querying across tables, the joins are computed at read-time, using an index to find the corresponding rows in the target table. The more data added to the database, the larger the index grows, the slower the response time.

This problem is known as the “Big O” or O(n) notation.

NoSQL Databases

Over the years, many No-SQL databases have sprung up to handle different scenarios, for example:

  • Document stores offer flexibility.

  • Wide-column stores offer scalability for large datasets.

  • Key-value stores provide simplicity and high performance.

  • Graph databases enable efficient modeling and querying of complex relationships.

Graphs

When you create a relationship between two nodes, the database stores a pointer to the relationship with each node. When reading data, the database will follow pointers in memory rather than relying on an underlying index.

This means that the query time remains constant to the size of the relationships expanded regardless of the overall size of the data.

A graph database yields much faster results for queries across entities and is a great fit when you need to:

  • Understand the relationships between entities - for example, how two people are connected

  • Self referenced data of the same type - for example, a hierarchy of employees within a company

  • Explore relationships of varying or unknown depth - for example, the use of parts within a factory

  • Calculate a route between two points in a network - for example, finding the most efficient route on public transport

Graph databases are a great fit for scenarios when you need to handle relationships efficiently.

Check your understanding

The 0(n) notation problem

Related to relational databases - what is the O(n) problem?

  • ❏ The challenge of maintaining data integrity during database updates.

  • ✓ Increasing response times as database indexes grow larger due to more data.

  • ❏ Difficulty in implementing security protocols in relational databases.

  • ❏ The need to frequently back up data to prevent loss.

Hint

When querying across tables in a relational database, the joins are computed at read-time, using an underlying index to find the corresponding rows in the target table.

Solution

The answer is "Increasing response times as database indexes grow larger due to more data".

Graph databases negate this issue by as the database stores a pointer to the relationship with each node. So when the time comes to read the data, the database will follow pointers in memory rather than relying on an underlying index.

Summary

In this lesson, you explored when graph databases are a good solution.

In the next lesson, you will learn explore typical use cases of a graph database.