Introduction
At a high level, GDS works by transforming and loading data into an in-memory format that is optimized for high-performance graph analytics. GDS provides graph algorithms, feature engineering, and machine learning methods that execute on this in-memory graph format. This enables the efficient and scalable application of data science to large graphs, including entire graph databases or large portions of them.
In this lesson we will cover the high-level workflow in GDS, as well as CPU and memory configuration to support that workflow.
General Workflow
Below is a diagram illustrating the general workflow in GDS, which breaks down into three high-level steps:
- Read and Load the Graph: GDS needs to read data from the Neo4j database, transform it, and load it into an in-memory graph. In GDS we refer to this process as projecting a graph, and to the in-memory graph as a graph projection. GDS can hold multiple graph projections at once, and they are managed by a component called the Graph Catalog. We will cover the Graph Catalog and graph projection management in more detail in the next module.
- Execute Algorithms: This includes classic graph algorithms such as centrality, community detection, and path finding. It also includes embeddings, a form of robust graph feature engineering, as well as machine learning pipelines.
- Store Results: There are a few things you may want to do with the output of graph algorithms. GDS enables you to write results back to the database, export them to disk in CSV format, or stream them into another application or downstream workflow. A minimal Cypher sketch covering all three steps follows this list.
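The sketch below is a rough illustration of the three steps, not a prescribed recipe. It assumes a hypothetical graph of Person nodes connected by KNOWS relationships and uses PageRank purely as a representative algorithm; substitute your own labels, relationship types, and algorithms.

```cypher
// 1. Read and load: project Person nodes and KNOWS relationships into an
//    in-memory graph named 'myGraph' (all names here are illustrative).
CALL gds.graph.project('myGraph', 'Person', 'KNOWS');

// 2. Execute an algorithm: stream PageRank scores from the projection.
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC;

// 3. Store results: write the scores back to the database as a node property.
CALL gds.pageRank.write('myGraph', { writeProperty: 'pagerank' })
YIELD nodePropertiesWritten;
```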
GDS Configuration
GDS is greedy with respect to system resources, meaning it will use as much memory and as many CPU cores as it needs, without exceeding the limits configured by the user.
If you are running on AuraDS, the GDS configuration is fully managed out of the box, so the information below is not needed to get started. For other Neo4j deployments, however, configuring workloads and memory allocation to make the best use of the available system resources is important for maximizing performance.
CPU and Concurrency
GDS uses multiple CPU cores for graph projections, algorithms, and writing results. This allows GDS to parallelize its computations and significantly speed up processing time. The level of parallelization is configured per execution via the concurrency parameter in the projection, algorithm, or other operation method.
The default concurrency for most operations in GDS is 4, which is also the maximum concurrency that can be used with the Community license. With a GDS Enterprise license, concurrency is unlimited.
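As a brief illustration (a minimal sketch, assuming a projection named 'myGraph' already exists in the Graph Catalog), the concurrency parameter is passed in the operation's configuration map:

```cypher
// Run PageRank with up to 4 concurrent threads (the default, and the maximum
// under the Community license); 'myGraph' is an assumed, pre-projected graph.
CALL gds.pageRank.stream('myGraph', { concurrency: 4 })
YIELD nodeId, score
RETURN nodeId, score;
```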
Memory
GDS runs within a Neo4j instance and is therefore subject to the general Neo4j memory configuration. Below is an illustration of Neo4j memory management. Neo4j uses the Java Virtual Machine (JVM) and, as such, memory management is divided into heap and off-heap usage.
Of the above, two main types of memory can be allocated in configuration:
- Heap Space: Used for storing in-memory graphs, executing GDS algorithms, query execution, and transaction state.
- Page Cache: Used for indexes and for caching the Neo4j data stored on disk. It improves performance when querying the database and projecting graphs.
Recommendations for Memory Configuration
Data science workloads tend to be memory intensive, and GDS is no exception. In general, we recommend being generous when configuring the heap size: allocate as much heap as possible while still leaving sufficient page cache to load your data and support Cypher queries. This can be done via the dbms.memory.heap.initial_size and dbms.memory.heap.max_size settings in the Neo4j configuration.
You can also use memory estimation to gauge heap size requirements early on. Memory estimation is a procedure in GDS that lets you estimate the memory needed to run a projection, algorithm, or other operation on your data before actually executing it. We will go through the exact commands for memory estimation in our Neo4j Graph Data Science Fundamentals Course.
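As a hedged sketch of what that looks like (the course above covers the exact commands), many GDS operations expose an estimate mode that reports the expected memory footprint without running the operation; the graph name below is an assumption:

```cypher
// Estimate the memory required to run PageRank in write mode against an
// assumed projection named 'myGraph', without actually executing it.
CALL gds.pageRank.write.estimate('myGraph', { writeProperty: 'pagerank' })
YIELD requiredMemory, bytesMin, bytesMax
RETURN requiredMemory, bytesMin, bytesMax;
```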
As far as the page cache is concerned, for purely analytical workloads it is recommended to decrease the page cache in favor of an increased heap size. However, setting a minimum page cache size is still important when projecting graphs. This minimum can be estimated as approximately 8KB * 100 * readConcurrency for standard native projections. The page cache size can be set via dbms.memory.pagecache.size in the Neo4j configuration.
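To make the arithmetic concrete: with the default readConcurrency of 4, the minimum works out to roughly 8KB * 100 * 4 ≈ 3.2MB, although a real deployment will typically allocate far more. The snippet below is a hypothetical neo4j.conf sketch for a machine dedicated to analytical workloads; the sizes are placeholders, not recommendations for your hardware.

```
# Hypothetical sizes for a machine running mostly GDS workloads;
# adjust to your own hardware and workload.
dbms.memory.heap.initial_size=24g
dbms.memory.heap.max_size=24g
dbms.memory.pagecache.size=4g
```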
For more information and detailed guidance on tuning these configurations, please see the system requirements documentation.
Check your understanding
1. GDS Workflow
GDS transforms and loads data into:
- ❏ a graph OLAP cube
- ✓ an in-memory graph format
- ❏ a separate analytical view stored on disk with the database
- ❏ a Python process
Hint
Performing graph algorithms in-memory offers several benefits, including faster computation and analysis due to reduced I/O overhead, improved scalability and efficiency in handling large graph datasets, and the ability to iterate and explore graph structures dynamically without the need for persistent storage operations.
Solution
The answer is an in-memory graph format.
2. CPU Configuration
How is CPU concurrency configured in GDS?
- ✓ Per execution via the concurrency parameter in the projection, algorithm, or other operation method
- ❏ GDS runs within a Neo4j instance and is therefore subject to the general Neo4j concurrency setting in the Neo4j configuration
- ❏ GDS has its own properties file where the concurrency setting can be set
- ❏ Concurrency cannot be configured in GDS
Hint
To configure CPU concurrency in GDS (Graph Data Science), you can set the concurrency parameter in the projection, algorithm, or other operation method during execution.
Solution
The answer is Per execution via the concurrency parameter in the projection, algorithm, or other operation method.
3. Memory Configuration
If you want to increase memory allocation to handle the creation of larger graph projections in GDS, which configuration(s) would you increase?
- ❏ The graph projection is stored off-heap in transaction state, so you would increase dbms.tx_state.off_heap.max_cacheable_block_size
- ❏ The graph projection is stored off-heap in the database cache, so you would increase dbms.memory.pagecache.size
- ❏ The graph projection is stored on-heap, so you would increase dbms.memory.heap.size
- ✓ The graph projection is stored on-heap, so you would increase dbms.memory.heap.initial_size and/or dbms.memory.heap.max_size
Hint
The graph projection is stored on-heap; the initial and maximum heap sizes can be configured to handle larger graph projections.
Solution
The graph projection is stored on-heap, so you would increase dbms.memory.heap.initial_size and/or dbms.memory.heap.max_size.
Summary
In this lesson you learned about how GDS works and the high-level workflow in GDS. You also learned about GDS concurrency and Neo4j memory configurations to support GDS workloads.
In the next module you will learn about graph management, the graph catalog, and working with graph projections in more detail.