Graph Catalog

What is the Graph Catalog?

The graph catalog is the GDS component that lets you manage graph projections. This includes:

creating graphs by projecting, filtering or sampling
viewing details about graphs
dropping graph projections
exporting graph projections
writing graph projection properties back to the database

How the Graph Catalog Works

You can call graph catalog operations with commands of the form

CALL gds.graph.<command>

For example, we can list the graph projections that currently exist in our database with the following command:

cypher

CALL gds.graph.list()

This will return an empty list since we haven’t created any projections yet.
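
To check for one specific projection rather than listing everything, the catalog also offers an existence check. A minimal sketch, assuming the projection name my-graph-projection that is used later in this lesson:

```cypher
// Returns a single row; exists stays false until the projection has been created.
CALL gds.graph.exists('my-graph-projection')
YIELD graphName, exists
RETURN graphName, exists
```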

Creating a Graph Projection

During this module, you will be using a movie recommendations dataset that contains information about movies, actors, and users who have rated movies.

[Figure: the recommendations data model]

You can create a projection from the Actor and Movie nodes and the ACTED_IN relationship with the following command.

cypher

CALL gds.graph.project(
  'my-graph-projection',
  ['Actor','Movie'],
  'ACTED_IN'
  )
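
Because a projection lives in memory, it can be useful to gauge its size first. The sketch below uses the estimate variant of the projection procedure with the same node labels and relationship type; it creates nothing in the catalog, and the reported figures depend on your database.

```cypher
// Estimate the memory needed for the same projection
// without adding a graph to the catalog.
CALL gds.graph.project.estimate(
  ['Actor','Movie'],
  'ACTED_IN'
  )
YIELD requiredMemory, nodeCount, relationshipCount
RETURN requiredMemory, nodeCount, relationshipCount
```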

If we now list the graphs again, we should see information about the graph we just created:

cypher

CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, schema

"graphName"	"nodeCount"	"relationshipCount"	"schema"
"my-graph-projection"	24568	35910	{"relationships":{"ACTED_IN":{}},"nodes":{"Movie":{},"Actor":{}}}

Running Algorithms

As mentioned in previous lessons, the purpose of creating a projection is to provide a space for running graph algorithms and doing graph data science efficiently.

As a simple example of a graph algorithm, we will run degree centrality on Actor nodes. We will go over the degree centrality algorithm and execution modes in the Neo4j Graph Data Science Fundamentals Course. For now, just know that this will count the number of movies each actor was in and store it on a node property called numberOfMoviesActedIn inside the projection (it will not be written back to the database yet).

cypher

CALL gds.degree.mutate('my-graph-projection', {mutateProperty:'numberOfMoviesActedIn'})
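
One way to confirm that the property now exists inside the projection is to list that single graph and inspect its schema. This is a sketch rather than a required step; the exact schema output varies by GDS version, but numberOfMoviesActedIn should appear among the node properties.

```cypher
// Passing a graph name restricts the listing to that projection.
CALL gds.graph.list('my-graph-projection')
YIELD graphName, schema
RETURN graphName, schema
```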

Streaming and Writing Node Properties

There will be times when we want to take the results from our algorithm calculations and either stream them into another process or write them back to the database. The graph catalog has methods to stream and write both node properties and relationship properties for these purposes. We will go over this for the case of node properties below.

Using our numberOfMoviesActedIn example, we can stream the top 10 most prolific actors by movie count using the nodeProperty.stream graph catalog operation.

cypher

CALL gds.graph.nodeProperty.stream(
  'my-graph-projection',
  'numberOfMoviesActedIn'
  )
YIELD nodeId, propertyValue
RETURN
  gds.util.asNode(nodeId).name AS actorName,
  propertyValue AS numberOfMoviesActedIn
ORDER BY numberOfMoviesActedIn DESCENDING, actorName LIMIT 10
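
The query above streams the property for every node in the projection, Movie nodes included. As a sketch, assuming the optional third argument takes a list of node labels (as the write operation does), the stream can be limited to Actor nodes:

```cypher
// Stream the property for nodes with the Actor label only.
CALL gds.graph.nodeProperty.stream(
  'my-graph-projection',
  'numberOfMoviesActedIn',
  ['Actor']
  )
YIELD nodeId, propertyValue
RETURN
  gds.util.asNode(nodeId).name AS actorName,
  propertyValue AS numberOfMoviesActedIn
ORDER BY numberOfMoviesActedIn DESCENDING, actorName LIMIT 10
```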

If we instead wanted to write the property back to the database, we could use the nodeProperties.write operation.

cypher

CALL gds.graph.nodeProperties.write(
  'my-graph-projection',
  ['numberOfMoviesActedIn'],
  ['Actor']
  )

We could then query the top 10 most prolific actors by movie count with Cypher.

cypher

MATCH (a:Actor)
RETURN a.name, a.numberOfMoviesActedIn
ORDER BY a.numberOfMoviesActedIn DESCENDING, a.name LIMIT 10
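
A quick sanity check after writing is to count how many Actor nodes now carry the property. The alias actorsWithProperty is only an illustrative name.

```cypher
// Count Actor nodes that have the written property set.
MATCH (a:Actor)
WHERE a.numberOfMoviesActedIn IS NOT NULL
RETURN count(a) AS actorsWithProperty
```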

Exporting Graphs

In a data science workflow, you may encounter situations where you need to bulk export data from a graph projection after performing graph algorithms and other analytics. For example, you may want to:

export graph features for training a machine learning model in another environment
create separate analytical views for downstream analytics and/or sharing with colleagues
produce snapshots of analytical results and persist them to the filesystem

The graph catalog has two methods for export:

gds.graph.export to export a graph into a new database, effectively copying the projection into a separate Neo4j database
gds.graph.export.csv to export a graph to CSV files (in some GDS versions this procedure is in the beta tier and is called gds.beta.graph.export.csv)
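
As a rough sketch, the two export calls might look like the following. The database name exportedgraph and the export name my-export are hypothetical, the CSV export typically requires an export location to be configured on the server, and depending on your GDS version the CSV procedure may need the gds.beta prefix.

```cypher
// Copy the projection into a new database (hypothetical name 'exportedgraph').
CALL gds.graph.export('my-graph-projection', {dbName: 'exportedgraph'});

// Export the projection to CSV files under a hypothetical export name.
CALL gds.graph.export.csv('my-graph-projection', {exportName: 'my-export'});
```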

Dropping Graphs

Projected graphs take up space in memory, so once we are done working with a graph projection, it is wise to remove it. We can do this with the drop command below:

cypher

CALL gds.graph.drop('my-graph-projection')

Now when we list the graphs, the result will be empty again.

cypher

CALL gds.graph.list()
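
Dropping a graph that no longer exists raises an error by default. In a script that may run more than once, a sketch like the following avoids that by passing false as the second argument (failIfMissing):

```cypher
// With failIfMissing set to false, a missing graph does not raise an error.
CALL gds.graph.drop('my-graph-projection', false)
YIELD graphName
RETURN graphName
```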

Other Graph Catalog Operations

A few other management operations in the graph catalog are not covered in detail in this module, such as filtering projections and removing node properties or relationships from a projection. You can read about all of them in our Graph Catalog documentation.

Check your understanding

1. Creating a Graph Projection

Which graph catalog operation can you use for creating a graph projection?

❏ CALL gds.graph.createGraph
❏ CALL gds.graph.proj
✓ CALL gds.graph.project
❏ CALL gds.graph.createProjection

Hint

You would call this procedure to project a graph.

Solution

The answer is CALL gds.graph.project.

2. Exporting to CSV

Which graph catalog operation can you use to export a projection to CSV files?

❏ gds.graph.csv.projectionExport
❏ gds.graph.csv.export
❏ gds.graph.export
✓ gds.graph.export.csv

Hint

The Exporting Graphs section explains that you can export your projection into a new database using gds.graph.export, and that a separate procedure exports it to CSV files.

Solution

The answer is gds.graph.export.csv.

3. Saving Properties in the Database

Suppose you need to calculate the number of ratings made by each User node and save it as a property in the database so that it can be queried later with Cypher.

We can create a graph projection from User and Movie nodes and the RATED relationships, then run degree centrality to get a numberOfRatings property in the graph projection. From there, which GDS workflow best satisfies this use case?

❏ Use the gds.graph.saveProperties() operation to save the numberOfRatings property back to User nodes in the database
❏ Use the gds.graph.export.csv() operation to write the User nodes with numberOfRatings to csv files then re-import the data using LOAD CSV
❏ Use the gds.graph.nodeProperty.stream() operation to stream the numberOfRatings into a Cypher statement that uses the MATCH and SET commands to set the property to User nodes
✓ Use the gds.graph.nodeProperties.write() operation to write the numberOfRatings property back to the User nodes in the database
❏ You can just drop the graph projection with gds.graph.drop(). This will automatically save the numberOfRatings property, and any other property, back to the database

Hint

The following two commands are used to create the numberOfRatings property in the graph projection:

CALL gds.graph.project('my-graph-projection', ['User','Movie'], 'RATED');

CALL gds.degree.mutate('my-graph-projection', {mutateProperty:'numberOfRatings'});

Solution

The correct answer is:

Use the gds.graph.nodeProperties.write() operation to write the numberOfRatings property back to the User nodes in the database
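
A minimal sketch of that workflow, assuming the projection and the numberOfRatings property were created as shown in the hint, and that User nodes have a name property:

```cypher
// Write the in-projection property back to User nodes in the database.
CALL gds.graph.nodeProperties.write(
  'my-graph-projection',
  ['numberOfRatings'],
  ['User']
  );

// Query the persisted property with plain Cypher.
MATCH (u:User)
RETURN u.name, u.numberOfRatings
ORDER BY u.numberOfRatings DESCENDING, u.name LIMIT 10;
```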

Summary

In this lesson you learned about the graph catalog and basic mechanisms for managing graph projections.

In the upcoming lessons we will go into more depth with different projection types and how and when to use them.
