What is the Graph Catalog?
The graph catalog is a concept that allows you to manage graph projections in GDS. This includes:
- creating graphs by projecting, filtering or sampling
- viewing details about graphs
- dropping graph projections
- exporting graph projections
- writing graph projection properties back to the database
How the Graph Catalog Works
You can call graph catalog operations with commands of the form
CALL gds.graph.<command>
For example, we can list the graph projections that currently exist in our database with the command below.
CALL gds.graph.list()
This will return an empty list since we haven’t created any projections yet.
Creating a Graph Projection
During this module, you will be using a movie recommendations dataset that contains information about movies, actors, and users who have rated movies.
You can create a projection from the Actor and Movie nodes and the ACTED_IN relationship with the command below.
CALL gds.graph.project(
'my-graph-projection',
['Actor','Movie'],
'ACTED_IN'
)
If we now list graphs again we should see information on the graph we just made:
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, schema
"graphName" | "nodeCount" | "relationshipCount" | "schema" |
---|---|---|---|
"my-graph-projection" | 24568 | 35910 | |
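If we only need to know whether a particular projection is in the catalog, rather than its full details, a lighter check is possible. The following is a minimal sketch that reuses the projection name from above:

```cypher
// Check whether a named projection is present in the graph catalog.
// Returns one row with exists = true or false.
CALL gds.graph.exists('my-graph-projection')
YIELD graphName, exists
RETURN graphName, exists
```

This can be handy at the start of a script, to decide whether a projection needs to be created or dropped first.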
Running Algorithms
As mentioned in previous lessons, the purpose of creating a projection is to provide a space for running graph algorithms and doing graph data science efficiently.
As a simple example of a graph algorithm, we will run degree centrality on Actor nodes. We will go over the degree centrality algorithm and execution modes in the Neo4j Graph Data Science Fundamentals Course. For now, just know that this will count the number of movies each actor was in and store it on a node property called numberOfMoviesActedIn inside the projection (it will not be written back to the database yet).
CALL gds.degree.mutate('my-graph-projection', {mutateProperty:'numberOfMoviesActedIn'})
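One way to confirm that the property now lives in the projection is to list that single graph and inspect its schema. This is a sketch, assuming the mutate call above has already run; the schema output should include numberOfMoviesActedIn among the node properties.

```cypher
// List only the named projection and return its schema,
// which describes the node labels, relationship types and properties it holds.
CALL gds.graph.list('my-graph-projection')
YIELD graphName, schema
RETURN graphName, schema
```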
Streaming and Writing Node Properties
There will be times when we want to take the results from our algorithm calculations and either stream them into another process or write them back to the database. The graph catalog has methods to stream and write both node properties and relationship properties for these purposes. We will go over this for the case of node properties below.
Using our numberOfMoviesActedIn example, we can stream the top 10 most prolific actors by movie count using the nodeProperty.stream graph catalog operation.
CALL gds.graph.nodeProperty.stream(
'my-graph-projection',
'numberOfMoviesActedIn'
)
YIELD nodeId, propertyValue
RETURN
gds.util.asNode(nodeId).name AS actorName,
propertyValue AS numberOfMoviesActedIn
ORDER BY numberOfMoviesActedIn DESCENDING, actorName LIMIT 10
If we instead wanted to write the property back to the database, we could use the nodeProperties.write operation.
CALL gds.graph.nodeProperties.write(
'my-graph-projection',
['numberOfMoviesActedIn'],
['Actor']
)
We could then query the top 10 most prolific actors by movie count with Cypher.
MATCH (a:Actor)
RETURN a.name, a.numberOfMoviesActedIn
ORDER BY a.numberOfMoviesActedIn DESCENDING, a.name LIMIT 10
Exporting Graphs
In a data science workflow, you may encounter situations where you need to bulk export data from a graph projection after performing graph algorithms and other analytics. For example, you may want to:
- export graph features for training a machine learning model in another environment
- create separate analytical views for downstream analytics and/or sharing with colleagues
- produce snapshots of analytical results and persist them to the filesystem
The graph catalog has two methods for export:
- gds.graph.export to export a graph into a new database, effectively copying the projection into a separate Neo4j database
- gds.graph.export.csv (gds.beta.graph.export.csv in some GDS versions) to export a graph to CSV files
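As a rough sketch of what the two export calls look like: the database name movieexport and the export name movie-export below are hypothetical values chosen for illustration, and the CSV procedure name depends on your GDS version. The CSV export also assumes that an export location (the gds.export.location setting) has been configured on the server.

```cypher
// Export the projection into a new database.
// Assumption: 'movieexport' is a hypothetical name for a database that does not exist yet.
CALL gds.graph.export('my-graph-projection', {dbName: 'movieexport'});

// Export the projection to CSV files.
// Assumption: 'movie-export' is a hypothetical export name;
// on some GDS versions the procedure is gds.beta.graph.export.csv.
CALL gds.graph.export.csv('my-graph-projection', {exportName: 'movie-export'});
```

Depending on your setup, the exported database may still need to be created with a database management command before it can be queried.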
Dropping Graphs
Projected graphs take up space in memory so once we are done working with a graph projection it is smart to remove it. We can do this with the drop command below:
CALL gds.graph.drop('my-graph-projection')
Now when we list graphs it will be empty again.
CALL gds.graph.list()
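Calling drop on a projection that no longer exists raises an error by default. In scripts that may run more than once, the optional second argument (failIfMissing) can be set to false so that the call simply returns no rows instead. A minimal sketch:

```cypher
// Drop the projection if it exists; do nothing if it is already gone.
CALL gds.graph.drop('my-graph-projection', false)
YIELD graphName
RETURN graphName
```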
Other Graph Catalog Operations
There are a few other management operations in the graph catalog that we will not cover in detail in this module, such as filtering projections, delete, and remove operations. You can read about all of them in our Graph Catalog documentation.
Check your understanding
1. Creating a Graph Projection
Which graph catalog operation can you use for creating a graph projection?
- ❏ CALL gds.graph.createGraph
- ❏ CALL gds.graph.proj
- ✓ CALL gds.graph.project
- ❏ CALL gds.graph.createProjection
Hint
You would call this procedure to project a graph.
Solution
The answer is CALL gds.graph.project.
2. Exporting to CSV
What graph catalog operation can you use to export a projection to csv files?
- ❏ gds.graph.csv.projectionExport
- ❏ gds.graph.csv.export
- ❏ gds.graph.export
- ✓ gds.graph.export.csv
Hint
In the Exporting Graphs section, it explains that you can export your projection into a new database using gds.graph.export and use the gds.graph.export.csv procedure (gds.beta.graph.export.csv in some GDS versions) to export to CSV.
Solution
The answer is gds.graph.export.csv.
3. Saving Properties in the Database
Suppose you have a need to calculate the number of reviews made by each User node and save it as a property in the database so it can be queried later with Cypher.
We can create a graph projection from User and Movie nodes and the RATED relationships, then run degree centrality to get a numberOfRatings property in the graph projection. From there, what GDS workflow works best to satisfy this use case?
- ❏ Use the gds.graph.saveProperties() operation to save the numberOfRatings property back to User nodes in the database
- ❏ Use the gds.graph.export.csv() operation to write the User nodes with numberOfRatings to csv files then re-import the data using LOAD CSV
- ❏ Use the gds.graph.nodeProperty.stream() operation to stream the numberOfRatings into a Cypher statement that uses the MATCH and SET commands to set the property to User nodes
- ✓ Use the gds.graph.nodeProperties.write() operation to write the numberOfRatings property back to the User nodes in the database
- ❏ You can just drop the graph projection with gds.graph.drop(). This will automatically save the numberOfRatings property, and any other property, back to the database
Hint
The following two commands are used to create the numberOfRatings property in the graph projection:
CALL gds.graph.project('my-graph-projection', ['User','Movie'], 'RATED');
CALL gds.degree.mutate('my-graph-projection', {mutateProperty:'numberOfRatings'});
Solution
The correct answer is:
Use the gds.graph.nodeProperties.write() operation to write the numberOfRatings property back to the User nodes in the database.
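For reference, the write step could look like the sketch below, which mirrors the Actor example from earlier in the lesson. It assumes the projection and mutate commands from the hint have already run, and that User nodes carry a name property (an assumption made for the follow-up query).

```cypher
// Write the numberOfRatings property from the projection back to User nodes in the database.
CALL gds.graph.nodeProperties.write(
  'my-graph-projection',
  ['numberOfRatings'],
  ['User']
);

// Once written, the property can be queried with plain Cypher.
// Assumption: User nodes have a name property.
MATCH (u:User)
RETURN u.name, u.numberOfRatings
ORDER BY u.numberOfRatings DESCENDING, u.name LIMIT 10;
```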
Summary
In this lesson you learned about the graph catalog and basic mechanisms for managing graph projections.
In the upcoming lessons we will go into more depth with different projection types and how and when to use them.