Introduction
Whether you’re working in the Browser or in Python, GDS workflows follow the same fundamental pattern. The steps are identical. The logic is identical. Only the syntax changes.
This lesson walks through that workflow step by step, showing you how each piece translates from Cypher to Python.
What You’ll Learn
By the end of this lesson, you’ll be able to:
- Execute the standard five-step GDS workflow in Python
- Create graph projections and inspect them using the Graph object
- Choose the right execution mode for different situations
- Work with algorithm results as pandas DataFrames
- Clean up projections properly to manage memory
The GDS Workflow
Every GDS analysis follows the same five steps:
- Load data into Neo4j (if needed)
- Project the graph into GDS memory
- Run algorithms on the projection
- Work with results
- Drop the projection
You did this in Modules 1 and 2 using Cypher. Now you’ll do the same thing in Python.
From Cypher to Python
In Module 2, you wrote Cypher projections like this:
MATCH (source:User)-[r:P2P]->(target:User)
WITH gds.graph.project('fraud-graph', source, target) AS g
RETURN g.graphName, g.nodeCount
The Python equivalent uses the same concepts, but with a different interface. Let’s work through each step.
Step 1: Loading Data
If your data isn’t already in Neo4j, you can load it using gds.run_cypher(). This method executes any Cypher query and returns results as a pandas DataFrame.
# Load Movie nodes from CSV
gds.run_cypher(  # (1)
    f"""
    LOAD CSV WITH HEADERS FROM '{CSV_URLS['movies']}' AS row
    MERGE (m:Movie {{tmdbId: row.tmdbId}})  // (2)
    SET m.title = row.title,
        m.year = toInteger(row.year),
        m.imdbRating = toFloat(row.imdbRating)
    """
)
- (1) gds.run_cypher() sends any Cypher query to the server; here an f-string injects the CSV URL
- (2) Double braces {{ }} are required in f-strings to produce literal { } in the Cypher query
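The brace escaping can be checked without a database by building the query string and printing it. This is a minimal sketch: the `CSV_URLS` value below is a placeholder URL, not the workshop's real file location, and the query is shortened.

```python
# Placeholder mapping (assumption); the real CSV_URLS comes from the companion notebook.
CSV_URLS = {"movies": "https://example.com/movies.csv"}

query = f"""
LOAD CSV WITH HEADERS FROM '{CSV_URLS['movies']}' AS row
MERGE (m:Movie {{tmdbId: row.tmdbId}})
SET m.title = row.title
"""

# Single braces were evaluated as Python expressions;
# double braces were emitted as literal single braces.
print(query)
```

Printing the string before passing it to gds.run_cypher() is a quick way to confirm that the Cypher map syntax survived the f-string formatting.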
For this workshop, the companion notebook handles data loading. In practice, you’d often connect to an existing database.
Step 2: Creating Projections
The gds.graph.project() method returns two values: a Graph object and metadata about the projection.
G, result = gds.graph.project(  # (1)
    "movies-graph",
    {
        "Actor": {
            "properties": {
                "born": {"defaultValue": 1900}  # (2)
            }
        },
        "Movie": {
            "properties": {
                "year": {"defaultValue": 1900},
                "imdbRating": {"defaultValue": 0.0}
            }
        }
    },
    "ACTED_IN"
)
- (1) Returns a tuple: G (the Graph object for inspecting the projection and running algorithms) and result (projection metadata)
- (2) defaultValue handles nodes missing a property, equivalent to coalesce() in Cypher projections
This example uses native projection syntax. In the next lesson, you’ll learn how to translate your Cypher projection knowledge to native projection in Python.
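When many properties need defaults, the nested dictionary can be generated rather than typed by hand. The helper below is a hypothetical convenience function, not part of the graphdatascience library; it only builds a plain Python dict in the same shape as the node projection shown above.

```python
def node_projection(defaults_by_label):
    """Build a native node projection dict from {label: {property: default}}.

    Hypothetical helper for illustration; it does not talk to GDS.
    """
    return {
        label: {
            "properties": {
                prop: {"defaultValue": default}
                for prop, default in props.items()
            }
        }
        for label, props in defaults_by_label.items()
    }


spec = node_projection({
    "Actor": {"born": 1900},
    "Movie": {"year": 1900, "imdbRating": 0.0},
})
print(spec)
```

The resulting `spec` could then be passed as the second argument to gds.graph.project() in place of the hand-written dictionary.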
The Graph Object
The Graph object (G) gives you methods to inspect your projection without querying the catalog directly.
G.name() # Returns the graph name
G.node_count() # Number of nodes in projection
G.relationship_count() # Number of relationships
G.node_labels() # List of node labels
G.relationship_types() # List of relationship types
G.node_properties("Movie") # (1)
G.memory_usage() # (2)
G.exists()  # True if graph exists in catalog
- (1) Returns the list of properties projected for a specific label; useful to verify before running algorithms
- (2) Check memory consumption to ensure the projection fits within your server’s available heap
These methods are useful for verifying your projection before running algorithms.
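One way to use these methods together is to collect them into a single summary before running anything expensive. The sketch below assumes only the Graph methods listed above; `summarize_projection` is a hypothetical helper, and `FakeGraph` is a stand-in with made-up values so the example runs without a server.

```python
def summarize_projection(G):
    """Collect basic facts about a projection into a plain dict."""
    return {
        "name": G.name(),
        "nodes": G.node_count(),
        "relationships": G.relationship_count(),
        "labels": G.node_labels(),
        "types": G.relationship_types(),
    }


class FakeGraph:
    """Stand-in for a real Graph object; all values are illustrative."""

    def name(self):
        return "movies-graph"

    def node_count(self):
        return 5

    def relationship_count(self):
        return 4

    def node_labels(self):
        return ["Actor", "Movie"]

    def relationship_types(self):
        return ["ACTED_IN"]


summary = summarize_projection(FakeGraph())
print(summary)
```

With a live connection, the same function would be called with the G returned by gds.graph.project().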
Step 3: Running Algorithms
Algorithm calls follow a consistent pattern:
gds.<algorithm>.<mode>(G, **config)
For example, to run degree centrality in mutate mode:
result = gds.degree.mutate(  # (1)
    G, mutateProperty="degree"
)
# Verify the property was added
print(G.node_properties("Actor"))  # (2)
- (1) .mutate() stores the result as a new property on the in-memory projection, not in the database
- (2) After mutating, the property appears alongside any projected properties (e.g. ['born', 'degree'])
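Because every call has the same shape, the algorithm and mode can also be chosen by name at runtime. The sketch below is an illustration only: `run_algorithm` is a hypothetical helper, and `fake_gds` is a stand-in object that records its arguments so the example runs without a server. It assumes the real client resolves algorithm and mode names through ordinary attribute access.

```python
from types import SimpleNamespace


def run_algorithm(gds, algorithm, mode, G, **config):
    """Resolve gds.<algorithm>.<mode> by name and call it with the graph and config."""
    endpoint = getattr(getattr(gds, algorithm), mode)
    return endpoint(G, **config)


def fake_mutate(G, **config):
    """Stand-in endpoint that just echoes what it received."""
    return {"graph": G, "config": config}


# Stand-in client exposing a single endpoint: fake_gds.degree.mutate
fake_gds = SimpleNamespace(degree=SimpleNamespace(mutate=fake_mutate))

result = run_algorithm(fake_gds, "degree", "mutate", "G", mutateProperty="degree")
print(result)
```

This kind of indirection is mainly useful when looping over several algorithms or modes with a shared configuration.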
The mode you choose determines what happens with the results.
The Four Execution Modes
Each mode serves a different purpose:
- .stream(): Returns results as a DataFrame. Use when you want to analyze or visualize results in Python.
- .mutate(): Stores results in the projection only. Use when chaining multiple algorithms together.
- .write(): Writes results back to Neo4j. Use when you need to persist results for later queries.
- .stats(): Returns statistics only. Use for quick checks without storing anything.
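The difference between the modes comes down to where the scores end up. The following toy model uses plain dictionaries to stand in for the projection and the database; it is a conceptual sketch in pure Python, not how GDS is implemented, and the real modes return more information than shown here.

```python
from collections import Counter

# Toy data: a directed edge list standing in for a projection.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
projection = {"a": {}, "b": {}, "c": {}}  # in-memory node properties
database = {"a": {}, "b": {}, "c": {}}    # persisted node properties


def out_degree():
    """Count outgoing edges per node."""
    counts = Counter(source for source, _ in edges)
    return {node: counts[node] for node in projection}


def stream():
    """Return one row per node; nothing is stored."""
    return [{"nodeId": node, "score": score} for node, score in out_degree().items()]


def mutate(prop):
    """Store scores on the in-memory projection only."""
    for node, score in out_degree().items():
        projection[node][prop] = score


def write(prop):
    """Store scores in the toy database."""
    for node, score in out_degree().items():
        database[node][prop] = score


def stats():
    """Return summary figures only; nothing is stored."""
    scores = list(out_degree().values())
    return {"min": min(scores), "max": max(scores), "mean": sum(scores) / len(scores)}


rows = stream()
summary = stats()
mutate("degree")

print(rows)
print(summary)
print(projection)  # now carries a "degree" property
print(database)    # still empty, because write() was never called
```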
Stream Mode in Practice
Stream mode is the most common choice for analysis work. Results come back as a pandas DataFrame.
df = gds.degree.stream(G) # (1)
# Standard pandas operations work immediately
top_nodes = df.nlargest(10, "score") # (2)
print(top_nodes)
- (1) .stream() returns a DataFrame with nodeId and score columns, with no side effects on the projection or database
- (2) Because the result is a standard pandas DataFrame, you can chain any pandas operation directly
Step 4: Working with Results
Since stream mode returns DataFrames, you can use the full pandas toolkit. Filter, sort, merge, visualize—whatever your analysis requires.
# Get degree centrality scores
scores = gds.degree.stream(G)
# Find nodes above a threshold
high_degree = scores[scores["score"] > 50] # (1)
# Calculate summary statistics
print(scores["score"].describe())  # (2)
- (1) Standard pandas boolean indexing works directly on the streamed results
- (2) .describe() gives you count, mean, std, min, quartiles, and max: a quick way to understand the score distribution
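Filtering and merging can be tried without a server by building a DataFrame in the same shape as the streamed output. In this sketch, both tables are hand-made stand-ins: `scores` mimics the nodeId and score columns, and `names` is a hypothetical lookup table that might in practice come from gds.run_cypher().

```python
import pandas as pd

# Hand-built stand-in for the output of gds.degree.stream(G)
scores = pd.DataFrame({"nodeId": [0, 1, 2, 3], "score": [12.0, 75.0, 3.0, 58.0]})

# Hypothetical lookup table mapping node ids to display names
names = pd.DataFrame({"nodeId": [0, 1, 2, 3], "name": ["A", "B", "C", "D"]})

# Filter first, then attach names and sort by score, highest first
high_degree = scores[scores["score"] > 50]
labelled = high_degree.merge(names, on="nodeId").sort_values("score", ascending=False)

print(labelled)
```

Merging on nodeId keeps the scores attached to readable labels, which is usually what a report or chart needs.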
Step 5: Cleanup
Projections consume memory. When you’re finished with a projection, drop it.
# Drop using the Graph object
G.drop()
# Or use the catalog
gds.graph.drop("movies-graph")
# Check what projections remain
print(gds.graph.list())
Forgetting to drop projections is a common source of memory issues, especially in notebooks where you might create multiple projections during exploration.
The Context Manager Pattern
Python’s with statement provides automatic cleanup. When the block ends, the projection is dropped—even if an error occurs.
with gds.graph.project(  # (1)
    "temp", ["User", "Movie"], "RATED"
)[0] as G:
    result = gds.degree.stream(G)
    print(f"Ran on {G.node_count()} nodes")
    display(result.nlargest(5, "score"))
# G has been dropped automatically
print(gds.graph.exists("temp")["exists"])  # (2)
- (1) [0] extracts the Graph object from the returned tuple; the with block ensures it is dropped when the block exits
- (2) After the with block, the projection no longer exists in memory, even if an error occurred inside the block
This pattern is especially useful for exploratory work where you’re creating and discarding projections frequently.
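The cleanup guarantee comes from Python's context manager protocol rather than anything specific to graphs. The toy class below is a stand-in, not the library's Graph implementation; it only demonstrates that the exit hook runs, and the drop happens, even when the body raises.

```python
class TempGraph:
    """Toy stand-in for a Graph object; records whether drop() was called."""

    def __init__(self, name):
        self.name = name
        self.dropped = False

    def drop(self):
        self.dropped = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.drop()
        return False  # do not swallow the exception


graph = TempGraph("temp")
try:
    with graph as G:
        raise RuntimeError("algorithm failed")
except RuntimeError as err:
    print(f"error propagated: {err}")

print(graph.dropped)
```

Returning False from `__exit__` lets the original error surface to the caller while still running the cleanup.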
Putting It Together
Here’s the complete workflow in one place:
from graphdatascience import GraphDataScience
gds = GraphDataScience(uri, auth=(username, password))
# Project the graph
G, _ = gds.graph.project(  # (1)
    "movies-graph",
    ["Actor", "Movie"],
    {"ACTED_IN": {"orientation": "UNDIRECTED"}}
)
# Run algorithm
df = gds.degree.stream(G) # (2)
# Work with results
print(df.nlargest(10, "score"))
# Cleanup
G.drop() # (3)
gds.close()
- (1) Project with UNDIRECTED orientation so each relationship can be traversed in both directions, which some algorithms require
- (2) Stream results into a DataFrame for immediate analysis
- (3) Always drop the projection and close the connection when finished
Summary
The GDS workflow in Python mirrors what you learned in Cypher:
- gds.graph.project() returns a Graph object for inspecting projections
- Four execution modes let you choose where results go: .stream(), .mutate(), .write(), .stats()
- Always drop projections when finished, or use context managers for automatic cleanup
- Include defaultValue when projecting properties to handle nulls
In the companion notebook, you’ll work through each step hands-on with the Movies dataset.
Next: Understanding projection syntax options—native vs. Cypher projection in Python.