Introduction
You’ve spent the last two modules using GDS in the Neo4j Browser—projecting graphs, running algorithms, and interpreting results. That foundation is solid.
Moving to Python
Now it’s time to take those same skills into the environment where most real-world data science happens: Python.
Same algorithms, different syntax
The Python GDS client isn’t a different way of doing graph analytics. It’s the same algorithms, the same concepts, the same workflows—just wrapped in a Pythonic interface that plays nicely with pandas, scikit-learn, and the rest of the Python ecosystem.
What You’ll Learn
By the end of this lesson, you’ll be able to:
- Set up a development environment for GDS work in Python
- Connect to Neo4j using the Python GDS client
- Execute Cypher queries and receive results as pandas DataFrames
- Recognize when Python makes more sense than Browser (and vice versa)
Setting Up Your Environment
Before we write any code, let’s get your development environment ready.
Click the button below to open the workshop repository in a GitHub Codespace. This will clone the repository and set up a pre-configured Python environment automatically.
The Codespace takes approximately 10 minutes to configure. While it’s setting up, continue through the next few slides—we’ll walk through the concepts before you need to run any code.
How the Python Client Works
The GDS Python client acts as a bridge between your Python code and your Neo4j server.
Under the hood, it translates your Python method calls into Cypher queries, sends them to the server, executes them against the GDS library, and returns results as pandas DataFrames.
Everything still applies
This means everything you learned about GDS in the Browser still applies. The algorithms haven’t changed. The projections work the same way. You’re just using a different interface to access them.
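To make the translation step concrete, here is a toy sketch of the idea. It is not the real client’s implementation: ToyCallBuilder is a hypothetical class invented for this illustration, and it only builds a Cypher string and a parameter map instead of contacting a server. gds.pageRank.stream is a real GDS procedure name, while the graph name my-graph is a placeholder.

```python
class ToyCallBuilder:
    """Hypothetical illustration only: turns attribute access into a Cypher CALL string."""

    def __init__(self, namespace="gds"):
        self._namespace = namespace

    def __getattr__(self, name):
        # Only reached for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        # gds.pageRank -> "gds.pageRank", gds.pageRank.stream -> "gds.pageRank.stream"
        return ToyCallBuilder(f"{self._namespace}.{name}")

    def __call__(self, *args):
        # Pass arguments as query parameters rather than pasting them into the text.
        names = [f"arg{i}" for i in range(len(args))]
        placeholders = ", ".join(f"${n}" for n in names)
        query = f"CALL {self._namespace}({placeholders})"
        return query, dict(zip(names, args))


toy = ToyCallBuilder()
query, params = toy.pageRank.stream("my-graph")
print(query)   # CALL gds.pageRank.stream($arg0)
print(params)  # {'arg0': 'my-graph'}
```

The real client does more than this, such as converting results to DataFrames, but the mapping from a Python method call to a procedure call is the part that carries over from Browser.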
When to Use Python vs. Browser
Both tools have their place. The key is knowing which one fits your current task.
Reach for Python when you’re:
- Building repeatable data pipelines
- Automating workflows that run regularly
- Integrating graph analytics with other Python libraries
- Working with results that need further processing
Stick with Browser when you’re:
- Exploring data interactively
- Running quick, one-off queries
- Debugging projection or algorithm issues
- Visually inspecting graph structure
For this module, we’ll work primarily in Python—but you’ll likely switch between both in practice.
Installing the Client
The official package is graphdatascience. In the Codespace you’ll use, it’s already installed. Otherwise:
pip install graphdatascience
Connecting to Neo4j
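Before connecting, it can help to confirm that the package is visible to the Python interpreter you are actually running. The sketch below uses only the standard library; installed_version is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("graphdatascience")
if found is None:
    print("graphdatascience is not installed in this environment")
else:
    print(f"graphdatascience {found} is installed")
```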
With the package installed, connecting is straightforward:
from graphdatascience import GraphDataScience
gds = GraphDataScience(  # (1)
    "bolt://localhost:7687",
    auth=("neo4j", "password"),
)
# Verify the connection works
print(gds.server_version())  # (2)
(1) Create a GraphDataScience instance with your Neo4j connection URI and credentials.
(2) Always verify the connection: this returns the GDS library version running on the server.
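Hard-coded credentials are fine for a local demo, but you may prefer to read them from the environment. The following is one possible sketch, not the workshop’s own setup: the variable names NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD and both helper functions are assumptions made for this example. The connect() helper is defined but not called here, because it needs a reachable Neo4j server.

```python
import os


def connection_settings(env=None):
    """Collect connection details, falling back to the local defaults used above."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "auth": (
            env.get("NEO4J_USERNAME", "neo4j"),
            env.get("NEO4J_PASSWORD", "password"),
        ),
    }


def connect(env=None):
    """Open a client using the collected settings (needs a reachable Neo4j server)."""
    # Imported here so the settings helper works even without the package installed.
    from graphdatascience import GraphDataScience

    settings = connection_settings(env)
    return GraphDataScience(settings["uri"], auth=settings["auth"])


settings = connection_settings({"NEO4J_URI": "bolt://example.test:7687"})
print(settings["uri"])      # bolt://example.test:7687
print(settings["auth"][0])  # neo4j
```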
Connecting to a Specific Database
By default, the client uses the server’s default database, which is typically "neo4j". If your database has a different name, specify it explicitly:
gds = GraphDataScience(uri, auth=(user, password), database="my-db")
Running Cypher Queries
Once connected, you can run any Cypher query using gds.run_cypher(). The results come back as a pandas DataFrame—ready for analysis, visualization, or further processing.
result = gds.run_cypher(  # (1)
    """
    MATCH (m:Movie)
    RETURN m.title AS movie, m.year AS year
    ORDER BY m.year DESC
    LIMIT 10
    """
)
print(result.head())  # (2)
(1) gds.run_cypher() accepts any valid Cypher query and sends it to the server.
(2) Results are returned as a pandas DataFrame: use .head(), .describe(), or any pandas method directly.
This is useful for ad-hoc queries, but for GDS-specific operations (projections, algorithms), we’ll use dedicated methods in the next lesson.
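When a query depends on a value from your Python code, passing it as a query parameter is usually safer than building the Cypher string by hand. The sketch below relies on the params argument of run_cypher(); the helper name newest_movies is made up for this example, and the query reuses the Movie properties from the query above.

```python
def newest_movies(gds, limit=10):
    """Return the most recent movies, with the row limit sent as a query parameter."""
    query = """
    MATCH (m:Movie)
    RETURN m.title AS movie, m.year AS year
    ORDER BY m.year DESC
    LIMIT $limit
    """
    # The $limit placeholder is filled in by the server from the params dictionary.
    return gds.run_cypher(query, params={"limit": limit})
```

With a connected client, newest_movies(gds, limit=5) would return a DataFrame with movie and year columns.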
Closing Connections
When you’re finished, close the connection to free up resources:
gds.close()
Python will call this automatically when the gds object is garbage collected, but it’s good practice to close connections explicitly, especially in notebooks where objects can persist longer than expected.
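In a script, one way to make sure close() always runs, even when a query raises, is the standard library’s contextlib.closing. This is a sketch: run_and_close and make_client are hypothetical names, and FakeClient stands in for a real connection so the example runs without a server.

```python
from contextlib import closing


def run_and_close(make_client, work):
    """Create a client, run work(client), and close the client even if work fails."""
    with closing(make_client()) as client:
        return work(client)


class FakeClient:
    """Stand-in for a GDS client; only records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


fake = FakeClient()
result = run_and_close(lambda: fake, lambda client: "done")
print(result, fake.closed)  # done True
```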
Our Dataset: The Cora Citation Network
Throughout this module, we’ll analyze a real academic citation network called Cora.
What’s in the dataset:
- 2,708 papers spanning 7 research subjects
- 10,556 citations (directed edges: Paper A → Paper B means A cites B)
- 1,433-dimensional feature vectors (word frequencies from paper abstracts)
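Edge direction matters for interpretation: an edge A → B adds to B’s count of incoming citations, not A’s. The toy sketch below uses three made-up paper IDs (not real Cora data) to show how incoming citations can be counted from a directed edge list.

```python
from collections import Counter

# Made-up edges for illustration; each pair is (citing_paper, cited_paper).
citations = [("A", "B"), ("C", "B"), ("B", "C")]

# Count how often each paper appears as the target of a citation.
times_cited = Counter(cited for _, cited in citations)

print(times_cited["B"])  # 2
print(times_cited["A"])  # 0
```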
The seven research subjects
Cora is a classic benchmark dataset for graph machine learning.
Its papers span 7 research subjects:
- Neural Networks
- Reinforcement Learning
- Theory
- Genetic Algorithms
- Case-Based Reasoning
- Probabilistic Methods
- Rule Learning
Cora is small enough to iterate quickly, yet rich enough to demonstrate real patterns.
What’s Ahead
With Python as our interface, we’ll work through the complete GDS workflow:
- Projecting graphs into memory
- Running algorithms like PageRank, Betweenness Centrality, Louvain, and FastRP
- Processing results as DataFrames
- Cleaning up projections when we’re done
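As a preview, the four steps might look roughly like this in code. Treat it as a sketch under assumptions rather than the workshop’s solution: the node label Paper, the relationship type CITES, the graph name papers, and the function name pagerank_preview are placeholders, and the next lesson covers the actual calls in detail.

```python
def pagerank_preview(gds, graph_name="papers", node_label="Paper", rel_type="CITES"):
    """Project a graph, stream PageRank scores, and always drop the projection."""
    # 1. Project the graph into memory (returns a graph object plus summary stats).
    G, _ = gds.graph.project(graph_name, node_label, rel_type)
    try:
        # 2-3. Run the algorithm; with the real client the result is a DataFrame.
        return gds.pageRank.stream(G)
    finally:
        # 4. Clean up the projection, even if the algorithm call failed.
        G.drop()
```

Dropping the projection in a finally block means a failed algorithm call does not leave the projected graph behind in server memory.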
Each algorithm will follow the same pattern: a deep dive on the Movies dataset (which you know well), followed by hands-on practice with Cora.
Summary
The Python GDS client gives you programmatic access to everything you learned in the Browser:
- Same algorithms, same projection logic, same workflows
- Results returned as pandas DataFrames
- Version compatibility between client, driver, and GDS library matters
Your Codespace should be ready by now. In the next lesson, we’ll connect to Neo4j and run our first GDS workflow entirely in Python.