Introduction to the Python GDS Client

Introduction

You’ve spent the last two modules using GDS in the Neo4j Browser—projecting graphs, running algorithms, and interpreting results. That foundation is solid.

Moving to Python

Now it’s time to take those same skills into the environment where most real-world data science happens: Python.

Same algorithms, different syntax

The Python GDS client isn’t a different way of doing graph analytics. It’s the same algorithms, the same concepts, the same workflows—just wrapped in a Pythonic interface that plays nicely with pandas, scikit-learn, and the rest of the Python ecosystem.

What You’ll Learn

By the end of this lesson, you’ll be able to:

  • Set up a development environment for GDS work in Python

  • Connect to Neo4j using the Python GDS client

  • Execute Cypher queries and receive results as pandas DataFrames

  • Recognize when Python makes more sense than Browser (and vice versa)

Setting Up Your Environment

Before we write any code, let’s get your development environment ready.

Click the button below to open the workshop repository in a GitHub Codespace. This will clone the repository and set up a pre-configured Python environment automatically.

Open in GitHub Codespace

The Codespace takes approximately 10 minutes to configure. While it’s setting up, continue through the next few slides—we’ll walk through the concepts before you need to run any code.

How the Python Client Works

The GDS Python client acts as a bridge between your Python code and your Neo4j server.

Under the hood, it translates your Python method calls into Cypher queries and sends them to the server. The server runs them against the GDS library, and the client hands the results back as pandas objects, most often DataFrames.

Flowchart showing the GDS Python client connecting Python code to the Neo4j server.
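
To make the translation step concrete, here is a minimal sketch of how a Python call could be turned into a parameterized Cypher query. It is an illustration only, not the client’s actual internals: to_cypher_call is a hypothetical helper and "movies" is a placeholder graph name. gds.pageRank.stream is a standard GDS procedure.

```python
def to_cypher_call(procedure: str, graph_name: str, **config) -> tuple[str, dict]:
    """Build a parameterized Cypher CALL for a GDS procedure (illustrative only)."""
    # The query text stays fixed; the graph name and config travel as parameters.
    query = f"CALL {procedure}($graph_name, $config)"
    params = {"graph_name": graph_name, "config": config}
    return query, params


query, params = to_cypher_call("gds.pageRank.stream", "movies", maxIterations=20)
print(query)
print(params)
```

Sending values as parameters rather than formatting them into the query string keeps the query text stable and avoids quoting problems.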

Everything still applies

This means everything you learned about GDS in the Browser still applies. The algorithms haven’t changed. The projections work the same way. You’re just using a different interface to access them.

When to Use Python vs. Browser

Both tools have their place. The key is knowing which one fits your current task.

Reach for Python when you’re:

  • Building repeatable data pipelines

  • Automating workflows that run regularly

  • Integrating graph analytics with other Python libraries

  • Working with results that need further processing

Stick with Browser when you’re:

  • Exploring data interactively

  • Running quick, one-off queries

  • Debugging projection or algorithm issues

  • Visually inspecting graph structure

For this module, we’ll work primarily in Python—but you’ll likely switch between both in practice.

Installing the Client

The official package is graphdatascience. In the Codespace you’ll use, it’s already installed. Otherwise:

bash
pip install graphdatascience
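
If you are not sure whether the package is available in your environment, a quick check from Python can tell you. This is a small sketch using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


client_version = installed_version("graphdatascience")
if client_version is None:
    print("graphdatascience is not installed; run: pip install graphdatascience")
else:
    print(f"graphdatascience {client_version} is installed")
```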

Connecting to Neo4j

With the package installed, connecting is straightforward:

python
from graphdatascience import GraphDataScience

gds = GraphDataScience( # (1)
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)

# Verify the connection works
print(gds.server_version()) # (2)

  1. Create a GraphDataScience instance with your Neo4j connection URI and credentials

  2. Always verify the connection — this returns the GDS library version running on the server
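
Outside a workshop, you will usually not want credentials hard-coded in a script. One possible approach, sketched below, reads them from environment variables. The variable names NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD are assumptions chosen for this example, as are the helper names connection_settings and connect; the fallback values match the snippet above.

```python
import os


def connection_settings(env=os.environ):
    """Read Neo4j connection details from environment variables (assumed names)."""
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "auth": (
            env.get("NEO4J_USERNAME", "neo4j"),
            env.get("NEO4J_PASSWORD", "password"),
        ),
    }


def connect():
    """Open a GDS client using the settings above."""
    # Imported here so connection_settings() works even without the package installed.
    from graphdatascience import GraphDataScience

    settings = connection_settings()
    return GraphDataScience(settings["uri"], auth=settings["auth"])
```

With this in place, gds = connect() gives you the same kind of object as the snippet above, and the credentials stay out of your source code.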

Connecting to a Specific Database

By default, the client runs against the server’s default database, which is named "neo4j" unless it has been configured otherwise. To target a different database, specify it explicitly:

python
gds = GraphDataScience(uri, auth=(user, password), database="my-db")

Running Cypher Queries

Once connected, you can run any Cypher query using gds.run_cypher(). The results come back as a pandas DataFrame—ready for analysis, visualization, or further processing.

python
# Keep the callout marker outside the query string: "#" is not a comment in Cypher
query = """
    MATCH (m:Movie)
    RETURN m.title AS movie, m.year AS year
    ORDER BY m.year DESC
    LIMIT 10
"""
result = gds.run_cypher(query) # (1)

print(result.head()) # (2)

  1. gds.run_cypher() accepts any valid Cypher query and sends it to the server

  2. Results are returned as a pandas DataFrame — use .head(), .describe(), or any pandas method directly

This is useful for ad-hoc queries, but for GDS-specific operations (projections, algorithms), we’ll use dedicated methods in the next lesson.
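
When part of an ad-hoc query varies, such as the row limit, you can pass it as a query parameter through the params argument of gds.run_cypher() instead of building the string by hand. The sketch below mirrors the query above; latest_movies is a hypothetical helper, and it assumes a connected gds object is passed in.

```python
def latest_movies(gds, limit=10):
    """Return the most recent movies; `limit` is sent as a Cypher parameter."""
    query = """
        MATCH (m:Movie)
        RETURN m.title AS movie, m.year AS year
        ORDER BY m.year DESC
        LIMIT $limit
    """
    # $limit in the query is filled in from the params dictionary on the server side.
    return gds.run_cypher(query, params={"limit": limit})
```

Calling latest_movies(gds, limit=5) then runs the same query with a different limit, with no string formatting involved.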

Closing Connections

When you’re finished, close the connection to free up resources:

python
gds.close()

Python will call this automatically when the gds object is garbage collected, but it’s good practice to close connections explicitly—especially in notebooks where objects can persist longer than expected.
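
One way to make sure close() always runs, even when a query fails, is the standard library’s contextlib.closing, which works with any object that has a close() method. This is a minimal sketch; run_and_close is a hypothetical helper, and client stands for a connected GraphDataScience object.

```python
from contextlib import closing


def run_and_close(client, query):
    """Run one query, then close the client even if the query raises."""
    with closing(client) as gds:
        return gds.run_cypher(query)
```

This pattern suits short scripts that open a connection, do one piece of work, and exit. In a notebook, where you reuse the same gds object across many cells, an explicit gds.close() in the final cell is usually simpler.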

Our Dataset: The Cora Citation Network

Throughout this module, we’ll analyze a real academic citation network called Cora.

What’s in the dataset:

  • 2,708 papers spanning 7 research subjects

  • 10,556 citations (directed edges: Paper A → Paper B means A cites B)

  • 1,433-dimensional feature vectors (one entry per dictionary word, marking whether that word appears in the paper)

Diagram of the Cora citation network with nodes for papers and directed edges for citations.
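
To get a feel for the shape of this graph, it helps to turn the counts above into two simple statistics: the average number of citations per paper, and the density (the fraction of all possible directed paper-to-paper links that actually exist). The short calculation below uses only the numbers quoted in this lesson; the variable names are chosen for this example.

```python
papers = 2_708
citations = 10_556

# Average number of outgoing citations per paper
avg_citations_per_paper = citations / papers

# Fraction of possible directed links (excluding self-citations) that are present
density = citations / (papers * (papers - 1))

print(f"Average citations per paper: {avg_citations_per_paper:.2f}")
print(f"Density: {density:.5f}")
```

The result is roughly four citations per paper and a density of about 0.14%, so the network is sparse, which is typical for citation graphs.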

The seven research subjects

Cora is a classic benchmark dataset for graph machine learning. Each of its 2,708 papers belongs to exactly one of 7 research subjects:

  • Neural Networks

  • Reinforcement Learning

  • Theory

  • Genetic Algorithms

  • Case-Based Reasoning

  • Probabilistic Methods

  • Rule Learning

The dataset is small enough to iterate on quickly, yet rich enough to demonstrate real patterns.

What’s Ahead

With Python as our interface, we’ll work through the complete GDS workflow:

  • Projecting graphs into memory

  • Running algorithms like PageRank, Betweenness Centrality, Louvain, and FastRP

  • Processing results as DataFrames

  • Cleaning up projections when we’re done

Each algorithm will follow the same pattern: deep-dive on the Movies dataset (which you know well), then hands-on practice with Cora.

Summary

The Python GDS client gives you programmatic access to everything you learned in the Browser:

  • Same algorithms, same projection logic, same workflows

  • Results returned as pandas DataFrames

  • Version compatibility between client, driver, and GDS library matters

Your Codespace should be ready by now. In the next lesson, we’ll connect to Neo4j and run our first GDS workflow entirely in Python.
