Degree Centrality & WCC

Introduction

Louvain helped us explore the fraud network. But for formal community assignment, we need algorithms that are deterministic, explainable, and give us precise control.

Two algorithms

This lesson covers two foundational algorithms:

Degree Centrality — Counts connections
Weakly Connected Components (WCC) — Finds connected groups

We’ll practice on the Movies graph first, then apply what we learn to fraud detection.

What You’ll Learn

By the end of this lesson, you’ll be able to:

Calculate degree centrality with different orientations and weights
Explain the Union-Find algorithm underlying WCC
Configure WCC with thresholds, seeding, and performance optimizations
Choose between Louvain and WCC based on your requirements

Part 1: Degree Centrality

Degree centrality counts how many relationships a node has.

By default, it counts outgoing relationships. A node with 5 outgoing connections has degree 5.

A node has 5 outgoing relationships and three incoming relationships. Label reads "degree equals 5".

This is the simplest centrality measure—no iteration, no convergence, just counting. But simple doesn’t mean useless.

Project Movies and Actors

Let’s practice degree centrality on the Movies dataset:

cypher

Project actors and movies

MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
RETURN gds.graph.project(
  'actors-movies',
  source,
  target
)

This creates a bipartite graph: Actor nodes connected to Movie nodes. We’ll use this to explore degree centrality before applying it to fraud.

Run Degree Centrality

Count how many movies each actor has appeared in:

cypher

Count movies per actor (outgoing relationships)

CALL gds.degree.stream('actors-movies', {}) // (1)
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score // (2)
WHERE node:Actor
RETURN node.name AS actor, score AS movieCount
ORDER BY movieCount DESC
LIMIT 15

Streams degree results without writing to the database — useful for exploration before committing
gds.util.asNode() converts internal GDS node IDs back to actual graph nodes so we can access properties like name

Examine results

This query should return a table that looks something like this:

actor	movieCount
Gérard Depardieu	125
Eric Roberts	120
John Wayne	116
Christopher Lee	115
John Carradine	112
Samuel L. Jackson	103
Michael Caine	101
Nicolas Cage	100
Bruce Willis	99
Robert De Niro	97
…	…

This shows the most prolific actors. The degree score equals the number of movies they’ve appeared in.

Orientation Options

You can control in which direction the algorithm should run using orientation:

Orientation Counts Use Case

Orientation	Counts	Use Case
`NATURAL` (default)	Outgoing relationships	How many movies has this actor been in?
`REVERSE`	Incoming relationships	How many actors are in this movie?
`UNDIRECTED`	Forward and back	Both of the above at once

NATURAL (default)

Outgoing relationships

How many movies has this actor been in?

REVERSE

Incoming relationships

How many actors are in this movie?

UNDIRECTED

Forward and back

Both of the above at once

Counting Incoming Relationships

Use REVERSE to count incoming relationships—how many actors are in each movie:

cypher

Count actors per movie (incoming relationships)

CALL gds.degree.stream('actors-movies', {
  orientation: 'REVERSE' // (1)
})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE node:Movie // (2)
RETURN node.title AS movie, score AS castSize
ORDER BY castSize DESC
LIMIT 15

REVERSE counts incoming relationships instead of outgoing — for the Actor→Movie projection, this counts how many actors point to each movie
Filters to Movie nodes only since Actor nodes will have a reverse degree of 0

Note here that the movies in the underlying dataset have a max cast number of 10.

Write Degree to Nodes

Write degree values back to the database for later use:

cypher

Write degree to Movie nodes

CALL gds.degree.write('actors-movies', {
  writeProperty: 'movieCount'
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten

Verify the Written Property

Check that the degree property was written:

cypher

Check degree values on movies

MATCH (a:Actor)
WHERE a.movieCount IS NOT NULL
RETURN a.name AS actor, a.movieCount AS numMovies
ORDER BY numMovies DESC
LIMIT 10

Clean Up Actors-Movies Projection

cypher

Drop the actors-movies projection

CALL gds.graph.drop('actors-movies')

Part 2: Weakly Connected Components

Weakly Connected Components groups nodes by connectivity.

Two nodes are in the same component if any path connects them—regardless of relationship direction or path length.

No optimization, no density calculation—just reachability.

Islands and Bridges

Think of nodes as islands and relationships as bridges.

WCC answers: "Which islands can you walk between?"

If any sequence of bridges connects two islands, they’re in the same component.

Seven islands. Those connected by bridges are labeled as single components.

How WCC Works: Union-Find

Under the hood, WCC uses the Union-Find algorithm (also called Disjoint Set Union).

The algorithm has three operations:

Make-Set: Each node starts as its own component
Union: When processing a relationship, merge the two endpoint components
Find: Determine which component a node belongs to

Project Actor Collaborations

Let’s run WCC on actor collaborations:

cypher

Project actor collaborations for WCC

MATCH (source:Actor)-[r:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source <> target
WITH source, target, count(r) AS collaborations // (1)
RETURN gds.graph.project(
  'actor-network',
  source,
  target,
  {relationshipProperties: {collaborations: collaborations}} // (2)
)

Counts how many movies each pair of actors appeared in together — this becomes the relationship weight
Stores collaboration count as a relationship property for later use with weighted algorithms

Run WCC

Write connected components to the graph:

cypher

Run WCC

CALL gds.wcc.write('actor-network', {
  writeProperty: 'componentId'
})

Examine WCC Results

Fetch the components from the graph:

cypher

Fetch components

MATCH (a:Actor)
WHERE a.componentId IS NOT NULL
WITH a.componentId AS componentId, collect(a.name) AS members // (1)
RETURN componentId, size(members) AS size, members[0..5] AS sampleMembers // (2)
ORDER BY size DESC
LIMIT 10

Groups actors by their WCC-assigned component and collects names into a list
Shows only the first 5 members of each component to keep output manageable — large components may have thousands of actors

Examining WCC results

You should get a table that looks something like this:

╒[cols="1,1,3"]

componentId	size	sampleMembers
0	33744	Tasma Walton, Tia Texada, Mikko Nousiainen, Bo Svenson, Victor Adamson
28480	32	Bernadette McDonald, Jimmy Chin, Lakpa Dendi, Barry Blanchard, Gesman Tamang
28400	19	Jo Herbst, Barbara Hintz, Karl Krüger, Ingetraud Hinze, Ernst Pittschau
34381	18	Charlie Kirk, Alice Dewey, Lisa Fox, Larry Elder, Sebastian Gorka
…	…	…

You’ll likely see one giant component containing most actors (the "main" Hollywood network), plus several smaller isolated groups. This is typical of collaboration networks.

Examine Component Distribution

See how components are distributed by running the algorithm in stats mode:

cypher

Component size distribution

CALL gds.wcc.stats('actor-network', {})

Return just one stats fields

If you don’t care about all the stats, and you just want to see the communityDistribution field, you can use YIELD and RETURN to specify:

cypher

Component size distribution

CALL gds.wcc.stats('actor-network', {})
YIELD componentCount, componentDistribution
RETURN componentCount, componentDistribution

WCC vs Louvain

Aspect	Louvain	WCC
Logic	Optimizes modularity (density)	Any path = same component
Output	Dense clusters	All reachable nodes
Determinism	Can vary between runs	Always identical
Explainability	"Modularity optimization grouped them"	"A path connects them"

Aspect

Louvain

WCC

Logic

Optimizes modularity (density)

Any path = same component

Output

Dense clusters

All reachable nodes

Determinism

Can vary between runs

Always identical

Explainability

"Modularity optimization grouped them"

"A path connects them"

Why Determinism Matters

Louvain might assign a user to community A in one run and community B in the next. WCC always produces identical results.

Determinism matters

This matters when:

Fraud investigations face legal scrutiny
You need to explain why someone was flagged
Results must be reproducible for audits
Downstream systems depend on stable IDs

WCC Finds Components Based on What You Project

This is WCC’s power: you define what "connected" means.

Project all relationships → WCC finds everyone who’s reachable
Project only specific relationships → WCC finds targeted groups

Part 3: Advanced WCC Configuration

Weighted WCC with Thresholds

Not all connections are meaningful. A single shared movie differs from ten shared movies.

Use threshold to ignore weak connections:

cypher

WCC with threshold

CALL gds.wcc.stream('actor-network', {
  relationshipWeightProperty: 'collaborations', // (1)
  threshold: 5.0 // (2)
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).name) AS members
WHERE size(members) > 1
RETURN componentId, size(members) AS size, members[0..5] AS sampleMembers
ORDER BY size DESC
LIMIT 10

Tells WCC to read the collaborations property as the relationship weight
Only connections where actors collaborated more than 5 times are considered — this breaks the giant component into smaller, more meaningful groups of frequent collaborators

With threshold: 5.0, only connections where actors shared more than 5 movies are considered. This breaks the giant component into smaller, more meaningful groups of frequent collaborators.

Seeded Components

What happens when new data arrives daily? For billion-node graphs, full recomputation is expensive. Seeded WCC lets you build incrementally.

The idea: store previous component IDs, then only recompute for new or changed nodes.

Flowchart illustrating how seeded WCC works.

Seeding: Step 1 — Project with Seed Property

The first time we ran WCC on the Movies graph, we wrote the `componentId`s to the graph.

Now, let’s include existing `componentId`s in our projection:

cypher

Project with seed property

MATCH (source:Actor)-[:ACTED_IN]->(target:Movie)
RETURN gds.graph.project(
  'seeded-graph',
  source,
  target,
  {
    sourceNodeProperties: source { .componentId }, // (1)
    targetNodeProperties: target { .componentId }
  }
)

Includes the previously written componentId as a node property in the projection — WCC will use these as starting assignments instead of treating every node as a new singleton

Seeding: Step 2 — Run WCC with seedProperty

Reference the seed property in your WCC call:

cypher

Run seeded WCC

CALL gds.wcc.write('seeded-graph', {
  seedProperty: 'componentId', // (1)
  writeProperty: 'componentId' // (2)
})
YIELD componentCount
RETURN componentCount

Tells WCC to initialize component assignments from existing componentId values — nodes with seeds start in those components rather than as singletons
Writes updated component IDs back to the same property, preserving stable IDs for unchanged components

Nodes with existing componentId values start in those components. New nodes trigger computation.

How Seeding Works

When WCC encounters seeded nodes:

Seeded nodes start in their seeded component
New nodes start as singleton components
Union-Find merges components
If components connect, they merge to the lower ID

Existing structure is preserved; new nodes join appropriately.

Seeding: Important Caveat

Seeding assumes components only grow or merge—never split.

If relationships are removed (not just added), seeding can produce incorrect results. In that case, run a full recompute.

Sampled vs Unsampled

WCC has two internal strategies:

Strategy	When Used
Unsampled	Directed graphs (default)
Sampled	Undirected graphs, or directed with inverse index

Strategy

When Used

Unsampled

Directed graphs (default)

Sampled

Undirected graphs, or directed with inverse index

The sampled strategy is faster—it uses subgraph sampling to reduce work while maintaining correctness.

Enabling the Sampled Strategy

For directed graphs, add inverseIndexedRelationshipTypes during projection:

cypher

Project with inverse index

MATCH (source:UserP2P)-[r:USED]->(:Device)<-[:USED]-(target:UserP2P)
RETURN gds.graph.project(
  'optimized-graph',
  source,
  target,
  {},
  { inverseIndexedRelationshipTypes: ['*'] } // (1)
)

Creates an inverse index for all relationship types, enabling WCC’s faster sampled algorithm by allowing bidirectional traversal on directed graphs

The inverse index lets WCC traverse relationships in both directions, enabling the optimized algorithm.

On this graph, the performance gains are minimal — but this would be especially important on a large, constantly updating graph.

Memory Estimation

Before running on production data, estimate memory:

cypher

Estimate WCC memory

CALL gds.wcc.write.estimate('actor-network', { // (1)
  writeProperty: 'wccId'
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
RETURN nodeCount, relationshipCount, requiredMemory // (2)

The .estimate variant calculates memory requirements without actually running the algorithm
Returns a human-readable requiredMemory string (e.g., "24 MiB") alongside raw node and relationship counts

If estimated memory exceeds available heap, GDS will block execution. Plan accordingly.

Clean Up

Drop the projection:

cypher

Drop the actor network projection

CALL gds.graph.drop('actor-network')

Transfer: From Movies to Fraud

You’ve practiced on actor collaboration networks. Now let’s apply these algorithms:

Movies Concept	Fraud Equivalent
Actor nodes	User nodes
Movie nodes	Card, Device, IP nodes
Actors per movie (degree)	Users per identifier (degree)
Collaboration components	Fraud ring components

Movies Concept

Fraud Equivalent

Actor nodes

User nodes

Movie nodes

Card, Device, IP nodes

Actors per movie (degree)

Users per identifier (degree)

Collaboration components

Fraud ring components

When to Use Each Algorithm

Algorithm	Strengths	Use When
Degree Centrality	Simple, fast, interpretable	Filtering noise, identifying hubs, preprocessing
WCC	Deterministic, explainable, scalable	Formal grouping, auditable results, production systems
Louvain	Finds dense clusters, hierarchical	Exploration, discovering unknown patterns

Algorithm

Strengths

Use When

Degree Centrality

Simple, fast, interpretable

Filtering noise, identifying hubs, preprocessing

WCC

Deterministic, explainable, scalable

Formal grouping, auditable results, production systems

Louvain

Finds dense clusters, hierarchical

Exploration, discovering unknown patterns

The Lesson 5 Strategy

In Lesson 5, you’ll combine everything:

Create fraud-hypothesis relationships — Encode patterns like "P2P with shared card"
Use degree to filter — Exclude high-degree noise nodes
Run WCC — Find connected components of suspects
Label fraud risk — Mark users connected to known fraudsters

Summary

Degree Centrality counts relationships:

Use orientation to control direction (NATURAL, REVERSE, UNDIRECTED)
Use weights to measure connection strength
Essential for filtering noise before community detection

WCC finds connected components:

Deterministic and explainable—ideal for formal assignment
Thresholds filter weak connections
Seeding enables incremental processing
Inverse indexing improves performance on directed graphs

Both algorithms are simple by design. Their power comes from what you project and how you configure them.

In the next lesson, you’ll apply these algorithms to build fraud communities and identify 211 previously unknown fraud risk users.

Graph Data Science in Practice

GDS Foundations

Community Detection for Fraud

Degree Centrality & WCC

Introduction

Two algorithms

What You’ll Learn

Part 1: Degree Centrality

Project Movies and Actors

Run Degree Centrality

Examine results

Orientation Options

Counting Incoming Relationships

Write Degree to Nodes

Verify the Written Property

Clean Up Actors-Movies Projection

Part 2: Weakly Connected Components

Islands and Bridges

How WCC Works: Union-Find

Project Actor Collaborations

Run WCC

Examine WCC Results

Examining WCC results

Examine Component Distribution

Return just one stats fields

WCC vs Louvain

Why Determinism Matters

Determinism matters

WCC Finds Components Based on What You Project

Part 3: Advanced WCC Configuration

Weighted WCC with Thresholds

Seeded Components

Seeding: Step 1 — Project with Seed Property

Seeding: Step 2 — Run WCC with seedProperty

How Seeding Works

Seeding: Important Caveat

Sampled vs Unsampled

Enabling the Sampled Strategy

Memory Estimation

Clean Up

Transfer: From Movies to Fraud

When to Use Each Algorithm

The Lesson 5 Strategy

Summary

Chatbot

Data Model