Introduction
Louvain helped us explore the fraud network. But for formal community assignment, we need algorithms that are deterministic, explainable, and give us precise control.
Two algorithms
This lesson covers two foundational algorithms:
-
Degree Centrality — Counts connections
-
Weakly Connected Components (WCC) — Finds connected groups
We’ll practice on the Movies graph first, then apply what we learn to fraud detection.
What You’ll Learn
By the end of this lesson, you’ll be able to:
-
Calculate degree centrality with different orientations and weights
-
Explain the Union-Find algorithm underlying WCC
-
Configure WCC with thresholds, seeding, and performance optimizations
-
Choose between Louvain and WCC based on your requirements
Part 1: Degree Centrality
Degree centrality counts how many relationships a node has.
By default, it counts outgoing relationships. A node with 5 outgoing connections has degree 5.
This is the simplest centrality measure—no iteration, no convergence, just counting. But simple doesn’t mean useless.
Project Movies and Actors
Let’s practice degree centrality on the Movies dataset:
MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
RETURN gds.graph.project(
'actors-movies',
source,
target
)This creates a bipartite graph: Actor nodes connected to Movie nodes. We’ll use this to explore degree centrality before applying it to fraud.
Run Degree Centrality
Count how many movies each actor has appeared in:
CALL gds.degree.stream('actors-movies', {}) // (1)
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score // (2)
WHERE node:Actor
RETURN node.name AS actor, score AS movieCount
ORDER BY movieCount DESC
LIMIT 15-
Streams degree results without writing to the database — useful for exploration before committing
-
gds.util.asNode()converts internal GDS node IDs back to actual graph nodes so we can access properties likename
Examine results
This query should return a table that looks something like this:
| actor | movieCount |
|---|---|
Gérard Depardieu |
125 |
Eric Roberts |
120 |
John Wayne |
116 |
Christopher Lee |
115 |
John Carradine |
112 |
Samuel L. Jackson |
103 |
Michael Caine |
101 |
Nicolas Cage |
100 |
Bruce Willis |
99 |
Robert De Niro |
97 |
… |
… |
This shows the most prolific actors. The degree score equals the number of movies they’ve appeared in.
Orientation Options
You can control in which direction the algorithm should run using orientation:
| Orientation | Counts | Use Case |
|---|---|---|
|
Outgoing relationships |
How many movies has this actor been in? |
|
Incoming relationships |
How many actors are in this movie? |
|
Forward and back |
Both of the above at once |
Counting Incoming Relationships
Use REVERSE to count incoming relationships—how many actors are in each movie:
CALL gds.degree.stream('actors-movies', {
orientation: 'REVERSE' // (1)
})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE node:Movie // (2)
RETURN node.title AS movie, score AS castSize
ORDER BY castSize DESC
LIMIT 15-
REVERSEcounts incoming relationships instead of outgoing — for the Actor→Movie projection, this counts how many actors point to each movie -
Filters to Movie nodes only since Actor nodes will have a reverse degree of 0
Note here that the movies in the underlying dataset have a max cast number of 10.
Write Degree to Nodes
Write degree values back to the database for later use:
CALL gds.degree.write('actors-movies', {
writeProperty: 'movieCount'
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWrittenVerify the Written Property
Check that the degree property was written:
MATCH (a:Actor)
WHERE a.movieCount IS NOT NULL
RETURN a.name AS actor, a.movieCount AS numMovies
ORDER BY numMovies DESC
LIMIT 10Clean Up Actors-Movies Projection
CALL gds.graph.drop('actors-movies')Part 2: Weakly Connected Components
Weakly Connected Components groups nodes by connectivity.
Two nodes are in the same component if any path connects them—regardless of relationship direction or path length.
No optimization, no density calculation—just reachability.
Islands and Bridges
Think of nodes as islands and relationships as bridges.
WCC answers: "Which islands can you walk between?"
If any sequence of bridges connects two islands, they’re in the same component.
How WCC Works: Union-Find
Under the hood, WCC uses the Union-Find algorithm (also called Disjoint Set Union).
The algorithm has three operations:
-
Make-Set: Each node starts as its own component
-
Union: When processing a relationship, merge the two endpoint components
-
Find: Determine which component a node belongs to
Project Actor Collaborations
Let’s run WCC on actor collaborations:
MATCH (source:Actor)-[r:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source <> target
WITH source, target, count(r) AS collaborations // (1)
RETURN gds.graph.project(
'actor-network',
source,
target,
{relationshipProperties: {collaborations: collaborations}} // (2)
)-
Counts how many movies each pair of actors appeared in together — this becomes the relationship weight
-
Stores collaboration count as a relationship property for later use with weighted algorithms
Run WCC
Write connected components to the graph:
CALL gds.wcc.write('actor-network', {
writeProperty: 'componentId'
})Examine WCC Results
Fetch the components from the graph:
MATCH (a:Actor)
WHERE a.componentId IS NOT NULL
WITH a.componentId AS componentId, collect(a.name) AS members // (1)
RETURN componentId, size(members) AS size, members[0..5] AS sampleMembers // (2)
ORDER BY size DESC
LIMIT 10-
Groups actors by their WCC-assigned component and collects names into a list
-
Shows only the first 5 members of each component to keep output manageable — large components may have thousands of actors
Examining WCC results
You should get a table that looks something like this:
╒[cols="1,1,3"]
| componentId | size | sampleMembers |
|---|---|---|
0 |
33744 |
Tasma Walton, Tia Texada, Mikko Nousiainen, Bo Svenson, Victor Adamson |
28480 |
32 |
Bernadette McDonald, Jimmy Chin, Lakpa Dendi, Barry Blanchard, Gesman Tamang |
28400 |
19 |
Jo Herbst, Barbara Hintz, Karl Krüger, Ingetraud Hinze, Ernst Pittschau |
34381 |
18 |
Charlie Kirk, Alice Dewey, Lisa Fox, Larry Elder, Sebastian Gorka |
… |
… |
… |
You’ll likely see one giant component containing most actors (the "main" Hollywood network), plus several smaller isolated groups. This is typical of collaboration networks.
Examine Component Distribution
See how components are distributed by running the algorithm in stats mode:
CALL gds.wcc.stats('actor-network', {})Return just one stats fields
If you don’t care about all the stats, and you just want to see the communityDistribution field, you can use YIELD and RETURN to specify:
CALL gds.wcc.stats('actor-network', {})
YIELD componentCount, componentDistribution
RETURN componentCount, componentDistributionWCC vs Louvain
| Aspect | Louvain | WCC |
|---|---|---|
Logic |
Optimizes modularity (density) |
Any path = same component |
Output |
Dense clusters |
All reachable nodes |
Determinism |
Can vary between runs |
Always identical |
Explainability |
"Modularity optimization grouped them" |
"A path connects them" |
Why Determinism Matters
Louvain might assign a user to community A in one run and community B in the next. WCC always produces identical results.
Determinism matters
This matters when:
-
Fraud investigations face legal scrutiny
-
You need to explain why someone was flagged
-
Results must be reproducible for audits
-
Downstream systems depend on stable IDs
WCC Finds Components Based on What You Project
This is WCC’s power: you define what "connected" means.
-
Project all relationships → WCC finds everyone who’s reachable
-
Project only specific relationships → WCC finds targeted groups
Part 3: Advanced WCC Configuration
Weighted WCC with Thresholds
Not all connections are meaningful. A single shared movie differs from ten shared movies.
Use threshold to ignore weak connections:
CALL gds.wcc.stream('actor-network', {
relationshipWeightProperty: 'collaborations', // (1)
threshold: 5.0 // (2)
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).name) AS members
WHERE size(members) > 1
RETURN componentId, size(members) AS size, members[0..5] AS sampleMembers
ORDER BY size DESC
LIMIT 10-
Tells WCC to read the
collaborationsproperty as the relationship weight -
Only connections where actors collaborated more than 5 times are considered — this breaks the giant component into smaller, more meaningful groups of frequent collaborators
With threshold: 5.0, only connections where actors shared more than 2 movies are considered. This breaks the giant component into smaller, more meaningful groups of frequent collaborators.
Seeded Components
What happens when new data arrives daily? For billion-node graphs, full recomputation is expensive. Seeded WCC lets you build incrementally.
The idea: store previous component IDs, then only recompute for new or changed nodes.
Seeding: Step 1 — Project with Seed Property
The first time we ran WCC on the Movies graph, we wrote the `componentId`s to the graph.
Now, let’s include existing `componentId`s in our projection:
MATCH (source:Actor)-[:ACTED_IN]->(target:Movie)
RETURN gds.graph.project(
'seeded-graph',
source,
target,
{
sourceNodeProperties: source { .componentId }, // (1)
targetNodeProperties: target { .componentId }
}
)-
Includes the previously written
componentIdas a node property in the projection — WCC will use these as starting assignments instead of treating every node as a new singleton
Seeding: Step 2 — Run WCC with seedProperty
Reference the seed property in your WCC call:
CALL gds.wcc.write('seeded-graph', {
seedProperty: 'componentId', // (1)
writeProperty: 'componentId' // (2)
})
YIELD componentCount
RETURN componentCount-
Tells WCC to initialize component assignments from existing
componentIdvalues — nodes with seeds start in those components rather than as singletons -
Writes updated component IDs back to the same property, preserving stable IDs for unchanged components
Nodes with existing componentId values start in those components. New nodes trigger computation.
How Seeding Works
When WCC encounters seeded nodes:
-
Seeded nodes start in their seeded component
-
New nodes start as singleton components
-
Union-Find merges components
-
If components connect, they merge to the lower ID
Existing structure is preserved; new nodes join appropriately.
Seeding: Important Caveat
Seeding assumes components only grow or merge—never split.
If relationships are removed (not just added), seeding can produce incorrect results. In that case, run a full recompute.
Sampled vs Unsampled
WCC has two internal strategies:
| Strategy | When Used |
|---|---|
Unsampled |
Directed graphs (default) |
Sampled |
Undirected graphs, or directed with inverse index |
The sampled strategy is faster—it uses subgraph sampling to reduce work while maintaining correctness.
Enabling the Sampled Strategy
For directed graphs, add inverseIndexedRelationshipTypes during projection:
MATCH (source:UserP2P)-[r:SHARED_DEVICE]->(target:UserP2P)
RETURN gds.graph.project(
'optimized-graph',
source,
target,
{},
{ inverseIndexedRelationshipTypes: ['*'] } // (1)
)-
Creates an inverse index for all relationship types, enabling WCC’s faster sampled algorithm by allowing bidirectional traversal on directed graphs
The inverse index lets WCC traverse relationships in both directions, enabling the optimized algorithm.
On this graph, the performance gains are minimal — but this would be especially important on a large, constantly updating graph.
Memory Estimation
Before running on production data, estimate memory:
CALL gds.wcc.write.estimate('actor-network', { // (1)
writeProperty: 'wccId'
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
RETURN nodeCount, relationshipCount, requiredMemory // (2)-
The
.estimatevariant calculates memory requirements without actually running the algorithm -
Returns a human-readable
requiredMemorystring (e.g., "24 MiB") alongside raw node and relationship counts
If estimated memory exceeds available heap, GDS will block execution. Plan accordingly.
Clean Up
Drop the projection:
CALL gds.graph.drop('actor-network')Transfer: From Movies to Fraud
You’ve practiced on actor collaboration networks. Now let’s apply these algorithms:
| Movies Concept | Fraud Equivalent |
|---|---|
Actor nodes |
User nodes |
Movie nodes |
Card, Device, IP nodes |
Actors per movie (degree) |
Users per identifier (degree) |
Collaboration components |
Fraud ring components |
When to Use Each Algorithm
| Algorithm | Strengths | Use When |
|---|---|---|
Degree Centrality |
Simple, fast, interpretable |
Filtering noise, identifying hubs, preprocessing |
WCC |
Deterministic, explainable, scalable |
Formal grouping, auditable results, production systems |
Louvain |
Finds dense clusters, hierarchical |
Exploration, discovering unknown patterns |
The Lesson 5 Strategy
In Lesson 5, you’ll combine everything:
-
Create fraud-hypothesis relationships — Encode patterns like "P2P with shared card"
-
Use degree to filter — Exclude high-degree noise nodes
-
Run WCC — Find connected components of suspects
-
Label fraud risk — Mark users connected to known fraudsters
Summary
Degree Centrality counts relationships:
-
Use orientation to control direction (NATURAL, REVERSE, UNDIRECTED)
-
Use weights to measure connection strength
-
Essential for filtering noise before community detection
WCC finds connected components:
-
Deterministic and explainable—ideal for formal assignment
-
Thresholds filter weak connections
-
Seeding enables incremental processing
-
Inverse indexing improves performance on directed graphs
Both algorithms are simple by design. Their power comes from what you project and how you configure them.
In the next lesson, you’ll apply these algorithms to build fraud communities and identify 211 previously unknown fraud risk users.