Reducing Search Space with Louvain

Introduction

You’ve learned how Louvain finds communities by optimizing modularity.

Now let’s put it to work. You’ll project the fraud network, run Louvain, and see how it reduces your search space by 98%.

What You’ll Learn

By the end of this lesson, you’ll be able to:

Create a heterogeneous graph projection for fraud analysis
Run Louvain in stats and write modes
Identify communities containing known fraudsters
Focus investigation on high-priority communities

Step 1: Project the Graph

Create a projection of Users, Cards, and Devices:

cypher

Create the fraud graph projection

MATCH (source:UserP2P)-[r:HAS_CC|USED|P2P]->(target:UserP2P|Card|Device)
RETURN gds.graph.project(
  'fraud-graph',
  source,
  target,
  {relationshipType: type(r)}, // (1)
  {undirectedRelationshipTypes: ['HAS_CC', 'USED']} // (2)
)

(1) Preserves the original relationship types (HAS_CC, USED, P2P) in the projection so they can be referenced later
(2) Makes the sharing relationships undirected: if User A and User B share a card, the connection propagates both ways

Understanding the Projection

This projection makes several important choices:

Why include Cards and Devices?

Fraudsters share infrastructure. Including these nodes lets Louvain find communities based on shared cards and devices—not just direct transactions.

Why undirected for HAS_CC and USED?

Sharing is bidirectional—if User A and User B both have the same card, that connection works both ways. Making these undirected ensures the relationship propagates through the shared infrastructure back to other users.

Why keep P2P directed?

Transaction direction matters. Money flows from sender to receiver, and that asymmetry can reveal fraud patterns.
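To make the orientation choice concrete, here is a minimal Python sketch (not GDS code) that treats a projection as a plain adjacency map. The node names (userA, userB, userC, card1) and the helper functions are hypothetical, invented for illustration; GDS stores projections differently internally.

```python
from collections import defaultdict

# Relationship types treated as undirected, mirroring the projection's
# undirectedRelationshipTypes setting.
UNDIRECTED_TYPES = {"HAS_CC", "USED"}


def add_relationship(adjacency, source, rel_type, target):
    """Add one relationship; undirected types also get the reverse edge."""
    adjacency[source].add(target)
    if rel_type in UNDIRECTED_TYPES:
        adjacency[target].add(source)


def reachable(adjacency, start):
    """Return every node reachable from start by following edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen - {start}


# Hypothetical toy data: two users share one card, and userA pays userC.
adjacency = defaultdict(set)
add_relationship(adjacency, "userA", "HAS_CC", "card1")
add_relationship(adjacency, "userB", "HAS_CC", "card1")
add_relationship(adjacency, "userA", "P2P", "userC")

from_a = reachable(adjacency, "userA")  # reaches userB through card1
from_c = reachable(adjacency, "userC")  # P2P stays one-way: nothing reachable
```

In this toy graph, userA reaches userB only because HAS_CC can be followed back from card1 to its other holder, while the directed P2P payment gives userC no path back to userA.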

Step 2: Run Louvain in Stats Mode

Before writing results, let’s preview what Louvain will find:

cypher

Preview Louvain results

CALL gds.louvain.stats('fraud-graph', {})
YIELD communityCount, communityDistribution, modularity
RETURN communityCount, communityDistribution, modularity

Interpreting Stats Results

You should see approximately:

communityCount: ~11,500 communities
modularity: ~0.98

Remember from Lesson 2: modularity above 0.4 indicates useful structure. A score of 0.98 means extremely well-defined communities—nodes within communities are far more connected to each other than to outsiders.
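As a rough illustration of what the score measures, the sketch below computes modularity by hand for a tiny undirected, unweighted graph, using the standard formula Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the total number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees of its members. The toy edge list and the function name are hypothetical; GDS computes the score for you and also handles weights and direction, which this sketch ignores.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # community -> number of edges fully inside it
    degree = {}    # community -> sum of member degrees
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_single = modularity(edges, one_group)   # 0.0: one big community
```

Splitting the two triangles scores higher than lumping everything together, which is the kind of improvement Louvain searches for. A score near 0.98 on the fraud graph means almost all relationships fall inside communities rather than between them.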

The communityDistribution shows size statistics (min, max, mean, percentiles).

Step 3: Run Louvain in Write Mode

Now write the community IDs back to the database:

cypher

Write Louvain community IDs to nodes

CALL gds.louvain.write('fraud-graph', {
  writeProperty: 'louvainCommunityId' // (1)
})
YIELD communityCount, modularity // (2)
RETURN communityCount, modularity

(1) Each node receives a louvainCommunityId property; nodes in the same community share the same ID
(2) Returns summary stats to confirm the algorithm ran successfully

What Just Happened?

Louvain analyzed the projected nodes and found natural groupings.

Each node now has a louvainCommunityId property indicating which community it belongs to.

Nodes in the same community are more densely connected to each other than to the rest of the network.

A Note on Community IDs

Your community IDs will differ from any examples shown.

Louvain is non-deterministic—the specific IDs assigned depend on processing order. What matters is the grouping, not the ID numbers.

When following along, always use the IDs from your results.
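If you want to check that two runs produced the same grouping despite different ID numbers, one approach is to compare the sets of members rather than the IDs. The sketch below assumes you have exported each run as a node-to-ID mapping; the node names, IDs, and the as_partition helper are hypothetical.

```python
def as_partition(community_of):
    """Turn a node -> community ID mapping into a set of member groups."""
    groups = {}
    for node, community_id in community_of.items():
        groups.setdefault(community_id, set()).add(node)
    return {frozenset(members) for members in groups.values()}


# Hypothetical output of two runs: different IDs, identical grouping.
run_1 = {"userA": 17, "userB": 17, "card1": 17, "userC": 42}
run_2 = {"userA": 3, "userB": 3, "card1": 3, "userC": 8}

same_ids = run_1 == run_2                                   # False
same_grouping = as_partition(run_1) == as_partition(run_2)  # True
```

Comparing partitions this way ignores the labels entirely, so it only reports a difference when nodes have actually moved between communities.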

Step 4: Visualize Communities

See the community structure:

cypher

Visualize community members

MATCH path = (u:UserP2P)-[*1..2]-(n:Card|Device)
WHERE u.louvainCommunityId = n.louvainCommunityId // (1)
RETURN path
LIMIT 100

(1) Keeps only paths whose user and card or device were assigned to the same community, showing how users cluster around shared infrastructure

Click on nodes to see their louvainCommunityId. Nodes in the same visual cluster should share the same community ID.

Notice how users cluster around shared cards and devices—this is exactly the infrastructure sharing we want to detect.

Step 5: Count Fraudulent Communities

How many communities contain known fraudsters?

cypher

Count communities with and without fraud

MATCH (u:UserP2P)
WITH u.louvainCommunityId AS community,
     sum(u.fraudMoneyTransfer) AS flaggedCount // (1)
RETURN
  sum(CASE WHEN flaggedCount > 0 THEN 1 ELSE 0 END) AS communitiesWithFraud, // (2)
  sum(CASE WHEN flaggedCount = 0 THEN 1 ELSE 0 END) AS communitiesWithoutFraud

(1) Aggregates fraud flags per community: fraudMoneyTransfer is 1 for known fraudsters, 0 otherwise
(2) Uses conditional aggregation to split communities into those containing known fraud and those without
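The two-stage aggregation in this query can be easier to follow in plain Python. The sketch below mirrors it on a handful of made-up rows; the community IDs and flags are hypothetical and stand in for the louvainCommunityId and fraudMoneyTransfer values on your UserP2P nodes.

```python
from collections import defaultdict

# Hypothetical (communityId, fraudMoneyTransfer) pairs, one per user.
users = [
    (101, 1), (101, 0), (101, 1),  # community 101: two flagged users
    (202, 0), (202, 0),            # community 202: no flags
    (303, 0),                      # community 303: no flags
    (404, 1),                      # community 404: one flagged user
]

# Stage 1: sum the fraud flags per community (the WITH clause).
flagged_count = defaultdict(int)
for community, flag in users:
    flagged_count[community] += flag

# Stage 2: count communities on each side of the split (the RETURN clause).
communities_with_fraud = sum(1 for n in flagged_count.values() if n > 0)
communities_without_fraud = sum(1 for n in flagged_count.values() if n == 0)
```

Here two communities (101 and 404) contain at least one flagged user and two (202 and 303) contain none, regardless of how many users each one holds.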

The Power of Community Detection

You should find approximately:

~200 communities with at least one flagged fraudster
~11,500 communities with no flagged fraudsters

That’s roughly 1.7% of communities containing known fraud.

Louvain just reduced your search space by 98%.
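The percentages follow directly from the two counts. A short worked calculation, using the approximate figures quoted above (your own run may give slightly different numbers):

```python
# Approximate counts quoted in this lesson; your own run may differ.
communities_with_fraud = 200
communities_without_fraud = 11_500

total = communities_with_fraud + communities_without_fraud
share_with_fraud = 100 * communities_with_fraud / total
reduction = 100 - share_with_fraud

print(f"{share_with_fraud:.1f}% of communities contain known fraud")
print(f"{reduction:.1f}% of communities can be deprioritized")
```

With these inputs the script reports 1.7% and 98.3%, which is where the rounded 98% figure comes from.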

Why This Matters

Before Louvain: 204,000 users to investigate

After Louvain: ~200 communities worth examining

The vast majority of users are in communities with no fraud flags. We can deprioritize them entirely and focus on the suspicious minority.

Step 6: Rank Communities by Fraud

Not all fraudulent communities are equal. Find the most suspicious ones:

cypher

Rank communities by fraud indicators

MATCH (u:UserP2P)
WITH u.louvainCommunityId AS community,
     count(u) AS userCount,
     sum(u.fraudMoneyTransfer) AS flaggedCount
WHERE flaggedCount > 0 // (1)
RETURN community,
       userCount,
       flaggedCount,
       round(100.0 * flaggedCount / userCount, 1) AS flaggedPercent // (2)
ORDER BY flaggedCount DESC
LIMIT 10

(1) Keeps only communities with at least one known fraudster
(2) Calculates the fraud concentration: a community where 50% of users are flagged is more suspicious than one where 1% are

Interpreting the Rankings

The results show:

community — The Louvain community ID
userCount — Total users in that community
flaggedCount — Known fraudsters in that community
flaggedPercent — Percentage of community that’s flagged

High flaggedCount = More known fraud (larger rings)

High flaggedPercent = More concentrated fraud (tighter rings)
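The two measures can rank the same communities very differently. The sketch below sorts three made-up result rows both ways; the community IDs and counts are hypothetical and only illustrate the contrast.

```python
# Hypothetical per-community results: (community, userCount, flaggedCount).
rows = [
    (11, 400, 12),
    (22, 10, 5),
    (33, 60, 6),
]


def flagged_percent(row):
    """Mirrors the query's round(100.0 * flaggedCount / userCount, 1)."""
    _, user_count, flagged_count = row
    return round(100.0 * flagged_count / user_count, 1)


by_count = sorted(rows, key=lambda row: row[2], reverse=True)
by_percent = sorted(rows, key=flagged_percent, reverse=True)

top_by_count = by_count[0][0]      # community 11: most known fraudsters
top_by_percent = by_percent[0][0]  # community 22: half its users flagged
```

Community 11 leads on raw count but only 3% of its users are flagged, while community 22 is small but half flagged. Changing the query's ORDER BY to flaggedPercent DESC would surface communities like the second one.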

Note the community ID at the top of your results—you’ll investigate it in the next steps. Remember, your ID will differ from others'.

Step 7: Set a Parameter for Investigation

Pick the top community from your results and set it as a parameter:

cypher

Set your community ID (replace with your actual ID)

:param louvainCommunityId => 179061

The :param command is specific to Neo4j Browser. If you’re using a different client, you may need to pass parameters differently.

Replace 179061 with the community ID from the top of your results.
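If you are working from Python instead of Neo4j Browser, parameters are passed alongside the query rather than set with :param. The sketch below assumes the official neo4j Python driver in a recent 5.x release, where Driver.execute_query accepts a parameters_ dictionary. The function name and the counting query are illustrative choices, not part of the lesson, and the driver object is assumed to be created elsewhere.

```python
QUERY = """
MATCH (u:UserP2P)
WHERE u.louvainCommunityId = $louvainCommunityId
RETURN count(u) AS userCount
"""


def count_users_in_community(driver, community_id):
    """Count users in one community, passing the ID as a query parameter.

    `driver` is assumed to be a neo4j.Driver created elsewhere, for example
    with neo4j.GraphDatabase.driver(uri, auth=(user, password)).
    """
    result = driver.execute_query(
        QUERY,
        parameters_={"louvainCommunityId": community_id},
    )
    # The query returns a single row with a single column.
    return result.records[0]["userCount"]
```

Calling count_users_in_community(driver, 179061) plays the same role as setting :param in Browser and then running the query. As before, replace 179061 with your own community ID.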

Step 8: Examine the Community

See the breakdown of flagged vs unflagged users:

cypher

Count flagged and unflagged users in the community

MATCH (u:UserP2P)
WHERE u.louvainCommunityId = $louvainCommunityId // (1)
RETURN u.fraudMoneyTransfer AS isFlagged, // (2)
       count(*) AS userCount
ORDER BY isFlagged

(1) Uses the parameter set in the previous step to filter to a single community
(2) Groups by fraud flag to show how many users are flagged (1) vs unflagged (0); unflagged users in fraud-heavy communities are our investigation targets

What This Tells Us

You should see two rows:

Users with fraudMoneyTransfer = 0 (unflagged)
Users with fraudMoneyTransfer = 1 (flagged)

The unflagged users are our investigation targets—they’re in a fraud-heavy community but haven’t been identified yet.

Are they accomplices? Victims? Mules? That’s what we need to find out.

Step 9: Visualize the Community

See how users in this community connect:

cypher

Visualize connections within the community

MATCH path = (u1:UserP2P)-[:HAS_CC|USED|P2P*1..4]-(u2:UserP2P) // (1)
WHERE u1.louvainCommunityId = $louvainCommunityId
  AND u2.louvainCommunityId = $louvainCommunityId
  AND u1 <> u2 // (2)
RETURN path
LIMIT 200

(1) Traverses up to 4 hops through shared cards, devices, and P2P transactions to show how users in the community connect
(2) Prevents self-matching, so only paths between distinct users are returned

Expand nodes to explore the connections. Look for:

Flagged users (fraudMoneyTransfer = 1) clustered together
Unflagged users connected to multiple flagged users
Shared cards or devices linking suspicious accounts

These patterns suggest which unflagged users deserve closer scrutiny.
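One way to turn the second and third patterns into a number is to count, for each unflagged user, how many flagged users share a card or device with them. This is only one possible heuristic, sketched on made-up data; the user names, items, and flags below are hypothetical.

```python
# Hypothetical community members: user -> fraudMoneyTransfer flag.
flag = {"u1": 1, "u2": 1, "u3": 0, "u4": 0}

# Hypothetical links between users and the cards or devices they touch.
links = [
    ("u1", "card1"), ("u3", "card1"),
    ("u2", "device1"), ("u3", "device1"),
    ("u4", "device2"),
]

# Group users by the card or device they share.
users_of = {}
for user, item in links:
    users_of.setdefault(item, set()).add(user)

# For each unflagged user, collect the flagged users sharing an item.
flagged_contacts = {user: set() for user, f in flag.items() if f == 0}
for members in users_of.values():
    flagged = {u for u in members if flag[u] == 1}
    for user in members:
        if flag[user] == 0:
            flagged_contacts[user] |= flagged

# Unflagged users ranked by how many flagged users they share items with.
ranking = sorted(flagged_contacts.items(),
                 key=lambda entry: len(entry[1]), reverse=True)
```

In this toy community, u3 shares infrastructure with both flagged users and ranks first, while u4 shares with none.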

What We’ve Achieved

Stage	Scale
Starting point	~790,000 nodes, 204,000 users
After Louvain	~200 suspicious communities
Focused community	A few hundred users

We’ve gone from an impossible manual task to a focused investigation.

The Remaining Question

We’ve found communities containing fraud. But within each community:

Which users are most suspicious?
Who should we investigate first?
How do we prioritize hundreds of potential suspects?

What’s Next

In Lesson 4, you’ll learn two algorithms that help with formal community assignment:

Degree Centrality — Identify high-connection nodes (potential hubs or noise)
Weakly Connected Components (WCC) — Deterministic community assignment for auditable results

These tools will help you move from exploration to actionable suspect lists.

Cleanup

Drop the projection:

cypher

Drop the projection

CALL gds.graph.drop('fraud-graph')

You can keep the projection if you want to experiment further. The louvainCommunityId property is already written to nodes, so the projection is no longer needed for the analysis we’ve done.

Summary

You’ve used Louvain to dramatically reduce your search space:

Created a heterogeneous projection capturing users and shared infrastructure
Found ~11,500 communities with modularity of 0.98
Identified ~200 communities (1.7%) containing known fraud
Focused on the most fraudulent community for investigation

Louvain transformed an impossible 204,000-user investigation into a manageable set of suspicious communities.
