Louvain Community Detection

Introduction

In Lesson 1, you explored the fraud dataset and saw how fraudsters connect through transactions and shared infrastructure.

Now let’s understand the algorithm that we’ll use to find communities of connected suspects: Louvain.

We’ll practice on the familiar Movies graph first, then apply what we learn to fraud detection.

What You’ll Learn

By the end of this lesson, you’ll be able to:

  • Explain how Louvain finds communities using modularity optimization

  • Interpret modularity scores to assess community quality

  • Configure Louvain for different use cases using maxLevels and includeIntermediateCommunities

  • Recognize when Louvain is (and isn’t) the right choice

What Louvain Does

Louvain is a community detection algorithm.

It finds groups of nodes that are more densely connected to each other than to the rest of the network.

Figure: A network divided into distinct communities.

The Core Concept: Modularity

Louvain optimizes modularity, a measure of community quality.

Figure: Diagram showing high vs. low modularity in network connections.

  • High modularity: Dense connections within communities, sparse connections between them

  • Low modularity: Connections spread randomly across the network
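To make the idea concrete, here is a minimal Python sketch of the standard modularity formula applied to a toy graph. It is an illustration only, not how GDS computes the score; the graph, the function name, and the community labels are all made up for this example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by a single bridge edge, 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)  # one community per triangle
q_one = modularity(edges, one_group)   # everything lumped together
print(round(q_two, 3), round(q_one, 3))
```

Splitting along the bridge scores higher than lumping everything together, which scores exactly zero. Very small graphs like this one rarely reach the high scores seen on real networks.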

Interpreting Modularity Scores

Score        Interpretation
< 0.3        Weak community structure (may be noise)
0.3 - 0.5    Moderate structure (usable but noisy)
0.5 - 0.7    Good community structure
> 0.7        Strong, well-defined communities (could be suspiciously high, depending on the dataset)

As a rule of thumb, scores above 0.4 indicate meaningful groupings worth investigating.

How Louvain Works

Louvain iteratively moves nodes between communities to maximize modularity.

Figure: Left, a node in one community. Right, the same node moved to another community where it has more connections.

The algorithm asks: "Would moving this node to a neighboring community increase overall modularity?" If yes, it moves the node. This continues until no beneficial moves remain.
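The question the algorithm asks can be sketched in Python as a before-and-after comparison. This is a simplified illustration on a made-up toy graph: it recomputes modularity from scratch for clarity, whereas real implementations typically calculate the gain incrementally. The function names are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Standard modularity for an unweighted, undirected edge list."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

def gain_from_move(edges, community, node, target):
    """Change in modularity if `node` joins the community `target`."""
    moved = dict(community)
    moved[node] = target
    return modularity(edges, moved) - modularity(edges, community)

# Two triangles joined by the bridge 2-3 (hypothetical data).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Node 2 starts in "B", although two of its three neighbors are in "A".
start = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}

gain = gain_from_move(edges, start, node=2, target="A")
stay = gain_from_move(edges, start, node=2, target="B")
print(round(gain, 3), stay)
```

The gain for moving node 2 into "A" is positive, so under this rule the node would move. Staying put changes nothing.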

The Two Phases of Louvain

Louvain repeats two phases until modularity stops improving:

Phase 1: Local Optimization

Each node considers joining neighboring communities.

It joins whichever community increases modularity the most.

Phase 2: Aggregation

Once no more moves improve modularity, the algorithm collapses each community into a single "super-node" and repeats Phase 1 on the condensed graph.
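The aggregation step can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the GDS implementation: the `aggregate` function and the toy data are invented for this example. Edges inside a community become a self-loop on its super-node, and edges between communities are summed into a single weighted edge.

```python
from collections import defaultdict

def aggregate(weighted_edges, community):
    """Collapse each community into one super-node, summing edge weights."""
    super_edges = defaultdict(float)
    for u, v, weight in weighted_edges:
        a, b = sorted((community[u], community[v]))
        # When a == b the edge is internal and becomes a self-loop.
        super_edges[(a, b)] += weight
    return dict(super_edges)

# Two triangles joined by the bridge 2-3, every edge with weight 1 (hypothetical).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 1.0)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

super_graph = aggregate(edges, community)
print(super_graph)
```

Keeping the internal weight as a self-loop means the condensed graph carries the same total edge weight as the original, so Phase 1 can run on it unchanged.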

Figure: Flowchart of the Louvain algorithm’s two phases: Local Optimization and Aggregation.

A Metaphor: Party Guests

Imagine a party where guests naturally cluster into conversation groups.

Phase 1: Each person drifts toward the group where they know the most people.

Phase 2: Once groups stabilize, imagine each group as a single unit. Neighboring groups then merge when enough of their members know each other.

Result: A hierarchy of social clusters, with friend groups nested inside larger social circles.

Hierarchical Communities

This two-phase process creates a hierarchy:

  • Level 1: Many small, tight-knit communities

  • Level 2: Small communities merge into medium ones

  • Level 3: Medium communities merge into larger ones

Figure: A diagram showing communities merging at successive levels.

Each level represents a different granularity of community structure. You can choose which level suits your analysis, or access all levels at once.
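As a rough illustration of choosing a level, the Python sketch below groups nodes by their community ID at a given level. The data is hypothetical: each node carries a list of community IDs ordered from the finest level to the coarsest, and the function name is invented for this example.

```python
from collections import defaultdict

def communities_at(levels_by_node, level):
    """Group nodes by their community ID at one hierarchy level (0-based)."""
    groups = defaultdict(list)
    for node, levels in levels_by_node.items():
        groups[levels[level]].append(node)
    return {cid: sorted(members) for cid, members in groups.items()}

# Hypothetical per-node community IDs, finest level first.
levels_by_node = {
    "ann": [1, 10], "bob": [1, 10],
    "cat": [2, 10], "dan": [2, 10],
    "eve": [3, 20], "fay": [3, 20],
}

fine = communities_at(levels_by_node, 0)    # three small groups
coarse = communities_at(levels_by_node, 1)  # two larger groups
print(len(fine), len(coarse))
```

The same nodes yield three tight groups at the first level and two broader groups at the second, which is the trade-off between granularity and coverage described above.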

Part 1: Hands-On with the Movies Graph

The Movies Dataset

Before applying Louvain to fraud, let’s practice on familiar data.

The Movies graph contains:

  • Actor and Movie nodes

  • User and Genre nodes

  • ACTED_IN, RATED, and IN_GENRE relationships

We’ll find communities of actors who frequently work together.

Project the Actor Collaboration Network

Create a projection of actors connected through shared movies:

cypher
Project actor collaborations
MATCH (source:Actor)-[r:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source <> target
WITH source, target, count(r) AS collaborations // (1)
RETURN gds.graph.project(
  'actor-collaborations',
  source,
  target,
  {relationshipProperties: {collaborations: collaborations}}, // (2)
  {undirectedRelationshipTypes: ['*']} // (3)
)
  1. Count shared movies between each actor pair

  2. Store collaboration count as a relationship property

  3. Make all relationships undirected

This creates an undirected graph where actors are connected if they’ve appeared in the same movie. The collaborations property counts how many movies they’ve shared.
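To see what the collaborations property represents, here is a small Python sketch that counts shared movies per actor pair from hypothetical cast lists. The names and data are invented. Unlike the Cypher pattern, which matches each pair in both directions, the sketch keeps one entry per unordered pair.

```python
from collections import Counter
from itertools import combinations

def count_collaborations(casts):
    """Count shared movies for every unordered pair of actors."""
    shared = Counter()
    for cast in casts.values():
        # sorted(set(...)) removes duplicates and fixes the pair order.
        for a, b in combinations(sorted(set(cast)), 2):
            shared[(a, b)] += 1
    return shared

# Hypothetical cast lists.
casts = {
    "Movie 1": ["Ana", "Ben", "Cho"],
    "Movie 2": ["Ana", "Ben"],
    "Movie 3": ["Cho", "Dev"],
}

collaborations = count_collaborations(casts)
for pair, count in sorted(collaborations.items()):
    print(pair, count)
```

Ana and Ben share two movies, so their relationship would carry a weight of 2, while every other pair carries a weight of 1.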

Run Louvain in Stats Mode

Run this query to preview what Louvain will find:

cypher
Preview Louvain results
CALL gds.louvain.stats('actor-collaborations', {})

Here we omit YIELD and RETURN, so the procedure returns every result column.

Interpreting Stats Results

You should get results similar to these:

Field                    Value
modularity               0.66 (good community structure)
modularities             [0.64, 0.66, 0.66] (one per level)
ranLevels                3
communityCount           681
communityDistribution    min: 2, max: 10,604, mean: 54, p50: 8
computeMillis            ~3,560

Run Louvain in Stream Mode

See which community each actor belongs to:

cypher
Stream Louvain results
CALL gds.louvain.stream('actor-collaborations', {})
YIELD nodeId, communityId
WITH communityId,
     collect(gds.util.asNode(nodeId).name) AS actors // (1)
RETURN communityId, actors[0..10] AS sampleActors, // (2)
       size(actors) AS communitySize
ORDER BY communitySize DESC
LIMIT 30
  1. Collect actor names within each community

  2. Preview the first 10 actors per community

Notice that a handful of the ~680 communities are extremely large, while most contain only a few actors.

Visualize a Community

See how actors in a community connect through movies:

cypher
Visualize community connections
CALL gds.louvain.stream('actor-collaborations', {})
YIELD nodeId, communityId
WITH communityId,
     collect(gds.util.asNode(nodeId)) AS members
ORDER BY size(members) DESC
LIMIT 1 // (1)
WITH members
UNWIND members AS actor
MATCH path = (actor)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(costar)
WHERE costar IN members // (2)
RETURN path
LIMIT 100
  1. Take the largest community

  2. Only show connections within that community

This shows the movies connecting actors within the largest community. Notice how densely connected they are: that’s why Louvain grouped them together.

Part 2: Configuration Options

Key Configuration: maxLevels

The maxLevels parameter controls how many hierarchy levels Louvain runs.

  • Low maxLevels (1-2): Many small, specific communities

  • High maxLevels (10+): Fewer, larger communities

The default is 10, but Louvain stops early if modularity stops improving.

Experiment with maxLevels

Compare results with different maxLevels:

cypher
Louvain with maxLevels = 1
CALL gds.louvain.stats('actor-collaborations', {
  maxLevels: 1
})
YIELD communityCount, modularity
RETURN 'maxLevels: 1' AS config, communityCount, modularity

cypher
Louvain with maxLevels = 10
CALL gds.louvain.stats('actor-collaborations', {
  maxLevels: 10
})
YIELD communityCount, modularity
RETURN 'maxLevels: 10' AS config, communityCount, modularity

Notice how more levels produce fewer, larger communities. The modularity may be slightly higher with more levels, as the algorithm has more opportunities to optimize.

Choosing maxLevels

Use fewer levels when:

  • You need granular, specific groups

  • You’re doing detailed investigation of tight clusters

  • Your communities are naturally small

Use more levels when:

  • You want to cast a wide net

  • You’re looking for large-scale structure

  • Many group members are unknown (like in fraud detection)

Alternative: includeIntermediateCommunities

Instead of guessing maxLevels, set includeIntermediateCommunities: true.

This stores community IDs at every level:

cypher
Include intermediate communities
CALL gds.louvain.stream('actor-collaborations', {
  includeIntermediateCommunities: true // (1)
})
YIELD nodeId, communityId, intermediateCommunityIds // (2)
WITH gds.util.asNode(nodeId) AS actor,
     communityId, intermediateCommunityIds
WITH actor.name AS name,
     intermediateCommunityIds[0] AS level1, // (3)
     intermediateCommunityIds[1] AS level2,
     communityId AS final
WHERE level1 <> level2 AND level2 <> final
RETURN name, level1, level2, final
ORDER BY name
LIMIT 20
  1. Enable tracking of all hierarchy levels

  2. Each node yields its community at every level

  3. Access individual levels by index

Notice how actors are reassigned to new communities at each level.

Examine the results

The resulting table should look something like this:

name                level1    level2    final
John Clayton        1         8556      26210
Tasma Walton        1         8556      26210
Chris Haywood       1         8556      26210
Mikko Nousiainen    33774     21164     13142
Adam MacDonald      19068     21164     13142
Tuomas Uusitalo     33774     21164     13142
…                   …         …         …

Check intermediateCommunity sizes

Run the following query to see the increasing sizes of the communities at each level:

cypher
Community consolidation across levels
CALL gds.louvain.stream('actor-collaborations', {
  includeIntermediateCommunities: true
})
YIELD nodeId, intermediateCommunityIds, communityId
WITH intermediateCommunityIds + [communityId] AS allLevels // (1)
UNWIND range(0, size(allLevels) - 1) AS levelIndex // (2)
WITH levelIndex + 1 AS level, allLevels[levelIndex] AS communityId
WITH level, communityId, count(*) AS communitySize
RETURN level,
       count(*) AS communityCount,
       avg(communitySize) AS avgSize,
       min(communitySize) AS minSize,
       max(communitySize) AS maxSize
ORDER BY level
  1. Combine all levels into a single list

  2. Unwind to analyze each level separately

Community consolidation

Your results should look something like this table:

level    communityCount    avgSize    minSize    maxSize
1        1578              23         2          9811
2        694               53         2          10791
3        680               54         2          10791
4        680               54         2          10791

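Notice how the communities become fewer and larger with each new level. Levels 3 and 4 are identical: Louvain ran only three levels (see ranLevels in the stats results), and the query appends the final communityId to a list whose last entry should already hold it, so the final level is counted twice.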

Weighted Relationships

Louvain can use relationship weights to influence community assignment.

Figure: Diagram comparing weighted vs. unweighted network relationships.

Let’s see this in action by running Louvain twice, once unweighted and once weighted, and comparing the results.

Run Unweighted Louvain

First, run Louvain without weights:

cypher
Unweighted Louvain
CALL gds.louvain.write('actor-collaborations', {
  writeProperty: 'communityUnweighted' // (1)
})
YIELD communityCount, modularity
RETURN 'Unweighted' AS config, communityCount, modularity
  1. Store community IDs as a node property

Run Weighted Louvain

Now run Louvain using collaboration counts as weights:

cypher
Weighted Louvain
CALL gds.louvain.write('actor-collaborations', {
  writeProperty: 'communityWeighted',
  relationshipWeightProperty: 'collaborations' // (1)
})
YIELD communityCount, modularity
RETURN 'Weighted' AS config, communityCount, modularity
  1. Use collaboration count as edge weight, so stronger connections pull harder

Compare the community counts and modularity scores. Weighting by collaboration strength often produces different groupings, because actors with many shared movies pull harder on each other.

Find Actors Split by Weighting

Find actors who were together in the unweighted run but split apart when weights were applied:

cypher
Together unweighted, split when weighted
MATCH (source:Actor)-[r:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source.communityUnweighted = target.communityUnweighted // (1)
  AND source.communityWeighted <> target.communityWeighted // (2)
  AND source < target
WITH source, target, count(m) AS sharedMovies
ORDER BY sharedMovies ASC
LIMIT 1
MATCH path = (source)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target)
RETURN path
  1. Same community when unweighted

  2. Different communities when weighted, because their connection wasn’t strong enough

These actors were grouped together based on network structure alone, but when we accounted for collaboration strength, their weak connection wasn’t enough to keep them together.

The wider network

Here they are in their wider network:

cypher
Together unweighted, split when weighted (wider network)
MATCH (source:Actor)-[r:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source.communityUnweighted = target.communityUnweighted
  AND source.communityWeighted <> target.communityWeighted
  AND source < target
WITH source, target, count(m) AS sharedMovies
ORDER BY sharedMovies ASC
LIMIT 1
MATCH path = (source)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target)
MATCH path2 = (source)-[]-()-[]-()
MATCH path3 = (target)-[]-()-[]-()
RETURN path, path2, path3

See if you can spot them. They are actually quite far apart from each other.

Find Actors Joined by Weighting

Find actors who were in different communities unweighted, but joined together when weights were applied:

cypher
Split unweighted, together when weighted
MATCH (source:Actor)-[r:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source.communityUnweighted <> target.communityUnweighted // (1)
  AND source.communityWeighted = target.communityWeighted // (2)
  AND source < target
WITH source, target, count(m) AS sharedMovies
ORDER BY sharedMovies DESC
LIMIT 1
MATCH path = (source)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target)
RETURN path
  1. Different communities when unweighted

  2. Same community when weighted, because their strong collaboration pulled them together

These actors were in separate communities based on structure alone, but their strong collaboration history pulled them into the same community when weights were considered.

You should see that they have collaborated on many movies together.

The wider network

Here they are in their wider network:

cypher
Split unweighted, together when weighted (wider network)
MATCH (source:Actor)-[r:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
WHERE source.communityUnweighted <> target.communityUnweighted
  AND source.communityWeighted = target.communityWeighted
  AND source < target
WITH source, target, count(m) AS sharedMovies
ORDER BY sharedMovies DESC
LIMIT 1
MATCH path = (source)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target)
MATCH path2 = (source)-[]-()-[]-()
MATCH path3 = (target)-[]-()-[]-()
RETURN path, path2, path3

In a force-directed layout, they should appear relatively close together, if not side by side.

What This Demonstrates

  • Unweighted: Community assignment based purely on connection structure

  • Weighted: Stronger connections have more influence on grouping

Actors with few shared movies may be split apart when weights are applied. Actors with many shared movies may be pulled together despite structural separation.
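The pull of a heavy relationship can be reproduced on a toy graph. The Python sketch below is an illustration under stated assumptions, not the GDS implementation: it uses the usual weighted form of modularity, and the graph, weights, and function names are invented. Two triangles are joined by a bridge between nodes 2 and 3, and we compare two candidate groupings with a light and a heavy bridge.

```python
from collections import defaultdict

def weighted_modularity(edges, community):
    """Modularity where each edge counts in proportion to its weight."""
    total = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # weight of edges inside the community
    strength = defaultdict(float)  # summed weighted degree of its nodes
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)

def build(bridge_weight):
    """Two triangles with unit weights plus a bridge 2-3 of the given weight."""
    triangles = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
    return [(u, v, 1.0) for u, v in triangles] + [(2, 3, bridge_weight)]

by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
by_pair = {0: "A", 1: "A", 2: "C", 3: "C", 4: "B", 5: "B"}  # 2 and 3 together

for label, edges in [("bridge weight 1", build(1.0)),
                     ("bridge weight 10", build(10.0))]:
    print(label,
          round(weighted_modularity(edges, by_triangle), 3),
          round(weighted_modularity(edges, by_pair), 3))
```

With a bridge weight of 1, grouping by triangle scores higher. With a bridge weight of 10, the grouping that keeps nodes 2 and 3 together wins, which mirrors how actors with many shared movies can be pulled into the same community.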

For fraud detection, weighting by transaction amounts or frequency can help distinguish casual connections from meaningful relationships.

The tolerance parameter

Louvain stops when modularity improvements become negligible.

The tolerance parameter controls "negligible":

cypher
Adjusting tolerance
CALL gds.louvain.stats('actor-collaborations', {
  tolerance: 0.00001
})
YIELD communityCount, modularity
RETURN communityCount, modularity
  • Lower tolerance: More iterations, potentially better modularity, slower

  • Higher tolerance: Fewer iterations, faster, may stop early

The default (0.0001) works well for most cases.

High tolerance

Let’s see what happens if we raise the tolerance to 1.0:

cypher
Adjusting tolerance
CALL gds.louvain.stats('actor-collaborations', {
  tolerance: 1.0
})
YIELD communityCount, modularity
RETURN communityCount, modularity

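Overall modularity has gone down, while the community count has risen.

This happens because the algorithm considers itself 'converged' as soon as an iteration improves modularity by less than 1.0. Modularity never exceeds 1, so every improvement falls below that threshold and the algorithm stops almost immediately.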
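A simplified model of this stopping rule, written in Python, shows why a tolerance of 1.0 ends the run so early. The modularity values are hypothetical and the function is not part of GDS; it only assumes that the algorithm stops once the improvement between iterations drops below the tolerance.

```python
def run_until_converged(modularity_per_iteration, tolerance):
    """Return (iterations run, final modularity) under a simple stopping rule."""
    previous = modularity_per_iteration[0]  # modularity before any moves
    iterations = 0
    for current in modularity_per_iteration[1:]:
        iterations += 1
        gain = current - previous
        previous = current
        if gain < tolerance:
            break  # improvement too small: treat the result as converged
    return iterations, previous

# Hypothetical modularity after each iteration of a run.
trace = [0.30, 0.50, 0.60, 0.64, 0.6405, 0.64051, 0.640511]

for tolerance in (1.0, 1e-4, 1e-9):
    print(tolerance, run_until_converged(trace, tolerance))
```

With tolerance 1.0 the run stops after a single iteration at a modularity of 0.5. With 1e-4 it continues until the gains become tiny, and with 1e-9 it uses every iteration in the trace.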

Lower tolerance

You can also lower the tolerance below 0.00001. In our case, the graph converges fairly quickly, so the difference is not huge.

However, let’s run it anyway and see what we get:

cypher
Adjusting tolerance
CALL gds.louvain.stats('actor-collaborations', {
  tolerance: 1e-9
})
YIELD communityCount, modularity
RETURN communityCount, modularity

Note that Cypher accepts floats in scientific notation, so 1e-9 is equivalent to 0.000000001.

Clean Up

Drop the projection:

cypher
Drop the projection
CALL gds.graph.drop('actor-collaborations')

Part 3: When to Use Louvain

When Louvain Works Well

Louvain is ideal when:

  • You need fast results on large networks (millions of nodes)

  • You want to explore community structure broadly

  • Communities have varying sizes

  • You’re doing initial investigation, not final assignment

Limitations

Louvain has some important limitations:

Resolution limit: May miss very small communities in large networks. If you need to find 3-person fraud cells in a million-node graph, Louvain might merge them into larger groups.
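A commonly used illustration of the resolution limit is a ring of small cliques. The Python sketch below is an illustration only, with invented names and a made-up graph: 30 fully connected groups of 5 nodes, each linked to the next by a single edge. It compares the modularity of keeping every clique separate with merging neighboring cliques in pairs.

```python
from itertools import combinations

CLIQUES, SIZE = 30, 5

def ring_of_cliques():
    """Build 30 five-node cliques, each linked to the next by one edge."""
    edges = []
    for c in range(CLIQUES):
        members = range(c * SIZE, (c + 1) * SIZE)
        edges.extend(combinations(members, 2))           # fully connect the clique
        next_first = ((c + 1) % CLIQUES) * SIZE
        edges.append((c * SIZE + SIZE - 1, next_first))  # single link to next clique
    return edges

def modularity(edges, community_of):
    """Standard modularity; `community_of` maps a node to its community ID."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community_of(u), community_of(v)
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

edges = ring_of_cliques()
q_separate = modularity(edges, lambda n: n // SIZE)      # one community per clique
q_paired = modularity(edges, lambda n: n // (2 * SIZE))  # neighboring cliques merged
print(round(q_separate, 3), round(q_paired, 3))
```

Merging the cliques in pairs scores slightly higher than keeping them separate, even though each clique is an obvious group on its own. An optimizer that maximizes modularity will therefore prefer the merged result.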

Non-deterministic: Results can vary slightly between runs due to node processing order. Community IDs will differ; community membership is usually stable.
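Because community IDs change between runs, comparing raw IDs is misleading. One way to check whether membership is stable is to compare the groups themselves, as in this Python sketch. The run results are hypothetical and the helper name is invented.

```python
def as_partition(community_ids):
    """Turn {node: community ID} into a set of member groups, ignoring the IDs."""
    groups = {}
    for node, cid in community_ids.items():
        groups.setdefault(cid, set()).add(node)
    return {frozenset(members) for members in groups.values()}

# Hypothetical results from three runs.
run_1 = {"ann": 4, "bob": 4, "cat": 9, "dan": 9}
run_2 = {"ann": 17, "bob": 17, "cat": 2, "dan": 2}  # same groups, new IDs
run_3 = {"ann": 17, "bob": 2, "cat": 2, "dan": 2}   # bob changed groups

same_membership = as_partition(run_1) == as_partition(run_2)
membership_changed = as_partition(run_1) != as_partition(run_3)
print(same_membership, membership_changed)
```

Runs 1 and 2 describe the same communities despite sharing no IDs, while run 3 differs in actual membership.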

These limitations don’t make Louvain wrong for fraud detection. They make it a tool for exploration, not final judgment. In Lesson 5, you’ll learn how to use WCC for deterministic, explainable community assignment.

Transfer: From Movies to Fraud

You’ve practiced Louvain on actor collaborations. Here is how each concept maps to fraud detection:

Movies Concept                        Fraud Equivalent
Actor nodes                           User nodes
Shared movies (collaborations)        Shared identifiers (cards, devices)
Finding acting ensembles              Finding fraud rings
Community = frequent collaborators    Community = potentially coordinated actors

Summary

Louvain finds communities by optimizing modularity through iterative local optimization and aggregation.

Key points:

  • Modularity scores above 0.4 indicate useful community structure

  • maxLevels controls granularity; use includeIntermediateCommunities for flexibility

  • relationshipWeightProperty lets stronger connections influence grouping

  • tolerance controls convergence sensitivity

  • Fast and effective but non-deterministic

In the next lesson, you’ll run Louvain on the fraud network and reduce your search space by 98%.
