Community Detection

Introduction

Community detection algorithms are used to evaluate how groups of nodes may be clustered or partitioned in the graph. Much of the community detection functionality in GDS is focused on distinguishing and assigning ids to these node groups for downstream analytics, visualization, or other processing.

Common use cases of community detection include:

  • Fraud detection: Finding fraud rings by identifying accounts that have frequent suspicious transactions and/or share identifiers between one another.

  • Customer 360: Disambiguating multiple records and interactions into a single customer profile so an organization has an aggregated source of truth for each customer.

  • Market segmentation: dividing a target market into approachable subgroups based on priorities, behaviors, interests, and other criteria.

Louvain Community Detection

louvain modularity
Learn all you need to know about Graph Algorithms and Machine Learning PipelinesLouvain Modularity Optimization

A common community detection algorithm is Louvain. Louvain maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

Louvain optimizes this modularity with a hierarchical clustering approach that recursively merges communities together. There are multiple parameters that can be used to tune Louvain to control its performance and the number and size of communities produced. This includes the maximum number of iterations and hierarchical levels to use as well as the tolerance parameter for assessing convergence/stopping conditions. Our Louvain documentation covers these parameters and tuning in more detail.

An additional important consideration is that Louvain is a stochastic algorithm. As such, the community assignments may change a bit when re-run. When the graph does not have a naturally well-defined community structure the changes between runs can become more significant. Louvain includes a seedProperty parameter which can be used to assign initial community ids and help with consistency between runs. Also, if consistency is important for your use case, other community detection algorithms, such as Weakly Connected Components (WCC), take a more deterministic partitioning approach to assigning communities and thus will not change between runs.


Below is an example of running Louvain to understand communities of actors and directors in our movies recommendations graph.

First create a graph projection with movies, actors, and directors. Project the relationships with an UNDIRECTED orientation as that works best with the Louvain algorithm.

cypher
CALL gds.graph.project('proj', ['Movie', 'Person'], {
    ACTED_IN:{orientation:'UNDIRECTED'},
    DIRECTED:{orientation:'UNDIRECTED'}
});

Then we can run Louvain. Here we will run Louvain in mutate mode to save community Ids and return high level statistics on the community counts, distribution, modularity score, and information for how Louvain processed the graph.

cypher
CALL gds.louvain.mutate('proj', {mutateProperty:'communityId'})

We can verify the communityId node properties in the projection with a stream operation.

cypher
CALL gds.graph.nodeProperty.stream('proj','communityId', ['Person'])
YIELD nodeId, propertyValue
WITH gds.util.asNode(nodeId) AS n, propertyValue AS communityId
WHERE n:Person
RETURN n.name, communityId LIMIT 10

Other Community Detection Algorithms

Below are some of the other production tier community detection algorithms. A full list of all community detection algorithms can be found in the Community Detection algorithms documentation.

  • Label Propagation: Similar intent as Louvain. Fast algorithm that parallelizes well. Great for large graphs.

  • Weakly Connected Components (WCC): Partitions the graph into sets of connected nodes such that

    1. Every node is reachable from any other node in the same set

    2. No path exists between nodes from different sets

  • Triangle Count: Counts the number of triangles for each node. Can be used to detect the cohesiveness of communities and stability of the graph.

  • Local Clustering Coefficient: Computes the local clustering coefficient for each node in the graph which is an indicator for how the node clusters with its neighbors.

Check your understanding

1. Algorithm Purpose

What do community detection algorithms primarily accomplish? Select the best answer.

  • ✓ Detect how groups of nodes cluster together to form communities and strength of such clustering in the graph

  • ❏ Find shortest paths between multiple nodes to identify communities

  • ❏ Identify similar node pairs across the graph to define communities

  • ❏ Compute low-dimensional representation of nodes in the graph that can be used downstream to detect communities of nodes in the graph

Hint

A cluster of common nodes could also be described as a community.

Solution

Community detection algorithms detect how groups of nodes cluster together to form communities and strength of such clustering in the graph.

2. Name the Algorithm

Which of the below community detection algorithms is focused on deterministically partitioning the graph into sets of disjoint nodes such that no path exists between sets?

  • ❏ Louvain

  • ❏ Label Propagation

  • ❏ Triangle Counting

  • ✓ Weakly Connected Components

Hint

The algorithm aims to identify nodes with a lower number of connections.

Solution

The answer is Weakly Connected Components. You can learn more about Weakly Connected Components in the GDS Documentation.

Summary

In this lesson we covered community detection algorithms in GDS and how to apply them to actual data.

In the next lesson, you will learn more about Similarity algorithms.