Challenge: Weighted graph projection and analysis

Introduction

You’ve learned how to project graphs with undirected relationships and weights. You’ve also explored several community detection algorithms and learned how to read the GDS documentation.

Now it’s time to put those skills together by applying Label Propagation—a community detection algorithm you haven’t used yet—to a weighted, undirected, bipartite graph.

In this challenge, you’ll:

  1. Project an actor-movie bipartite graph with undirected relationships and IMDb ratings as weights

  2. Use Label Propagation to detect communities in the weighted graph

  3. Analyze the results

Your task

Step 1: Project the graph

Project a graph called 'actor-movie-quality' that captures the actor-movie network weighted by IMDb film ratings.

Your projection should:

  • Include Actor and Movie nodes connected via ACTED_IN

  • Use the imdbRating property as the relationship weight

  • Make relationships undirected

You’ve done this before—write the complete projection query.

Verify your projection was created successfully using gds.graph.list().

Step 2: Review label propagation

Visit the Label Propagation documentation.

Understand what configuration parameters are available and how weights affect the algorithm.

Step 3: Run label propagation

Run Label Propagation on your projected graph in stream mode using IMDb ratings as weights.

Return the top 10 communities by size with sample member names.

When you’ve completed your analysis, answer the questions below.

Check your understanding

What Does Label Propagation Do?

What is the primary purpose of Label Propagation?

  • ❏ It assigns importance scores to nodes based on their connections

  • ✓ It detects communities by having nodes adopt the most common label among their neighbors

  • ❏ It finds the shortest path between two nodes

  • ❏ It calculates centrality scores for all nodes

Hint

Think about what "propagating labels" means—how do nodes decide which community they belong to?

Solution

Label Propagation detects communities by having nodes adopt the most common label among their neighbors.

The algorithm works iteratively: 1. Each node starts with its own unique label (community ID) 2. In each iteration, nodes adopt the label that appears most frequently among their neighbors 3. This continues until labels stabilize or a maximum iteration count is reached

Labels "propagate" through the network as neighbors influence each other, naturally forming communities of nodes that are densely connected.

Label Propagation vs. Louvain/Leiden

How is Label Propagation fundamentally different from Louvain and Leiden?

  • ✓ Label Propagation uses local neighbor voting, while Louvain/Leiden optimize modularity globally

  • ❏ Label Propagation only works on weighted graphs, while Louvain/Leiden work on unweighted graphs

  • ❏ Label Propagation is slower than Louvain/Leiden

  • ❏ Label Propagation can only detect overlapping communities

Hint

Consider what metric each algorithm uses to decide community membership—voting vs. optimization.

Solution

Label Propagation uses local neighbor voting, while Louvain/Leiden optimize modularity globally.

Label Propagation: - Uses a simple voting mechanism: each node adopts the most common label among its neighbors - Makes local decisions based only on immediate connections - Fast and simple, but results can be sensitive to initialization

Louvain/Leiden: - Optimize modularity—a global metric measuring how well-defined communities are - Consider the entire network structure when making decisions - Hierarchical: they aggregate communities into super-nodes and repeat - Generally more stable and produce higher-quality communities

Both detect communities, but through fundamentally different approaches—local voting vs. global optimization.

What Happens When You Rerun Label Propagation?

What do you notice if you run Label Propagation multiple times in a row on the same graph?

  • ❏ You get exactly the same results every time

  • ❏ The algorithm fails after the first run

  • ✓ You get different community assignments each time

  • ❏ The number of communities always increases with each run

Hint

Try running the same Label Propagation query 3-4 times. Look at the communityId values and community sizes. Do they stay the same?

Solution

You get different community assignments each time.

Label Propagation is non-deterministic—it can produce different results on successive runs. You might notice:

  • Different communityId values for the same nodes

  • Different numbers of communities detected

  • Different community sizes

This happens because Label Propagation is sensitive to: 1. Initialization order: Which nodes update their labels first can affect the final outcome 2. Tie-breaking: When a node’s neighbors have equal votes for multiple labels, the algorithm must break ties randomly 3. Update sequence: The order in which nodes process their neighbors' labels

This is normal behavior for Label Propagation and one of its trade-offs for speed and simplicity.

Why Does Label Propagation Produce Different Results?

Why does Label Propagation produce different results each time you run it?

  • ❏ The algorithm uses random initialization and has no convergence criteria

  • ✓ The algorithm’s reliance on neighbor voting and tie-breaking introduces randomness

  • ❏ The graph structure changes between runs

  • ❏ GDS has a bug in the Label Propagation implementation

Hint

Think about what happens when a node’s neighbors are evenly split between two community labels. How does the algorithm decide which label to adopt?

Solution

The algorithm’s reliance on neighbor voting and tie-breaking introduces randomness.

Label Propagation is non-deterministic because:

Tie-breaking: When a node’s neighbors have equal votes for multiple labels (e.g., 3 neighbors in community A, 3 in community B), the algorithm must break the tie. This tie-breaking is random, leading to different outcomes.

Update order: The sequence in which nodes update their labels can affect the final result. If node X updates before node Y, it might influence Y’s decision differently than if Y updated first.

Cascading effects: A single tie-break early in the algorithm can cascade through the network, causing different nodes to join different communities as the labels propagate.

This is a known trade-off: Label Propagation is fast and simple, but less stable than modularity-optimizing algorithms like Louvain or Leiden, which are deterministic (same input = same output).

How Do Weights Affect Label Propagation?

How do relationship weights influence Label Propagation’s community detection?

  • ❏ Weights have no effect on Label Propagation

  • ❏ Weights only affect the algorithm’s performance, not the results

  • ✓ Weights make stronger connections more influential in the neighbor voting process

  • ❏ Weights determine the maximum number of communities that can be detected

Hint

Consider how a node with highly-rated film connections (imdbRating: 9.5) should be influenced compared to connections with poorly-rated films (imdbRating: 3.2).

Solution

Weights make stronger connections more influential in the neighbor voting process.

In weighted Label Propagation:

Unweighted behavior: Each neighbor gets one "vote" for its community label, regardless of connection strength.

Weighted behavior: Each neighbor’s vote is weighted by the relationship property. A neighbor connected by a high-weight relationship (imdbRating: 9.2) has more influence than a neighbor connected by a low-weight relationship (imdbRating: 4.1).

In your actor-movie network: - Actors and movies connected through highly-rated films have stronger influence on each other’s community assignment - Communities tend to form around quality collaborations rather than just any collaboration - This produces more meaningful clusters based on film success, not just co-appearance

Weights essentially amplify or diminish each neighbor’s influence in the voting process, allowing the algorithm to prioritize important connections over weak ones.

Summary

You’ve successfully projected a weighted, undirected bipartite graph and applied Label Propagation to detect communities based on IMDb film ratings.

This challenge demonstrates a core GDS workflow: understanding your data model, designing the right projection, and applying algorithms that leverage weighted relationships to reveal meaningful patterns.

In the next lesson, you’ll learn about relationship aggregation—how to create weights by combining multiple relationships during projection.

Chatbot

How can I help you today?