Challenge: Aggregated projection and analysis

Introduction

You’ve learned how to aggregate relationships during projection using count(r) to count relationships between node pairs. Now it’s time to apply these skills independently.

In this challenge, you’ll design and build an aggregated projection from scratch, then analyze it with an algorithm of your choice.

Your task

Step 1: Build an aggregated projection

Create a projection called 'director-actor-collab' that captures the collaboration network between directors and actors.

Your projection should:

Connect Director nodes to Actor nodes through shared Movie nodes
Use relationship aggregation to create a weight representing the total number of movies each director-actor pair has worked on together
Project as a bipartite graph (preserving both Director and Actor labels)
Make relationships undirected

Pattern to consider:

Directors connect to movies via DIRECTED
Actors connect to movies via ACTED_IN
You need to aggregate based on shared movies

Weight:

Count how many movies each director-actor pair has collaborated on
This count should become the rels property on each projected relationship

Step 2: Validate your projection

Run gds.graph.list() on your projection and verify:

Both Director and Actor nodes are present
The relationship count is less than it would be without aggregation
The projection has a rels property

cypher

Validate director-actor-collab projection

CALL gds.graph.list('director-actor-collab')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
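
To judge the second check, it can help to compare the aggregated and unaggregated counts directly in Cypher. The following is a minimal sketch that reuses the labels and relationship types from this challenge; the column names aggregatedPairs and unaggregatedMatches are illustrative, not part of any API.

```cypher
// Group pattern matches by director-actor pair, then compare
// the number of distinct pairs with the total number of matches.
MATCH (d:Director)-[r:DIRECTED]->(:Movie)<-[:ACTED_IN]-(a:Actor)
WITH d, a, count(r) AS rels
RETURN count(*)  AS aggregatedPairs,      // one row per director-actor pair
       sum(rels) AS unaggregatedMatches   // one per pattern match (shared movie)
```

If any pair shares more than one movie, aggregatedPairs is smaller than unaggregatedMatches. Keep in mind that the relationshipCount reported for an undirected projection may be higher than aggregatedPairs, because undirected relationships are typically counted once per direction.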

Step 3: Analyze with an algorithm

Choose a community detection or centrality algorithm from the GDS documentation that:

Supports bipartite graphs (or can work on bipartite data)
Can use weighted relationships

Run the algorithm in stream mode and return meaningful results that show:

Which directors and actors cluster together
OR: Which directors/actors are most central in the collaboration network

Use the relationshipWeightProperty: 'rels' configuration to leverage your aggregated weights.
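
As one possible starting point, the sketch below streams Louvain community detection over the projection. Louvain is only one of several algorithms that accept weighted, undirected relationships, and the name property used for display is an assumption about the dataset.

```cypher
// Stream Louvain communities, weighting relationships by the aggregated count
CALL gds.louvain.stream('director-actor-collab', {
  relationshipWeightProperty: 'rels'
})
YIELD nodeId, communityId
// Map internal GDS node ids back to database nodes
WITH communityId, gds.util.asNode(nodeId) AS person
RETURN communityId,
       count(*) AS size,
       collect(person.name)[..10] AS sampleMembers  // 'name' is an assumed property
ORDER BY size DESC
LIMIT 10
```

For a centrality view instead, the same configuration map can be passed to a weighted centrality procedure, which yields a score per node rather than a communityId.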

Hint

Projection hint:

Your aggregation pattern should look something like:

cypher

Hint: Aggregation pattern for Director-Actor projection

MATCH (source:Director)-[r:DIRECTED]->
    (:Movie)
        <-[:ACTED_IN]-(target:Actor)
WITH source, target, count(r) AS rels
// Continue with projection...
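
One way the pattern might continue is with the gds.graph.project Cypher aggregation function. Treat this as a sketch rather than the reference solution: it assumes a GDS version that supports the undirectedRelationshipTypes configuration key.

```cypher
// Aggregate shared movies per director-actor pair, then project
MATCH (source:Director)-[r:DIRECTED]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH source, target, count(r) AS rels
WITH gds.graph.project(
  'director-actor-collab',
  source,
  target,
  {
    // Keep the node labels so both Director and Actor appear in the schema
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    // Store the aggregated count as the 'rels' relationship property
    relationshipProperties: {rels: rels}
  },
  // Treat every projected relationship type as undirected
  {undirectedRelationshipTypes: ['*']}
) AS g
RETURN g.graphName AS graphName,
       g.nodeCount AS nodeCount,
       g.relationshipCount AS relationshipCount
```

Using labels(source) and labels(target) carries over every label on each node, so a person who both directs and acts keeps both labels in the projection.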

Check your understanding

Why Aggregate Relationships?

You aggregated multiple director-actor collaborations (via shared movies) into single weighted relationships.

What is the primary benefit of this aggregation?

❏ It makes the projection queries simpler to write
✓ It reduces graph size while preserving connection strength information
❏ It automatically makes all algorithms run faster
❏ It’s required for all GDS algorithms

Hint

Consider what happens to the number of relationships in your projection when you aggregate, and what information the weight property captures.

Solution

It reduces graph size while preserving connection strength information.

Without aggregation, each shared movie creates a separate relationship between a director and an actor. If they worked together on 5 movies, you’d have 5 parallel relationships in your projection.

With aggregation, those 5 relationships collapse into a single relationship with rels: 5. This:

Reduces memory usage: Fewer relationships means smaller graph footprint
Can improve algorithm performance: Algorithms have fewer relationships to traverse
Preserves meaning: The rels property captures collaboration frequency, which algorithms can use

The rels property represents the strength of the director-actor relationship. Algorithms that support weighted relationships can take this strength into account, so stronger collaborations carry more influence when forming communities or calculating centrality.

What Does count(r) Aggregate?

In your projection pattern:

cypher

MATCH (d:Director)-[r:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor)
WITH d, a, count(r) AS rels

What is count(r) actually counting?

❏ The number of Actor nodes
❏ The number of Director nodes
✓ The number of shared Movie nodes between each director-actor pair
❏ The total number of relationships in the database

Hint

Think about what the relationship variable r represents in the MATCH pattern, and what happens when you group by d and a.

Solution

The number of shared Movie nodes between each director-actor pair.

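The variable r is bound to the DIRECTED relationship from a director to a movie. WITH d, a, count(r) groups the matched rows by each unique director-actor pair and counts the non-null r values in each group. Because each matched row corresponds to one shared movie, that count equals the number of movies the pair shares, assuming each person has at most one DIRECTED or ACTED_IN relationship to a given movie.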

If a director and actor collaborated on 4 movies, count(r) returns 4, representing the 4 DIRECTED relationships (one per movie) in that pattern.
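
To see what count(r) produces for individual pairs, you could return the grouped rows instead of projecting them. This is a small sketch; the name and title properties are assumptions about the dataset.

```cypher
// List the director-actor pairs with the most shared movies
MATCH (d:Director)-[r:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor)
WITH d, a, count(r) AS rels, collect(m.title) AS sharedMovies
RETURN d.name AS director,      // 'name' is an assumed property
       a.name AS actor,
       rels,
       sharedMovies             // 'title' is an assumed property
ORDER BY rels DESC
LIMIT 5
```

For each row, the length of sharedMovies matches rels, since both are built from the same group of matches.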

Parallel Relationships Without Aggregation

If you projected the director-actor collaboration network without aggregating (just using the MATCH pattern without count(r)), what would happen?

❏ The projection would fail because GDS requires aggregation
✓ You’d have multiple parallel relationships between the same director-actor pairs
❏ GDS would automatically aggregate them for you
❏ Each director-actor pair would have exactly one relationship

Hint

Consider what happens when a director and actor collaborate on multiple movies. How many times would that pattern match?

Solution

You’d have multiple parallel relationships between the same director-actor pairs.

Without aggregation, the pattern (d:Director)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor) would match once for each shared movie.

If Christopher Nolan and Christian Bale worked on 3 films together, you’d project 3 separate relationships between them in the graph. GDS doesn’t automatically merge these. You need explicit aggregation with count(r) to collapse them into a single weighted relationship.
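
To observe the parallel relationships yourself, you could project the same pattern without the count(r) step and compare its relationshipCount with the aggregated projection. This is a sketch under the same assumptions as the challenge; the graph name 'director-actor-unaggregated' is a hypothetical choice.

```cypher
// Project every pattern match as its own relationship (no aggregation)
MATCH (source:Director)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH gds.graph.project(
  'director-actor-unaggregated',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target)
  },
  {undirectedRelationshipTypes: ['*']}
) AS g
RETURN g.graphName AS graphName,
       g.relationshipCount AS relationshipCount
```

Once you have compared the counts, the extra graph can be removed with CALL gds.graph.drop('director-actor-unaggregated') to free memory.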

Summary

You’ve successfully designed and built an aggregated projection from scratch, choosing appropriate aggregation logic and validating your results.

Relationship aggregation is essential for real-world GDS workflows where multiple interactions between entities need to be summarized into meaningful weights that reflect connection strength, frequency, or intensity.

In the next lesson, you’ll learn about projection modeling strategies: how to design projections that answer specific analytical questions.
