Introduction
You’ve learned how to aggregate relationships during projection using count(r) to count relationships between node pairs. Now it’s time to apply these skills independently.
In this challenge, you’ll design and build an aggregated projection from scratch, then analyze it with an algorithm of your choice.
Your task
Step 1: Build an aggregated projection
Create a projection called 'director-actor-collab' that captures the collaboration network between directors and actors.
Your projection should:
- Connect Director nodes to Actor nodes through shared Movie nodes
- Use relationship aggregation to create a weight representing the total number of movies each director-actor pair has worked on together
- Project as a bipartite graph (preserving both Director and Actor labels)
- Make relationships undirected
Pattern to consider:
- Directors connect to movies via DIRECTED
- Actors connect to movies via ACTED_IN
- You need to aggregate based on shared movies
Weight:
- Count how many movies each director-actor pair has collaborated on
- This count should become the rels relationship property
Step 2: Validate your projection
Run gds.graph.list() on your projection and verify:
- Both Director and Actor nodes are present
- The relationship count is less than it would be without aggregation
- The projection has a rels property
CALL gds.graph.list('director-actor-collab')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
Step 3: Analyze with an algorithm
Choose a community detection or centrality algorithm from the GDS documentation that:
- Supports bipartite graphs (or can work on bipartite data)
- Can use weighted relationships
Run the algorithm in stream mode and return meaningful results that show:
- Which directors and actors cluster together
- OR: Which directors/actors are most central in the collaboration network
Use the relationshipWeightProperty: 'rels' configuration to leverage your aggregated weights.
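As a rough illustration of the shape a weighted stream call can take, the sketch below uses Louvain as one possible choice. It is not the only valid answer to this step. It assumes the 'director-actor-collab' projection already exists and that the underlying nodes carry a `name` property, which the lesson does not confirm.

```cypher
// One possible choice: Louvain community detection in stream mode.
// Assumption: the nodes behind the projection have a `name` property.
CALL gds.louvain.stream('director-actor-collab', {
  relationshipWeightProperty: 'rels'  // use the aggregated movie counts as weights
})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId
ORDER BY communityId
LIMIT 25
```

A centrality algorithm would follow the same pattern, with a different procedure name and a score column in place of communityId.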
Hint
Projection hint:
Your aggregation pattern should look something like:
MATCH (source:Director)-[r:DIRECTED]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH source, target, count(r) AS rels
// Continue with projection...
Check your understanding
Why Aggregate Relationships?
You aggregated multiple director-actor collaborations (via shared movies) into single weighted relationships.
What is the primary benefit of this aggregation?
- ❏ It makes the projection queries simpler to write
- ✓ It reduces graph size while preserving connection strength information
- ❏ It automatically makes all algorithms run faster
- ❏ It’s required for all GDS algorithms
Hint
Consider what happens to the number of relationships in your projection when you aggregate, and what information the weight property captures.
Solution
It reduces graph size while preserving connection strength information.
Without aggregation, each shared movie creates a separate relationship between a director and actor. If they worked together on 5 movies, you’d have 5 parallel relationships in your projection.
With aggregation, those 5 relationships collapse into a single relationship with rels: 5. This:
- Reduces memory usage: Fewer relationships means smaller graph footprint
- Improves algorithm performance: Algorithms traverse fewer edges
- Preserves meaning: The rels property captures collaboration frequency, which algorithms can use
The rels property represents the strength of the director-actor relationship. Algorithms that support weighted relationships can use it to give stronger collaborations more influence when forming communities or calculating centrality.
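To see the size reduction in numbers, a query along these lines compares the two counts directly against the database, before any projection. It is a sketch that only assumes the labels and relationship types named in this lesson.

```cypher
// Group pattern matches by director-actor pair, then compare totals.
MATCH (d:Director)-[r:DIRECTED]->(:Movie)<-[:ACTED_IN]-(a:Actor)
WITH d, a, count(r) AS rels
RETURN count(*)  AS aggregatedRelationships,   // one row per director-actor pair
       sum(rels) AS unaggregatedRelationships  // one per shared movie
```

The gap between the two numbers is the number of parallel relationships that aggregation removes. The relationshipCount reported by gds.graph.list() may not match aggregatedRelationships exactly, since undirected projections typically count each relationship in both directions.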
What Does count(r) Aggregate?
In your projection pattern:
MATCH (d:Director)-[r:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor)
WITH d, a, count(r) AS rels
What is count(r) actually counting?
- ❏ The number of Actor nodes
- ❏ The number of Director nodes
- ✓ The number of shared Movie nodes between each director-actor pair
- ❏ The total number of relationships in the database
Hint
Think about what the relationship variable r represents in the MATCH pattern, and what happens when you group by d and a.
Solution
The number of shared Movie nodes between each director-actor pair.
The r variable represents the DIRECTED relationship from director to movie. When you use WITH d, a, count(r), you’re grouping by each unique director-actor pair and counting how many movies they share.
If a director and actor collaborated on 4 movies, count(r) returns 4, representing the 4 DIRECTED relationships (one per movie) in that pattern.
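One way to inspect these counts is to return them instead of projecting them. The sketch below assumes Director and Actor nodes have a `name` property, which the lesson does not confirm.

```cypher
// Assumption: Director and Actor nodes expose a `name` property.
MATCH (d:Director)-[r:DIRECTED]->(:Movie)<-[:ACTED_IN]-(a:Actor)
WITH d, a, count(r) AS rels   // grouping key is the (d, a) pair
RETURN d.name AS director, a.name AS actor, rels
ORDER BY rels DESC
LIMIT 10
```

Each returned row corresponds to one director-actor pair, and rels is the value that would become the relationship weight in the projection.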
Parallel Relationships Without Aggregation
If you projected the director-actor collaboration network without aggregating (just using the MATCH pattern without count(r)), what would happen?
- ❏ The projection would fail because GDS requires aggregation
- ✓ You’d have multiple parallel relationships between the same director-actor pairs
- ❏ GDS would automatically aggregate them for you
- ❏ Each director-actor pair would have exactly one relationship
Hint
Consider what happens when a director and actor collaborate on multiple movies. How many times would that pattern match?
Solution
You’d have multiple parallel relationships between the same director-actor pairs.
Without aggregation, the pattern (d:Director)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor) would match once for each shared movie.
If Christopher Nolan and Christian Bale worked on 3 films together, you’d project 3 separate relationships between them in the graph. GDS doesn’t automatically merge these—you need explicit aggregation with count(r) to collapse them into a single weighted relationship.
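The repeated matching can be observed for a single pair by running the pattern without any aggregation. This sketch assumes `name` and `title` properties exist and that both people appear in your dataset under these exact names.

```cypher
// Assumptions: `name` and `title` properties exist; both people are in the data.
MATCH (d:Director {name: 'Christopher Nolan'})-[:DIRECTED]->(m:Movie)
      <-[:ACTED_IN]-(a:Actor {name: 'Christian Bale'})
RETURN d.name AS director, a.name AS actor, m.title AS movie
```

Every row in the result is one match of the pattern, so it is also one parallel relationship that an unaggregated projection would create between the two nodes.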
Summary
You’ve successfully designed and built an aggregated projection from scratch, choosing appropriate aggregation logic and validating your results.
Relationship aggregation is essential for real-world GDS workflows where multiple interactions between entities need to be summarized into meaningful weights that reflect connection strength, frequency, or intensity.
In the next lesson, you’ll learn about projection modeling strategies—how to design projections that answer specific analytical questions.