Introduction
In the previous lesson, you learned how to design projections based on analytical questions. Now it’s time to apply that skill independently.
You’ll be given an analytical question, and you need to:
- Design an appropriate projection
- Choose a suitable algorithm
- Run the analysis
- Interpret the results
Your analytical question
"Which directors have the most influence in the film industry based on their actor collaborations?"
This question requires you to think about:
- What does "influence" mean in this context?
- What node types should appear in your projection?
- What relationships capture the concept of influence?
- Which algorithm can measure influence?
Your task
Step 1: Design your projection
Design a projection called 'director-influence' that captures director influence through actor collaborations.
Consider:
- Directors who work with the same actors might be in similar circles
- Influence could be measured by how many actors a director has worked with
- Or influence could be about being a "bridge" between different actor groups
Ask yourself:
- Should this be monopartite (Director → Director) or bipartite (Director → Actor)?
- Do you need to infer relationships or project existing ones?
- Should relationships be weighted? If so, by what?
Make sure you refer to the algorithm docs if you are uncertain.
Step 2: Build your projection
Write the Cypher projection query and verify it was created successfully using gds.graph.list().
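As a starting point, here is a minimal sketch of one possible design: a bipartite Director → Actor projection followed by the verification call. It assumes GDS 2.4 or later, where gds.graph.project can be used as a Cypher aggregation function, and the Director, Actor, and Movie labels with DIRECTED and ACTED_IN relationships used elsewhere in this lesson. Your own design may differ, for example by connecting directors to each other instead.

```cypher
// Sketch only: one possible bipartite design, not the only valid answer.
// Assumes Director, Actor, and Movie labels with DIRECTED and ACTED_IN relationships.
MATCH (source:Director)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(target:Actor)
// DISTINCT keeps one relationship per director-actor pair,
// so repeat collaborations are not projected as parallel relationships.
WITH DISTINCT source, target
WITH gds.graph.project('director-influence', source, target) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels;

// Run as a separate statement: confirm the projection exists in the graph catalog.
CALL gds.graph.list('director-influence')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

If the projection name is already taken, the first statement fails; dropping the old graph with gds.graph.drop('director-influence') clears it.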
Step 3: Choose and run an algorithm
Based on your projection design and the question about "influence," choose an appropriate algorithm from:
- Degree Centrality: Counts direct connections (Degree centrality docs)
- PageRank: Measures importance through connections to important nodes (PageRank docs)
- Betweenness Centrality: Measures bridge positions in the network (Betweenness centrality docs)
Run your chosen algorithm in stream mode and return the top 10 directors with the most influence.
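Remember to use gds.util.asNode() to look up director names and other database properties in your results. If you only need a property that is stored in the projection itself, gds.util.nodeProperty() reads it from the projected graph.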
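The stream pattern can be sketched as follows, using Degree Centrality as one possible choice. It assumes a projection named 'director-influence' already exists, that directors are the source nodes of its relationships, and that director nodes carry a name property; adjust these to match your own design.

```cypher
// Sketch only: stream Degree Centrality and return the top 10 directors.
// With the default orientation, degree counts each node's outgoing relationships.
CALL gds.degree.stream('director-influence')
YIELD nodeId, score
// Map the internal node ID back to the database node to read its properties.
WITH gds.util.asNode(nodeId) AS person, score
WHERE person:Director
RETURN person.name AS director, score
ORDER BY score DESC
LIMIT 10;
```

gds.pageRank.stream and gds.betweenness.stream yield the same nodeId and score columns, so swapping the procedure name is enough to compare algorithms on the same projection.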
Step 4: Validate your results
Check your top results. Do they make sense? Are well-known, prolific directors ranking highly?
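One way to sanity-check a ranking is to compare it against plain counts taken directly from the database. The query below is a rough sketch that assumes the same labels and relationship types as the rest of this lesson and a name property on director nodes.

```cypher
// Sketch only: count movies and distinct actors per director without GDS.
MATCH (d:Director)-[:DIRECTED]->(m:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)
RETURN d.name AS director,
       count(DISTINCT m) AS movies,
       count(DISTINCT a) AS actors
ORDER BY actors DESC
LIMIT 10;
```

If your algorithm's top 10 looks very different from these counts, that is not necessarily wrong, but it is worth being able to explain why.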
Hints
Projection design hint:
Think about whether you want to measure:
- Direct connections (how many actors a director has worked with)
- Network position (how "central" a director is in the collaboration network)
- Bridge positions (directors who connect or 'control' the flow of information between different communities)
Check your understanding
Monopartite vs. Bipartite for Influence
If you chose to project a monopartite Director → Director graph (directors connected through shared actors), what does each relationship represent?
- ❏ The number of movies two directors have both directed
- ✓ That two directors have worked with at least one common actor
- ❏ That two directors have the same level of influence
- ❏ A direct collaboration between two directors
Hint
Think about the pattern you’re matching: (:Director)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(:Actor)-[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(:Director)
What connects the two directors in this pattern?
Solution
That two directors have worked with at least one common actor.
In a monopartite projection, you’re creating inferred relationships between directors based on shared actors. If Christopher Nolan and Ridley Scott have both worked with the same actor, they get connected in the projection, even though they never directly collaborated.
This is useful for measuring influence because directors in the same "network" of actors might have similar styles, access to similar talent pools, or be competing for the same actors.
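A monopartite projection along these lines could be sketched as below. The graph name 'director-shared-actors' and the sharedActors weight are hypothetical choices made for illustration, and the query assumes GDS 2.4 or later with the Director, Actor, and Movie labels used in this lesson.

```cypher
// Sketch only: infer Director -> Director relationships from shared actors.
MATCH (source:Director)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(a:Actor)
      -[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(target:Director)
WHERE source <> target
// One relationship per ordered pair of directors, weighted by distinct shared actors.
WITH source, target, count(DISTINCT a) AS sharedActors
WITH gds.graph.project(
  'director-shared-actors',   // hypothetical graph name
  source,
  target,
  { relationshipProperties: { sharedActors: sharedActors } }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels;
```

Because the pattern matches each pair of directors in both orders, every connection appears once in each direction, which makes the directed projection behave symmetrically.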
Matching Projections to Questions
You’re asked: "Which directors tend to work in the same circles?"
Which of the following projection designs best answers this question?
- ❏ Bipartite: Director → Movie
- ✓ Monopartite: Director → Director
- ❏ Multipartite: Actor → Movie → Director
- ❏ Multipartite: Director → Movie → Actor → Genre
Hint
The question is about directors who work in similar circles. Does knowing their movies tell you anything about whether they worked with the same people? What relationship structure would capture "working in similar groups"?
Solution
Monopartite: Director → Director.
The question asks about directors working in the same circles, which means you want Director → Director relationships in your projection. You don’t need Movie or Actor nodes in the final output, but you use them to create the director connections.
Community detection algorithms (like Louvain or Leiden) would cluster directors who work with the same actors, revealing groups of directors who operate in similar creative circles or talent pools.
The bipartite option (Director → Movie) doesn’t capture the concept of "circles": it only shows which directors worked on which films, not which directors are connected through shared talent.
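To illustrate the community detection idea, the sketch below runs Louvain in stream mode. It assumes a monopartite Director → Director projection already exists under the hypothetical name 'director-shared-actors', with a hypothetical sharedActors relationship property used as the weight, and that director nodes have a name property.

```cypher
// Sketch only: group directors into communities and list the largest ones.
CALL gds.louvain.stream('director-shared-actors', {
  relationshipWeightProperty: 'sharedActors'
})
YIELD nodeId, communityId
RETURN communityId,
       count(*) AS size,
       // Show up to five member names per community as a quick sample.
       collect(gds.util.asNode(nodeId).name)[..5] AS sampleDirectors
ORDER BY size DESC
LIMIT 10;
```

Omitting relationshipWeightProperty treats every connection equally, which may be a reasonable first pass before deciding whether weights matter.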
Algorithm assumptions
You’ve projected a bipartite graph with Directors and Actors. You want to measure "which directors have made the most movies."
Why might Degree Centrality be a better choice than PageRank for this bipartite graph?
- ❏ PageRank doesn’t work on bipartite graphs at all
- ❏ Degree Centrality is always faster than PageRank
- ✓ Both could work, but degree centrality literally counts relationships directly
- ❏ Degree Centrality gives more accurate results in all cases
Hint
Think about what the question is asking: counting movies. What does Degree Centrality literally count in a bipartite Director → Actor graph?
Solution
Both could work, but degree centrality literally counts relationships directly.
When the question asks "which directors have made the most movies," you’re essentially asking a counting question. Degree Centrality directly counts the number of relationships each director has to actors in this projection.
While PageRank could work on a bipartite graph, it’s measuring something more abstract (importance through network position), not a direct count. If your goal is simply to count, Degree Centrality is the more straightforward choice.
In a Director → Actor bipartite graph, Degree Centrality gives you "this director worked with X actors," which serves as a proxy for productivity and film output rather than an exact count of movies.
Summary
You’ve completed a full GDS analysis workflow starting from an analytical question. This is how real-world GDS projects begin: with a question that drives projection design, algorithm selection, and result interpretation.
The key skill is recognizing that the same data can be projected in multiple ways, and the right projection depends on what you’re trying to measure.
In the next lesson, you’ll recap everything you’ve learned in this module.