Challenge: Projection modeling and analysis

Introduction

In the previous lesson, you learned how to design projections based on analytical questions. Now it’s time to apply that skill independently.

You’ll be given an analytical question, and you need to:

Design an appropriate projection
Choose a suitable algorithm
Run the analysis
Interpret the results

Your analytical question

"Which directors have the most influence in the film industry based on their actor collaborations?"

This question requires you to think about:

What does "influence" mean in this context?
What node types should appear in your projection?
What relationships capture the concept of influence?
Which algorithm can measure influence?

Your task

Step 1: Design your projection

Design a projection called 'director-influence' that captures director influence through actor collaborations.

Consider:

Directors who work with the same actors might be in similar circles
Influence could be measured by how many actors a director has worked with
Or influence could be about being a "bridge" between different actor groups

Ask yourself:

Should this be monopartite (Director → Director) or bipartite (Director → Actor)?
Do you need to infer relationships or project existing ones?
Should relationships be weighted? If so, by what?

Make sure you refer to the algorithm docs if you are uncertain.

Step 2: Build your projection

Write the Cypher projection query and verify it was created successfully using gds.graph.list().
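One way to run that check is sketched below, assuming your projection uses the name 'director-influence' from Step 1. Passing the name to gds.graph.list() limits the output to that graph. A single row with non-zero counts suggests the projection was built as intended.

```cypher
// List only the projection named in Step 1.
// An empty result means no graph with that name exists in the catalog yet.
CALL gds.graph.list('director-influence')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```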

Step 3: Choose and run an algorithm

Based on your projection design and the question about "influence," choose an appropriate algorithm from:

Degree Centrality: Counts direct connections (Degree centrality docs)
PageRank: Measures importance through connections to important nodes (PageRank docs)
Betweenness Centrality: Measures bridge positions in the network (Betweenness centrality docs)

Run your chosen algorithm in stream mode and return the top 10 directors with the most influence.

Remember to use gds.util.asNode() to convert each nodeId back into its database node, so you can read director names and other properties in your results.

Step 4: Validate your results

Check your top results. Do they make sense? Are well-known, prolific directors ranking highly?
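One possible sanity check is to compare your ranking against a plain Cypher count that does not involve GDS at all. The sketch below assumes the labels and relationship types used elsewhere in this lesson (Person, Actor, Movie, DIRECTED, ACTED_IN) and a name property on Person nodes. Adjust them if your dataset differs.

```cypher
// Baseline outside GDS: for each director, count the distinct movies
// (with at least one credited actor) and the distinct actors they worked with
MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor)
RETURN
  d.name AS director,
  count(DISTINCT m) AS movies,
  count(DISTINCT a) AS actors
ORDER BY actors DESC
LIMIT 10
```

Directors who rank highly here but are missing from your algorithm's top 10, or the reverse, are worth a closer look.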

Hints

Projection design hint:

Think about whether you want to measure:

Direct connections (how many actors a director has worked with)
Network position (how "central" a director is in the collaboration network)
Bridge positions (directors who connect or 'control' the flow of information between different communities)

Solution approach

Step 1: Design the projection

For measuring director influence through actor collaborations, a monopartite Director → Director projection is a good fit. Directors who share actors become connected, so centrality algorithms can measure each director's influence within the resulting network.

Step 2: Build the projection

cypher

Solution: Project director-director network through shared actors

MATCH (source:Person)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(a:Actor)
      -[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(target:Person)
WHERE source <> target
WITH gds.graph.project(
  'director-influence',
  source,
  target
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Projection breakdown

Match Person nodes (directors) connected through Movies with shared Actors
Filter out self-connections
Call the GDS projection function
Name the projection 'director-influence'
Include source (Person/Director) nodes
Include target (Person/Director) nodes
Return projection statistics

Key components:

The pattern Director → Movie ← Actor → Movie ← Director connects directors who worked with the same actors
This creates a monopartite director network
Each matching path adds one relationship, so the more actors two directors share, the more parallel relationships connect them
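If you would rather have a single weighted relationship per director pair than many parallel ones, the shared-actor count can be aggregated before projecting. This is a sketch, not part of the lesson's solution. It assumes a GDS version in which the gds.graph.project aggregation function accepts a data configuration map with relationshipProperties. The graph name 'director-influence-weighted' and the property name sharedActors are made up for illustration.

```cypher
// Collapse all shared-actor paths between two directors into one row,
// then store the number of distinct shared actors as a relationship property
MATCH (source:Person)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(a:Actor)
      -[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(target:Person)
WHERE source <> target
WITH source, target, count(DISTINCT a) AS sharedActors
WITH gds.graph.project(
  'director-influence-weighted',   // hypothetical graph name
  source,
  target,
  { relationshipProperties: { sharedActors: sharedActors } }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```

An algorithm run on this graph could then pass relationshipWeightProperty: 'sharedActors' in its configuration to take the weights into account.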

Step 3: Run PageRank

cypher

Solution: Run PageRank to measure director influence

CALL gds.pageRank.stream(
  'director-influence'
)
YIELD nodeId, score
RETURN
  gds.util.asNode(nodeId).name AS director,
  score AS influence
ORDER BY influence DESC
LIMIT 10

Algorithm breakdown

Call PageRank in stream mode
Run on 'director-influence' projection
Yield node IDs and PageRank scores
Convert node IDs to director names
Return director name and influence score
Sort by influence in descending order
Limit to top 10 directors

Alternative: Degree Centrality

You could also use Degree Centrality for a simpler measure of influence:

cypher

Alternative: Use Degree Centrality

CALL gds.degree.stream(
  'director-influence'
)
YIELD nodeId, score
RETURN
  gds.util.asNode(nodeId).name AS director,
  score AS shared_actor_links
ORDER BY shared_actor_links DESC
LIMIT 10

What the results mean:

PageRank: Directors with high scores are influential because they’re connected to other influential directors (those who work with many actors)
Degree Centrality: Directors with high scores have many relationships in the projection, meaning they share many actors with other directors
Both metrics reveal directors who are well-connected in the Hollywood collaboration network
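If you read "influence" as holding a bridge position, Betweenness Centrality (the third option listed in Step 3) can be run on the same projection. The sketch below mirrors the PageRank query above. The alias bridge_score is only an illustrative name.

```cypher
// Betweenness Centrality: high scores mark directors who sit on many
// shortest paths between other directors in the projection
CALL gds.betweenness.stream(
  'director-influence'
)
YIELD nodeId, score
RETURN
  gds.util.asNode(nodeId).name AS director,
  score AS bridge_score
ORDER BY bridge_score DESC
LIMIT 10
```

Betweenness is typically more expensive to compute than Degree Centrality or PageRank, so expect a longer runtime on large projections.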

Check your understanding

Monopartite vs. Bipartite for Influence

If you chose to project a monopartite Director → Director graph (directors connected through shared actors), what does each relationship represent?

❏ The number of movies two directors have both directed
✓ That two directors have worked with at least one common actor
❏ That two directors have the same level of influence
❏ A direct collaboration between two directors

Hint

Think about the pattern you’re matching: (Director)-[:DIRECTED]→(:Movie)←[:ACTED_IN]-(:Actor)-[:ACTED_IN]→(:Movie)←[:DIRECTED]-(Director)

What connects the two directors in this pattern?

Solution

That two directors have worked with at least one common actor.

In a monopartite projection, you’re creating inferred relationships between directors based on shared actors. If Christopher Nolan and Ridley Scott have both worked with the same actor, they get connected in the projection—even though they never directly collaborated.

This is useful for measuring influence because directors in the same "network" of actors might have similar styles, access to similar talent pools, or be competing for the same actors.

Matching Projections to Questions

You’re asked: "Which directors tend to work in the same circles?"

Which of the following projection designs best answers this question?

❏ Bipartite: Director → Movie
✓ Monopartite: Director → Director
❏ Multipartite: Actor → Movie → Director
❏ Multipartite: Director → Movie → Actor → Genre

Hint

The question is about directors who work in similar circles. Does knowing their movies tell you anything about whether they worked with the same people? What relationship structure would capture "working in similar groups"?

Solution

Monopartite: Director → Director.

The question asks about directors working in the same circles, which means you want Director → Director relationships in your projection. You don’t need Movie or Actor nodes in the final output, but you use them to create the director connections.

Community detection algorithms (like Louvain or Leiden) would cluster directors who work with the same actors, revealing groups of directors who operate in similar creative circles or talent pools.
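As a rough sketch of that idea, Louvain can be streamed over a Director → Director projection and the results grouped by community. The graph name 'director-influence' is assumed to be the projection built earlier in this lesson, and the aliases are illustrative.

```cypher
// Group directors by the community Louvain assigns them to,
// then show the largest communities with a few sample members
CALL gds.louvain.stream(
  'director-influence'
)
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId).name) AS directors
RETURN
  communityId,
  size(directors) AS members,
  directors[..5] AS sample_directors
ORDER BY members DESC
LIMIT 10
```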

The bipartite option (Director → Movie) doesn’t capture the concept of "circles"—it only shows which directors worked on which films, not which directors are connected through shared talent.

Algorithm assumptions

You’ve projected a bipartite graph with Directors and Actors. You want to measure "which directors have made the most movies."

Why might Degree Centrality be a better choice than PageRank for this bipartite graph?

❏ PageRank doesn’t work on bipartite graphs at all
❏ Degree Centrality is always faster than PageRank
✓ Both could work, but degree centrality literally counts relationships directly
❏ Degree Centrality gives more accurate results in all cases

Hint

Think about what the question is asking: counting movies. What does Degree Centrality literally count in a bipartite Director → Actor graph?

Solution

Both could work, but degree centrality literally counts relationships directly.

When the question asks "which directors have made the most movies," you’re essentially asking a counting question. Degree Centrality directly counts the number of relationships each director has to actors in this projection.

While PageRank could work on a bipartite graph, it’s measuring something more abstract (importance through network position), not a direct count. If your goal is simply to count, Degree Centrality is the more straightforward choice.

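In a Director → Actor bipartite graph, Degree Centrality gives you "this director worked with X actors." That is only a rough proxy for productivity and film output: directors who make more films tend to work with more actors, but the score counts actor relationships, not movies.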
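A minimal sketch of this scenario follows, assuming the same labels and relationship types as the rest of the lesson. The graph name 'director-actor' is hypothetical. The DISTINCT step keeps one relationship per director-actor pair, so that repeat collaborations do not inflate the count.

```cypher
// Project one Director -> Actor relationship per distinct pair
MATCH (source:Person)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH DISTINCT source, target
WITH gds.graph.project(
  'director-actor',   // hypothetical graph name
  source,
  target
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels;

// With the default (natural) orientation, the score is each node's out-degree:
// for a director, the number of distinct actors they have directed
CALL gds.degree.stream(
  'director-actor'
)
YIELD nodeId, score
RETURN
  gds.util.asNode(nodeId).name AS director,
  score AS actors_worked_with
ORDER BY actors_worked_with DESC
LIMIT 10
```

To count movies instead of actors, a Director → Movie projection would match the question more directly.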

Summary

You’ve completed a full GDS analysis workflow starting from an analytical question. This is how real-world GDS projects begin: with a question that drives projection design, algorithm selection, and result interpretation.

The key skill is recognizing that the same data can be projected in multiple ways, and the right projection depends on what you’re trying to measure.

In the next lesson, you’ll recap everything you’ve learned in this module.
