Challenge: Projection modeling and analysis

Introduction

In the previous lesson, you learned how to design projections based on analytical questions. Now it’s time to apply that skill independently.

You’ll be given an analytical question, and you need to:

  1. Design an appropriate projection

  2. Choose a suitable algorithm

  3. Run the analysis

  4. Interpret the results

Your analytical question

"Which directors have the most influence in the film industry based on their actor collaborations?"

This question requires you to think about:

  • What does "influence" mean in this context?

  • What node types should appear in your projection?

  • What relationships capture the concept of influence?

  • Which algorithm can measure influence?

Your task

Step 1: Design your projection

Design a projection called 'director-influence' that captures director influence through actor collaborations.

Consider:

  • Directors who work with the same actors might be in similar circles

  • Influence could be measured by how many actors a director has worked with

  • Or influence could be about being a "bridge" between different actor groups

Ask yourself:

  • Should this be monopartite (Director → Director) or bipartite (Director → Actor)?

  • Do you need to infer relationships or project existing ones?

  • Should relationships be weighted? If so, by what?
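You can think through the weighting question on a tiny dataset before writing any Cypher. The sketch below is plain Python, not GDS. The rows and names are hypothetical, and it assumes the weight of a Director → Actor relationship is the number of distinct movies the two made together.

```python
from collections import Counter

# Hypothetical (director, movie, actor) rows, the shape returned by matching
# (:Director)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(:Actor)
rows = [
    ("Director A", "Movie 1", "Actor X"),
    ("Director A", "Movie 1", "Actor Y"),
    ("Director A", "Movie 2", "Actor X"),
    ("Director B", "Movie 3", "Actor Y"),
    ("Director B", "Movie 3", "Actor Z"),
]

def bipartite_edges(rows):
    """Return {(director, actor): weight}, where weight = number of movies together."""
    weights = Counter()
    # Deduplicate first so a repeated (director, movie, actor) row counts once.
    for director, movie, actor in set(rows):
        weights[(director, actor)] += 1
    return dict(weights)

edges = bipartite_edges(rows)
for (director, actor), weight in sorted(edges.items()):
    print(f"{director} -> {actor} (movies together: {weight})")
```

An unweighted projection keeps only the four director and actor pairs. The weight lets a later algorithm distinguish a one-off collaboration (weight 1) from a repeated one (Director A and Actor X, weight 2).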

Make sure you refer to the algorithm docs if you are uncertain.

Step 2: Build your projection

Write the Cypher projection query and verify it was created successfully using gds.graph.list().

Step 3: Choose and run an algorithm

Based on your projection design and the question about "influence," choose an appropriate algorithm from:

Run your chosen algorithm in stream mode and return the top 10 directors with the most influence.

Remember to use gds.util.asNode() or gds.util.asNodeProperty() to access director names and other properties in your results.
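In stream mode, centrality algorithms return node IDs and scores, not names. The final step is therefore to resolve each ID to a node, sort by score, and keep the top 10. The plain-Python sketch below mimics that step on hypothetical rows. Here `nodes_by_id` stands in for what `gds.util.asNode(nodeId).name` does in Cypher. The IDs, names, and scores are invented for illustration.

```python
import heapq

# Hypothetical rows as a stream-mode call might yield them: (nodeId, score).
stream_rows = [(0, 3.0), (1, 7.5), (2, 1.0), (3, 7.5), (4, 4.2)]

# Hypothetical lookup standing in for gds.util.asNode(nodeId).name.
nodes_by_id = {0: "Director A", 1: "Director B", 2: "Director C",
               3: "Director D", 4: "Director E"}

def top_k(rows, names, k=10):
    """Return the k highest-scoring (name, score) pairs, ties broken by name."""
    named = ((names[node_id], score) for node_id, score in rows)
    # Smallest (-score, name) first means highest score first, then A-Z.
    return heapq.nsmallest(k, named, key=lambda pair: (-pair[1], pair[0]))

for name, score in top_k(stream_rows, nodes_by_id, k=3):
    print(f"{name}: {score}")
```

The name tie-breaker keeps the output stable when two directors have the same score. In Cypher, the equivalent is an `ORDER BY score DESC, name ASC LIMIT 10` after the stream call.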

Step 4: Validate your results

Check your top results. Do they make sense? Are well-known, prolific directors ranking highly?

Hints

Projection design hint:

Think about whether you want to measure:

  • Direct connections (how many actors a director has worked with)

  • Network position (how "central" a director is in the collaboration network)

  • Bridge positions (directors who connect or "control" the flow of information between different communities)
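These readings of "influence" can rank directors differently. The sketch below scores a hypothetical Director → Director graph in two ways, in plain Python: degree (direct connections) and unnormalized betweenness computed with Brandes' algorithm (bridge positions). It illustrates the two measures and is not the GDS implementation. The node names and edges are made up.

```python
from collections import deque

def brandes_betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # BFS from source: distances, shortest-path counts, and predecessors.
        stack = []
        predecessors = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[source] = 1
        dist = {v: -1 for v in graph}
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}

# Hypothetical directors: two tight circles joined only through "Bridge".
edges = [
    ("A1", "A2"), ("A1", "A3"), ("A2", "A3"),
    ("B1", "B2"), ("B1", "B3"), ("B2", "B3"),
    ("Bridge", "A1"), ("Bridge", "B1"),
]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

degree = {v: len(neighbors) for v, neighbors in graph.items()}
betweenness = brandes_betweenness(graph)
for director in sorted(graph):
    print(director, degree[director], betweenness[director])
```

"Bridge" ties for the lowest degree in the graph (2) but has the highest betweenness (9.0), because every shortest path between the two circles runs through it. A degree-based ranking would miss it entirely.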

Check your understanding

Monopartite vs. Bipartite for Influence

If you chose to project a monopartite Director → Director graph (directors connected through shared actors), what does each relationship represent?

  • ❏ The number of movies two directors have both directed

  • ✓ That two directors have worked with at least one common actor

  • ❏ That two directors have the same level of influence

  • ❏ A direct collaboration between two directors

Hint

Think about the pattern you’re matching: (:Director)-[:DIRECTED]->(:Movie)<-[:ACTED_IN]-(:Actor)-[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(:Director)

What connects the two directors in this pattern?

Solution

That two directors have worked with at least one common actor.

In a monopartite projection, you’re creating inferred relationships between directors based on shared actors. If Christopher Nolan and Ridley Scott have both worked with the same actor, they are connected in the projection, even though they never directly collaborated.

This is useful for measuring influence because directors in the same "network" of actors might have similar styles, access to similar talent pools, or be competing for the same actors.
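The inference step can be sketched in plain Python on hypothetical data. The sketch groups directors by actor, then connects every pair of directors who share that actor, and counts shared actors as a possible relationship weight. It shows what the projected relationships mean. It is not how GDS builds them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (director, actor) collaborations, already deduplicated.
collaborations = [
    ("Director A", "Actor X"),
    ("Director A", "Actor Y"),
    ("Director B", "Actor Y"),
    ("Director B", "Actor Z"),
    ("Director C", "Actor Z"),
    ("Director C", "Actor Y"),
]

def infer_director_pairs(collaborations):
    """Connect two directors whenever they share at least one actor.

    Returns {(director1, director2): shared_actor_count}, with director1 < director2.
    """
    directors_by_actor = defaultdict(set)
    for director, actor in collaborations:
        directors_by_actor[actor].add(director)

    shared = defaultdict(int)
    for directors in directors_by_actor.values():
        # Each pair of directors who worked with this actor gains one shared actor.
        for d1, d2 in combinations(sorted(directors), 2):
            shared[(d1, d2)] += 1
    return dict(shared)

pairs = infer_director_pairs(collaborations)
print(pairs)
```

Directors B and C share two actors, so their inferred relationship has weight 2. Director A is linked to both through Actor Y alone, even though none of the directors in this data worked with each other directly.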

Matching Projections to Questions

You’re asked: "Which directors tend to work in the same circles?"

Which of the following projection designs best answers this question?

  • ❏ Bipartite: Director → Movie

  • ✓ Monopartite: Director → Director

  • ❏ Multipartite: Actor → Movie → Director

  • ❏ Multipartite: Director → Movie → Actor → Genre

Hint

The question is about directors who work in similar circles. Does knowing their movies tell you anything about whether they worked with the same people? What relationship structure would capture "working in similar groups"?

Solution

Monopartite: Director → Director.

The question asks about directors working in the same circles, which means you want Director → Director relationships in your projection. You don’t need Movie or Actor nodes in the final output, but you use them to create the director connections.

Community detection algorithms (like Louvain or Leiden) would cluster directors who work with the same actors, revealing groups of directors who operate in similar creative circles or talent pools.

The bipartite option (Director → Movie) doesn’t capture the concept of "circles"—it only shows which directors worked on which films, not which directors are connected through shared talent.

Algorithm assumptions

You’ve projected a bipartite graph with Directors and Actors. You want to measure "which directors have made the most movies."

Why might Degree Centrality be a better choice than PageRank for this bipartite graph?

  • ❏ PageRank doesn’t work on bipartite graphs at all

  • ❏ Degree Centrality is always faster than PageRank

  • ✓ Both could work, but degree centrality literally counts relationships directly

  • ❏ Degree Centrality gives more accurate results in all cases

Hint

Think about what the question is asking: counting movies. What does Degree Centrality literally count in a bipartite Director → Actor graph?

Solution

Both could work, but degree centrality literally counts relationships directly.

When the question asks "which directors have made the most movies," you’re essentially asking a counting question. Degree Centrality directly counts the number of relationships each director has to actors in this projection.

PageRank can run on a bipartite graph, but it measures something more abstract (importance derived from network position), not a direct count. If your goal is simply to count, Degree Centrality is the more straightforward choice.

In a Director → Actor bipartite graph, Degree Centrality gives you "this director worked with X actors," which is a reasonable proxy for productivity and film output, though not an exact count of movies.
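Degree in a Director → Actor projection counts distinct actors, so it tracks film output only loosely. The plain-Python sketch below computes both numbers side by side on hypothetical rows. It assumes each director and actor pair is counted once, for example because the projection query returns distinct pairs.

```python
from collections import defaultdict

# Hypothetical (director, movie, actor) rows.
rows = [
    ("Director A", "Movie 1", "Actor X"),
    ("Director A", "Movie 1", "Actor Y"),
    ("Director A", "Movie 1", "Actor Z"),
    ("Director B", "Movie 2", "Actor X"),
    ("Director B", "Movie 3", "Actor X"),
    ("Director B", "Movie 4", "Actor Y"),
]

actors_by_director = defaultdict(set)   # bipartite Director -> Actor neighbors
movies_by_director = defaultdict(set)   # what the question actually asks about
for director, movie, actor in rows:
    actors_by_director[director].add(actor)
    movies_by_director[director].add(movie)

# Degree in the Director -> Actor projection = number of distinct actors.
degree = {d: len(actors) for d, actors in actors_by_director.items()}
movie_count = {d: len(movies) for d, movies in movies_by_director.items()}
print(degree)
print(movie_count)
```

Director A ranks higher by degree (3 actors against 2), but Director B has made more movies (3 against 1). The degree score is therefore best read as a proxy, not as a movie count.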

Summary

You’ve completed a full GDS analysis workflow starting from an analytical question. This is how real-world GDS projects begin: with a question that drives projection design, algorithm selection, and result interpretation.

The key skill is recognizing that the same data can be projected in multiple ways, and the right projection depends on what you’re trying to measure.

In the next lesson, you’ll recap everything you’ve learned in this module.
