Introduction
You’ve learned how to create projections and how to configure algorithms. Now it’s time to connect these skills: modeling projections specifically for the questions you want your algorithms to answer.
In Module 2, you discovered that the same data can be projected in different ways—monopartite, bipartite, or multipartite. Each projection reveals different patterns, and the right projection depends on your analytical question.
In this lesson, you’ll learn how to design projections that work effectively with specific algorithms to answer specific questions.
By the end of this lesson, you will understand:
- How projection structure affects algorithm results
- How to match projections to algorithm types
- How to design projections for specific analytical questions
- When to transform your data model before projecting
The projection-algorithm relationship
Every algorithm makes assumptions about graph structure. Choosing the right projection is about matching your data structure to these assumptions.
Centrality algorithms (PageRank, Betweenness, Degree) work best on:
- Networks where relationships indicate influence, flow, or importance
- Monopartite graphs (same node type on both sides)
- Example: Actor → Actor (collaboration network)
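As a minimal sketch of the centrality case, the query below streams PageRank scores from a monopartite Actor → Actor graph. It assumes such a graph has already been projected under the name 'actor-influence' and that Actor nodes carry a 'name' property. Both are assumptions, so adjust them to match your data.

```cypher
// Assumes a projected monopartite graph named 'actor-influence'
// and a `name` property on Actor nodes (both are assumptions).
CALL gds.pageRank.stream('actor-influence')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS actor, score
ORDER BY score DESC
LIMIT 10
```

Swapping gds.pageRank.stream for gds.degree.stream or gds.betweenness.stream keeps the same nodeId and score columns, which makes it straightforward to compare how each measure ranks the same actors.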
Community detection (Louvain, Leiden) works best on:
- Networks where clusters are meaningful
- Monopartite graphs with clear community boundaries
- Example: Person → Person (social networks) or Person → CreditCard (financial transaction networks)
Similarity (Node Similarity) works best on:
- Bipartite graphs where nodes connect through intermediate entities
- Example: User → Movie, Actor → Movie
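A short sketch of the similarity case, assuming a bipartite User → Movie graph has already been projected under the name 'user-similarity' and that User nodes have a 'name' property (both assumptions). Node Similarity compares the source nodes by the targets they share, so the result contains only pairs of users even though movies are part of the projection.

```cypher
// Assumes a projected bipartite graph named 'user-similarity'
// (User -> Movie) and a `name` property on User nodes.
CALL gds.nodeSimilarity.stream('user-similarity', { topK: 5 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS user1,
       gds.util.asNode(node2).name AS user2,
       similarity
ORDER BY similarity DESC
LIMIT 10
```

The topK setting limits how many similar neighbors are kept per user. The value 5 is only an illustrative choice.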
Pathfinding (Dijkstra, Yen’s) is best for:
- Networks with weighted relationships representing cost or distance
- Clear source and target nodes
- Example: JFK → Heathrow → Charles De Gaulle
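The pathfinding case could be sketched as follows. The data model here is hypothetical: Airport nodes with an 'iata' code, connected by HAS_ROUTE relationships that carry a numeric 'distance' property. None of these names come from the lesson, so treat them as placeholders for your own schema.

```cypher
// Assumed model: (:Airport {iata})-[:HAS_ROUTE {distance}]->(:Airport)
// Project the routes and keep `distance` as a relationship property.
MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
WITH gds.graph.project(
  'routes',
  source,
  target,
  { relationshipProperties: r { .distance } }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount;

// Find the route with the lowest total distance between two airports.
MATCH (a:Airport {iata: 'JFK'}), (b:Airport {iata: 'CDG'})
CALL gds.shortestPath.dijkstra.stream('routes', {
  sourceNode: a,
  targetNode: b,
  relationshipWeightProperty: 'distance'
})
YIELD totalCost, nodeIds
RETURN totalCost,
       [id IN nodeIds | gds.util.asNode(id).iata] AS route
```

Without relationshipWeightProperty, Dijkstra treats every relationship as equally costly and returns the path with the fewest hops. Depending on your GDS version, the target may need to be passed as targetNodes instead of targetNode, so check the documentation for the version you run.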
Question-Driven Projection Design
As with any data modeling task, it can be helpful to start with your question and then design the projection to answer it.
Let’s take a look at some examples:
Question: "Which actors are most influential in Hollywood?"
Projection strategy:
- Node types in answer: Actor (monopartite)
- Relationships in answer: Actor → Actor (requires inferred relationship projection or graph refactor)
- Algorithm: PageRank, Degree Centrality or Betweenness could all frame the same answer in slightly different ways.
- Why: Influence flows through collaborations
// Connect every pair of actors who appeared in the same movie
MATCH (source:Actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH gds.graph.project('actor-influence', source, target) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount
Question: "Which users have similar movie tastes?"
Projection strategy:
- Node types in answer: User
- Relationships: User rates Movie
- Algorithm: Node Similarity on a bipartite graph. We don’t need movies in the output, but we do need them in the input.
- Why: Users are similar when they have rated the same movies
// Keep the User and Movie labels so the projection stays bipartite
MATCH (source:User)-[r:RATED]->(target:Movie)
WITH gds.graph.project(
  'user-similarity',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target)
  },
  {}
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount
You could reframe this question and come to a different conclusion.
Question: "Which movies do users like most?"
Projection strategy:
- Node types in answer: User and Movie
- Relationships: User rates Movie
- Algorithm: Leiden (among others) can cluster users and movies into communities built around the movies those users enjoy. You could then recommend movies from a user’s community that the user has not yet seen.
- Why: Communities form through shared ratings, so any recommendations are based entirely on user behavior rather than on synthetic, subjective classifications.
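One way to sketch this strategy is shown below. The graph name 'user-movie-communities' is made up for illustration. Leiden in GDS runs only on undirected relationships, so the projection marks every relationship type as undirected.

```cypher
// Project User and Movie nodes; '*' marks all relationship types as undirected.
MATCH (source:User)-[:RATED]->(target:Movie)
WITH gds.graph.project(
  'user-movie-communities',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target)
  },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount;

// Assign every user and movie to a community and list the largest ones.
CALL gds.leiden.stream('user-movie-communities')
YIELD nodeId, communityId
RETURN communityId, count(*) AS members
ORDER BY members DESC
LIMIT 10
```

Because users and movies share communities, a follow-up query can look up the movies in a user's community that the user has not rated. How to rank those candidates is left open here.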
What’s Next
You now understand how to design projections that match algorithm assumptions and answer specific analytical questions.
In the next lesson, you’ll put all these skills together in a comprehensive challenge that requires you to project, configure, and run algorithms independently.
Check your understanding
Question-Driven Projection Design
You’re asked: "Which actors are most influential in Hollywood based on their collaboration network?"
Which projection design best answers this question?
- ✓ Monopartite: Actor → Actor (through shared movies) with PageRank or Betweenness
- ❏ Bipartite: Actor → Movie with Node Similarity
- ❏ Bipartite: Actor → Movie with Degree Centrality
- ❏ Monopartite: Movie → Movie with Community Detection
Hint
The question asks about actors and influence. What node type needs to be in your results? What type of algorithms measure influence?
Solution
Monopartite: Actor → Actor (through shared movies) with PageRank or Betweenness.
When designing projections, start with your analytical question:
Question analysis:
- Who needs to be in the answer? Actors only
- What are you measuring? Influence
- What defines the relationship? Collaboration through shared movies
Projection design:
Since only actors need to be in the results, use a monopartite Actor → Actor projection. You don’t need Movie nodes in the final analysis—they’re just the mechanism for creating actor connections.
Algorithm selection:
"Influence" suggests centrality algorithms:
- PageRank: Measures influence through connections to influential actors
- Betweenness: Identifies actors who bridge different collaboration groups
Why other options don’t work:
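- Bipartite Actor → Movie projections describe how actors relate to movies: Node Similarity would compare actors by the movies they share, and Degree Centrality would count each actor’s movies. Neither measures influence within the collaboration network.
- Movie → Movie projections would analyze movies, not actors
- Community Detection finds groups, not influence rankings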
The key skill is: question → node types in answer → projection structure → algorithm category.
Summary
Effective GDS analysis starts with projection design. Match your projection structure to algorithm assumptions: monopartite for centrality and community detection, bipartite for similarity, weighted for pathfinding.
Start with your analytical question, then design a projection that lets the algorithm answer it. Transform your data during projection when needed—calculate weights, combine relationship types, or filter nodes.
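As an illustration of calculating weights during projection, the sketch below counts how many movies each pair of actors shares and stores that count as a relationship property. The graph name 'actor-collab-weighted' and the property name 'sharedMovies' are illustrative choices, not names from the lesson.

```cypher
// Aggregate shared movies per actor pair, then project the count as a weight.
MATCH (source:Actor)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
WITH source, target, count(m) AS sharedMovies
WITH gds.graph.project(
  'actor-collab-weighted',
  source,
  target,
  { relationshipProperties: { sharedMovies: sharedMovies } }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount
```

Algorithms that support weights can then be pointed at the property with relationshipWeightProperty: 'sharedMovies', so frequent collaborators count for more than one-off pairings.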
Validate every projection by checking node counts, relationship counts, and degree distributions before running algorithms.
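A minimal validation check could look like this, assuming the 'actor-influence' projection from earlier in the lesson still exists in the graph catalog.

```cypher
// Inspect the size and degree statistics of an existing projection.
CALL gds.graph.list('actor-influence')
YIELD graphName, nodeCount, relationshipCount, degreeDistribution
RETURN graphName,
       nodeCount,
       relationshipCount,
       degreeDistribution.min AS minDegree,
       degreeDistribution.mean AS meanDegree,
       degreeDistribution.max AS maxDegree
```

A node count far from what you expect, or a minimum degree of zero where every node should be connected, usually points to a problem in the MATCH pattern rather than in the algorithm.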