Graph Projections and Structure

Introduction

The way you project your graph determines what algorithms can "see" and analyze.

Flowchart showing graph projection process affecting algorithm analysis.

In this lesson, you’ll learn how to create projections, how to recognize graph structures, and why structure matters for your algorithm results.

What You’ll Learn

By the end of this lesson, you’ll be able to:

Create Cypher projections using gds.graph.project()
Distinguish between graph structure and node labels in GDS
Identify monopartite, bipartite, multipartite, and heterogeneous graph structures
Choose appropriate projection strategies based on your target algorithm

Running a Cypher Projection

The most basic Cypher projection command looks like this:

cypher

Basic Cypher projection example

MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie) // (1)
WITH gds.graph.project( // (2)
  'actors-graph', // (3)
  source, // (4)
  target // (5)
) AS g // (6)
RETURN g.graphName AS graph,
      g.nodeCount AS nodes,
      g.relationshipCount AS rels // (7)

(1) Match Actor nodes connected to Movie nodes via ACTED_IN relationships
(2) Create a graph projection using the matched nodes
(3) Name the projection actors-graph
(4) Use the source nodes (Actors) from the MATCH statement
(5) Use the target nodes (Movies) from the MATCH statement
(6) Assign the resulting graph to variable g
(7) Return the graph name, node count, and relationship count
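To make the returned counts concrete, here is a minimal Python sketch (not GDS code) of the bookkeeping behind nodeCount and relationshipCount. It assumes that each matched row contributes one relationship and that nodes appearing in several rows are counted once. The actor and movie names are made up for illustration.

```python
# Hypothetical rows, shaped like the (source, target) pairs the MATCH clause returns.
rows = [
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Tom Hanks", "Cast Away"),
]


def summarize_projection(rows):
    """Count distinct nodes and one relationship per matched row."""
    nodes = set()
    for source, target in rows:
        nodes.add(source)
        nodes.add(target)
    return {"nodeCount": len(nodes), "relationshipCount": len(rows)}


summary = summarize_projection(rows)
print(summary)
```

Tom Hanks appears in two rows but is counted as a single node, while each row still adds its own relationship.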

Projecting Graph Models

You are not limited to using the relationships available in the main graph. For example, you can use intermediate nodes in your MATCH statement to create new relationships that exist only in the projection.

cypher

Actor to actor collaboration

MATCH (source:Actor)-[r:ACTED_IN]-> // (1)
        (:Movie)
      <-[:ACTED_IN]-(target:Actor) // (2)
WITH gds.graph.project( // (3)
  'actors-graph', // (4)
  source, // (5)
  target // (6)
) AS g // (7)
RETURN g.graphName AS graph, // (8)
      g.nodeCount AS nodes,
      g.relationshipCount AS rels

(1) Start with Actors who acted in movies
(2) Find other Actors who acted in the same movies
(3) Create a graph projection from the matched pattern
(4) Name the projection actors-graph
(5) Use the first set of Actors as source nodes
(6) Use the second set of Actors as target nodes
(7) Assign the resulting graph to variable g
(8) Return the metadata and graph statistics

Actor to Actor Collaboration Graph

Running the previous projection would create a graph connecting actors directly to the actors they worked with on the same movies.

Before you build that graph, run the basic Actor-to-Movie projection again to see what GDS actually produces.

cypher

Basic Cypher projection example

MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
WITH gds.graph.project(
  'actors-graph',
  source,
  target
) AS g
RETURN g.graphName AS graph,
      g.nodeCount AS nodes,
      g.relationshipCount AS rels

What You Projected

Let’s focus on the first projection you ran: the basic projection that matched (source:Actor)-[r:ACTED_IN]->(target:Movie) and passed source and target to gds.graph.project().

What You Expected

You may have expected to project a graph that looks like this:

mermaid

Bipartite graph

graph LR
    Actor(("Actor"))
    Movie(("Movie"))
    Actor -- "ACTED_IN" --> Movie

This is a bipartite graph—a graph whose nodes fall into two distinct, non-overlapping sets.

What GDS Actually Sees

By default, GDS strips away labels but preserves structure:

mermaid

Bipartite graph

graph LR
    Actor(("Node"))
    Movie(("Node"))
    Actor -- `__ALL__` --> Movie

The graph is still structurally bipartite—Actors still only connect to Movies, never to other Actors. But GDS no longer knows which nodes are Actors and which are Movies.

Structure vs Labels

Two separate concepts:

Structure: How nodes connect (bipartite, monopartite, etc.)
Labels: What GDS knows about node types

Your projection kept the bipartite structure but lost the labels.

Graph Structures: Monopartite

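A monopartite graph has a single set of nodes, all of the same type, and any node can connect to any other. The nodes cannot be split into non-overlapping sets such that connections occur only between sets.

Example: A social network where (:Person)-[:FRIENDS_WITH]->(:Person)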

Any person can be friends with any other person. You cannot divide the nodes into separate groups where connections only occur between groups.

Graph Structures: Bipartite

A bipartite graph has nodes that fall into exactly two non-overlapping sets, where connections only occur between sets.

Example: (:Actor)-[:ACTED_IN]->(:Movie)

Actors in one row connect only to Movies in another row.

Actors connect to Movies. Actors never connect directly to other Actors. Movies never connect directly to other Movies.
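The definition above can be checked mechanically. The following is a minimal Python sketch, not part of GDS, that tests whether a graph is bipartite by trying to two-colour it with a breadth-first search. Note that it never looks at labels, only at how nodes connect. The node ids are hypothetical.

```python
from collections import deque


def is_bipartite(edges):
    """Return True if the undirected version of `edges` can be two-coloured."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    colour = {}
    for start in neighbours:
        if start in colour:
            continue  # already coloured as part of an earlier component
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]
                    queue.append(other)
                elif colour[other] == colour[node]:
                    return False  # two connected nodes landed in the same set
    return True


acted_in = [("a1", "m1"), ("a2", "m1"), ("a2", "m2")]
friends = [("p1", "p2"), ("p2", "p3"), ("p3", "p1")]

print(is_bipartite(acted_in))
print(is_bipartite(friends))
```

The Actor-to-Movie edges pass the check, while the triangle of friends cannot be split into two sets without a connection inside one of them.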

Graph Structures: Multipartite

A multipartite graph has three or more non-overlapping node sets, where connections only occur between sets, never within the same set.

Example: (:User)-[:RATED]->(:Movie)-[:IN_GENRE]->(:Genre)

In a true multipartite structure, each set is non-overlapping.

Graph Structures: Heterogeneous

A heterogeneous graph has multiple node types and/or relationship types, but nodes within the same type can connect to each other.

Our full Movies dataset has Actors, Movies, Users, and Genres. Actors, Users, and Genres each connect only to Movies, so no relationship links two nodes of the same type. The dataset is therefore both heterogeneous and multipartite.
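One way to test the "never within the same set" condition is to look for relationships whose two endpoints share a type. The sketch below is plain Python rather than GDS, assumes every node has exactly one label, and uses hypothetical node ids.

```python
def same_type_relationships(labels, relationships):
    """Return the relationships whose two endpoints carry the same label."""
    return [(s, t) for s, t in relationships if labels[s] == labels[t]]


# Hypothetical nodes: one label per node.
labels = {
    "u1": "User",
    "u2": "User",
    "a1": "Actor",
    "m1": "Movie",
    "g1": "Genre",
}

movie_rels = [("u1", "m1"), ("u2", "m1"), ("a1", "m1"), ("m1", "g1")]
social_rels = movie_rels + [("u1", "u2")]  # adds a User-to-User relationship

print(same_type_relationships(labels, movie_rels))
print(same_type_relationships(labels, social_rels))
```

An empty result means no relationship stays within a node type, which is what the multipartite definition requires. Adding the User-to-User relationship leaves the graph heterogeneous but no longer multipartite.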

Why Structure Matters: PageRank Example

PageRank ranks nodes by "importance" based on incoming connections from other important nodes.

Let’s see what happens when we run it on our unlabelled bipartite projection:

cypher

CALL gds.pageRank.stream('actors-graph', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).title, score
ORDER BY score DESC

PageRank on Bipartite Structure

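In our Actor → Movie bipartite structure, score flows into Movies but has nowhere to go from there, because Movies have no outgoing relationships. Nodes like this are called dead ends, and a group of nodes that score can enter but never leave is called a spider trap.

In another graph, score may instead circulate in an infinite cycle, which is known as a rank sink.

Figure: A single Movie node with many Actor nodes pointing into it.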

Modelling, Not Algorithms

Keep in mind that the subject here is graph structure; the algorithms only provide a frame for it.

Do not worry too much about the intricacies of PageRank or any other algorithm yet.

Instead, try to see how the signal flows from node to node in the graph structures we’re examining.

Rank Sink

In our bipartite graph, Movie nodes behave like sinks: they accumulate high scores simply because they receive connections, not because they’re meaningfully important.

Almost all nodes receive the same score on either side of the structure. The bipartite structure traps the algorithm’s ranking signal.
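The effect is easy to reproduce on a toy graph. The following Python sketch uses a simplified, unnormalised PageRank update, score(v) = (1 - d) + d * sum(score(u) / out_degree(u)), with no special handling for nodes that lack outgoing relationships. It is an illustration under those assumptions, not the GDS implementation, and the node ids are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Simplified PageRank over directed (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, target in edges:
            # Each source splits its current score evenly across its out-edges.
            incoming[target] += scores[source] / out_degree[source]
        scores = {n: (1.0 - damping) + damping * incoming[n] for n in nodes}
    return scores


# Three actors pointing at two movies; movies have no outgoing edges.
acted_in = [("a1", "m1"), ("a2", "m1"), ("a3", "m1"), ("a3", "m2")]
scores = pagerank(acted_in)

for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

Every actor ends up with the same baseline score because nothing points at an actor, and the movies rank purely by how many actors point at them.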

Solution 1: Project a True Monopartite Graph

Now let’s return to that second projection—the Actor-to-Actor collaboration graph:

cypher

MATCH (source:Actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH gds.graph.project('actors-only', source, target) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

This creates direct Actor-to-Actor connections through shared Movies. The Movies become invisible "bridges."
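To see what the pattern produces, here is a minimal Python sketch that mimics it on plain data. It groups hypothetical (actor, movie) rows by movie and emits one directed pair for every two cast entries, in both directions, which is how I would expect the MATCH pattern above to behave. It is an illustration, not GDS code.

```python
from itertools import permutations


def project_collaborations(acted_in):
    """Turn (actor, movie) rows into directed (actor, actor) pairs via shared movies."""
    cast = {}
    for actor, movie in acted_in:
        cast.setdefault(movie, []).append(actor)

    pairs = []
    for actors in cast.values():
        # Every ordered pair of cast entries becomes one relationship,
        # so each collaboration appears once in each direction.
        pairs.extend(permutations(actors, 2))
    return pairs


acted_in = [("a1", "m1"), ("a2", "m1"), ("a3", "m1"), ("a3", "m2")]
collaborations = project_collaborations(acted_in)

print(len(collaborations))
print(sorted(collaborations))
```

The movies m1 and m2 never appear in the output. A movie with a single cast member, like m2 here, contributes no pairs at all.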

True Monopartite Result

The projected graph is now monopartite. All actors connect to other actors.

Diagram showing Actor-to-Actor collaboration with invisible Movie bridges.

There is no meaningful way of separating the nodes into non-overlapping sets.

PageRank on Monopartite

Now PageRank can flow between nodes of the same type, producing meaningful importance rankings.

cypher

CALL gds.pageRank.stream('actors-only', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name, score
ORDER BY score DESC

Monopartite Structures

Bear in mind, the projection still does not retain node labels.

Diagram showing importance of graph structure over labels.

It is the graph structure, not its labels, that affects the algorithm’s results.

Preserve Labels

For some algorithms, you will want to retain node labels.

Use configuration to preserve labels:

cypher

MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
WITH gds.graph.project(
  'actors-movies-labelled',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(r)
  },
  {}
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

When to Preserve Labels

Preserve labels when:

You need to filter algorithms by node type
Node type distinctions affect your analysis

Use default (unlabelled) when:

You’re projecting a true monopartite or bipartite subgraph
The algorithm ignores node labels (most do)

Node Similarity

Node Similarity compares nodes based on shared neighbours.

It works best on graphs that can be separated into distinct sets of nodes, such as bipartite graphs.

Project User-Movie Bipartite Graph

The following projection creates a bipartite graph with preserved labels.

cypher

MATCH (source:User)-[r:RATED]->(target:Movie)
WITH gds.graph.project(
  'user-rated-movie',
  source, target,
  { sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target) },
  {}
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

When you run Node Similarity on this graph, it will compare Users based on the Movies they have both rated.

Run Node Similarity

Now we’ll run Node Similarity on the user-rated-movie graph.

cypher

CALL gds.nodeSimilarity.stream('user-rated-movie', { // (1)
  topK: 3 // (2)
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS user1, // (3)
       gds.util.asNode(node2).name AS user2, // (4)
       similarity // (5)
ORDER BY similarity DESC // (6)
LIMIT 10 // (7)

(1) Stream Node Similarity on the user-rated-movie projection
(2) Keep the top 3 most similar nodes per node
(3) Return the name of the first node
(4) Return the name of the second node
(5) Return the similarity score
(6) Order the results by similarity, highest first
(7) Limit the output to 10 rows

Node Similarity Result

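Node Similarity finds Users who rated many of the same Movies. In stream mode it returns User-to-User pairs with a similarity score; the mutate and write modes can store those pairs as new User-to-User relationships.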

Diagram of Node Similarity on bipartite graph with shared neighbours.

The algorithm respects the bipartite structure: it compares nodes on one side based on their connections to the other side.
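The idea of comparing one side through the other can be sketched in a few lines of Python. The example below assumes the Jaccard metric (shared movies divided by all movies either user rated) and uses hypothetical user and movie ids. Unlike the GDS procedure, it lists each pair once rather than in both directions and has no topK cut-off.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_similarity(rated):
    """Compare users by the sets of movies they rated."""
    movies_by_user = {}
    for user, movie in rated:
        movies_by_user.setdefault(user, set()).add(movie)

    results = []
    for u1, u2 in combinations(sorted(movies_by_user), 2):
        score = jaccard(movies_by_user[u1], movies_by_user[u2])
        if score > 0:  # skip pairs with no shared movie
            results.append((u1, u2, score))
    return sorted(results, key=lambda row: row[2], reverse=True)


rated = [
    ("u1", "m1"), ("u1", "m2"),
    ("u2", "m1"), ("u2", "m2"), ("u2", "m3"),
    ("u3", "m3"),
]
pairs = user_similarity(rated)

for u1, u2, score in pairs:
    print(u1, u2, round(score, 3))
```

Users are only ever compared with other users, and movies only serve as the shared neighbours, which is why a bipartite projection suits this kind of algorithm.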

Don’t worry too much about the details of the algorithm for now.

Quick Reference: Choosing Your Projection

Algorithm Type	Graph Structure	Projection Strategy
PageRank, Betweenness	Works best on monopartite	Project single node type (e.g., Actor-to-Actor)
Node Similarity	Designed for bipartite	Preserve labels, include both types
Community Detection	Varies by algorithm	Check documentation for each

Common Terminology

Term	Meaning
Monopartite	A single node set; any node can connect to any other
Bipartite	Exactly two non-overlapping node sets; connections only between sets
Multipartite	Three or more non-overlapping node sets
Heterogeneous	Multiple node types and/or relationship types (may overlap)
Unlabelled	GDS doesn’t know node/relationship types (default behaviour)

Lesson Summary

In this lesson, you learned:

How to create Cypher projections with gds.graph.project()
How to transform graph structures by changing your MATCH pattern
That structure (how nodes connect) is separate from labels (what GDS knows about types)
That GDS strips labels by default but preserves structure
That bipartite structures can trap algorithms like PageRank
When to project true monopartite graphs for algorithms that expect them
When to preserve labels, for example for bipartite-aware algorithms like Node Similarity

In the next lesson, you’ll practice projecting different graph types.
