Graph Projections and Structure

Introduction

The way you project your graph determines what algorithms can "see" and analyze.

Flowchart showing graph projection process affecting algorithm analysis.

In this lesson, you’ll learn how to create projections, how to recognize different graph structures, and why both matter for your algorithm results.

What You’ll Learn

By the end of this lesson, you’ll be able to:

  • Create Cypher projections using gds.graph.project()

  • Distinguish between graph structure and node labels in GDS

  • Identify monopartite, bipartite, multipartite, and heterogeneous graph structures

  • Choose appropriate projection strategies based on your target algorithm

Running a Cypher Projection

The most basic Cypher projection command looks like this:

cypher
Basic Cypher projection example
MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie) // (1)
WITH gds.graph.project( // (2)
  'actors-graph', // (3)
  source, // (4)
  target // (5)
) AS g // (6)
RETURN g.graphName AS graph,
      g.nodeCount AS nodes,
      g.relationshipCount AS rels // (7)
  1. Match Actor nodes connected to Movie nodes via ACTED_IN relationships

  2. Create a graph projection using the matched nodes

  3. Name the projection actors-graph

  4. Use the source nodes (Actors) from the MATCH statement

  5. Use the target nodes (Movies) from the MATCH statement

  6. Assign the resulting graph to variable g

  7. Return the graph name, node count, and relationship count
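Projection names must be unique in the GDS graph catalog, so projecting again under a name that is already in use fails. A minimal sketch of clearing an existing projection first, assuming you no longer need the old actors-graph:

```cypher
// Remove the in-memory projection named 'actors-graph'.
// The second argument (failIfMissing = false) avoids an error
// when no projection with that name exists.
CALL gds.graph.drop('actors-graph', false)
YIELD graphName
RETURN graphName
```

Dropping a projection only removes the in-memory copy. The nodes and relationships stored in the database are not touched.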

Projecting Graph Models

You are not limited to using the relationships available in the main graph. For example, you can use intermediate nodes in your MATCH statement to create new relationships that exist only in the projection.

cypher
Actor to actor collaboration
MATCH (source:Actor)-[r:ACTED_IN]-> // (1)
        (:Movie)
      <-[:ACTED_IN]-(target:Actor) // (2)
WITH gds.graph.project( // (3)
  'actors-graph', // (4)
  source, // (5)
  target // (6)
) AS g // (7)
RETURN g.graphName AS graph, // (8)
      g.nodeCount AS nodes,
      g.relationshipCount AS rels
  1. Start with Actors who acted in movies

  2. Find other Actors who acted in the same movies

  3. Create a graph projection from the matched pattern

  4. Name the projection actors-graph

  5. Use the first set of Actors as source nodes

  6. Use the second set of Actors as target nodes

  7. Assign the resulting graph to variable g

  8. Return the metadata and graph statistics
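Because each matched row becomes one projected relationship, two actors who share three movies end up connected by three parallel relationships. One possible variant, sketched here under assumptions, counts the shared movies first and stores the count as a relationship property. The graph name actors-collab-weighted and the property name sharedMovies are illustrative and are not part of the lesson.

```cypher
MATCH (source:Actor)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(target:Actor)
// One row per ordered actor pair, with the number of distinct movies they share
WITH source, target, count(DISTINCT m) AS sharedMovies
WITH gds.graph.project(
  'actors-collab-weighted',  // hypothetical graph name
  source,
  target,
  { relationshipProperties: { sharedMovies: sharedMovies } }  // hypothetical property name
) AS g
RETURN g.graphName AS graph,
       g.nodeCount AS nodes,
       g.relationshipCount AS rels
```

Algorithms that accept weights could then read this property through their relationshipWeightProperty setting.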

Actor to Actor Collaboration Graph

Running the previous projection will create a graph connecting actors directly to actors who worked on the same movies.

Before exploring that, run the basic Actor-to-Movie projection below to see how a projection works in practice.

cypher
Basic Cypher projection example
MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
WITH gds.graph.project(
  'actors-graph',
  source,
  target
) AS g
RETURN g.graphName AS graph,
      g.nodeCount AS nodes,
      g.relationshipCount AS rels

What You Projected

Let’s focus on the first projection you ran:

cypher
Basic Cypher projection example
MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
WITH gds.graph.project(
  'actors-graph',
  source,
  target
) AS g
RETURN g.graphName AS graph,
      g.nodeCount AS nodes,
      g.relationshipCount AS rels

What You Expected

You may have expected to project a graph that looks like this:

mermaid
Bipartite graph
graph LR
    Actor(("Actor"))
    Movie(("Movie"))
    Actor -- "ACTED_IN" --> Movie

This is a bipartite graph—a graph whose nodes fall into two distinct, non-overlapping sets.

What GDS Actually Sees

By default, a Cypher projection does not record node labels or relationship types, but it does preserve structure:

mermaid
Bipartite graph
graph LR
    Actor(("Node"))
    Movie(("Node"))
    Actor -- `__ALL__` --> Movie

The graph is still structurally bipartite—Actors still only connect to Movies, never to other Actors. But GDS no longer knows which nodes are Actors and which are Movies.

Structure vs Labels

Two separate concepts:

  • Structure: How nodes connect (bipartite, monopartite, etc.)

  • Labels: What GDS knows about node types

Your projection kept the bipartite structure but lost the labels.
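One way to see what GDS recorded about a projection is to read its schema from the graph catalog. A short sketch, assuming actors-graph is still in the catalog. For a default projection you would expect a single catch-all label and relationship type (shown as `__ALL__` in the diagram above) rather than Actor, Movie, or ACTED_IN.

```cypher
// List the catalog entry for one projection and show its schema
CALL gds.graph.list('actors-graph')
YIELD graphName, schema
RETURN graphName, schema
```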

Graph Structures: Monopartite

A monopartite graph has a single set of nodes: any node may connect to any other, so the nodes cannot be separated into distinct non-overlapping sets.

Example: A social network where (:Person)-[:FRIENDS_WITH]->(:Person)

A monopartite social network graph.

Any person can be friends with any other person. You cannot divide the nodes into separate groups where connections only occur between groups.

Graph Structures: Bipartite

A bipartite graph has nodes that fall into exactly two non-overlapping sets, where connections only occur between sets.

Example: (:Actor)-[:ACTED_IN]->(:Movie)

Actors in one row connect only to Movies in another row.

Actors connect to Movies. Actors never connect directly to other Actors. Movies never connect directly to other Movies.

Graph Structures: Multipartite

A multipartite graph has three or more non-overlapping node sets, where connections only occur between sets, never within the same set.

Example: (:User)-[:RATED]->(:Movie)-[:IN_GENRE]->(:Genre)

A graph of tripartite structure

In a true multipartite structure, each set is non-overlapping.

Graph Structures: Heterogeneous

A heterogeneous graph has multiple node types and/or relationship types, but nodes within the same type can connect to each other.

The movie graph with Actor

Our full Movies dataset has Actors, Movies, Users, and Genres. Movies can connect to all node types, meaning the connections can overlap.
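To judge which of these structures a dataset has, it can help to summarise which labels connect to which. The query below is a sketch in plain Cypher against the database (no GDS involved). It scans every relationship, so it may be slow on large graphs.

```cypher
// Count relationships for each (source labels, type, target labels) combination
MATCH (a)-[r]->(b)
RETURN labels(a) AS sourceLabels,
       type(r) AS relationshipType,
       labels(b) AS targetLabels,
       count(*) AS relationships
ORDER BY relationships DESC
```

If every row connects two different label sets, the data is consistent with a bipartite or multipartite structure. Rows where the same label appears on both sides point to a monopartite or heterogeneous structure.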

Why Structure Matters: PageRank Example

PageRank ranks nodes by "importance" based on incoming connections from other important nodes.

Let’s see what happens when we run it on our unlabelled bipartite projection:

cypher
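// Stream a PageRank score for every node in the projection
CALL gds.pageRank.stream('actors-graph', {})
YIELD nodeId, score
// title is a Movie property in this lesson, so other nodes may show null
RETURN gds.util.asNode(nodeId).title AS title, score
ORDER BY score DESC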

PageRank on Bipartite Structure

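In our Actor → Movie bipartite structure, PageRank flows into Movies but has nowhere to go from there, because Movie nodes have no outgoing relationships. Nodes like this are usually called dead ends.

In other graphs, rank can get caught circulating in a loop of nodes that it can enter but never leave. This is known as a rank sink, and a group of nodes with no relationships leading out of it is known as a spider trap.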

a single movie node with many actor nodes pointing into it.

Modelling, Not Algorithms

Remember that this lesson is about graph structures. The algorithms are only here to frame the discussion.

Do not worry too much about the intricacies of PageRank or any other algorithm for now—that comes later.

For now, try to see how the signal flows from node to node in the graph structures we’re examining.

Rank Sink

In our bipartite graph, Movie nodes act as sinks for rank: they accumulate high scores simply because they receive connections, not because they’re meaningfully important.

Diagram illustrating 'rank sinks' in a bipartite graph.

Almost all nodes receive the same score on either side of the structure. The bipartite structure traps the algorithm’s ranking signal.
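A quick way to see where the signal stops is to count outgoing relationships per node in the projection. The sketch below assumes the actors-graph projection from earlier still exists and uses degree centrality, which counts outgoing relationships in its default orientation.

```cypher
// Out-degree of every node in the projection
CALL gds.degree.stream('actors-graph')
YIELD nodeId, score
// Look up the database node so results can be grouped by label
WITH gds.util.asNode(nodeId) AS n, score AS outDegree
RETURN labels(n) AS labels,
       count(*) AS nodes,
       sum(CASE WHEN outDegree = 0 THEN 1 ELSE 0 END) AS nodesWithNoOutgoing
```

If the structure is as described, every Movie node should be counted in nodesWithNoOutgoing, since nothing in the projection leads away from a Movie.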

Solution 1: Project a True Monopartite Graph

Now let’s return to that second projection—the Actor-to-Actor collaboration graph:

cypher
MATCH (source:Actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH gds.graph.project('actors-only', source, target) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

This creates direct Actor-to-Actor connections through shared Movies. The Movies become invisible "bridges."

True Monopartite Result

The projected graph is now monopartite. All actors connect to other actors.

Diagram showing Actor-to-Actor collaboration with invisible Movie bridges.

There is no meaningful way of separating the nodes into non-overlapping sets.

PageRank on Monopartite

Now PageRank can flow between nodes of the same type, producing meaningful importance rankings.
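As a sketch, PageRank could be streamed from the actors-only projection created above. The name property on Actor nodes is an assumption about the dataset, so adjust it to whatever your Actor nodes use.

```cypher
CALL gds.pageRank.stream('actors-only')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS actor,  // assumes Actor nodes have a name property
       score
ORDER BY score DESC
LIMIT 10
```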

Monopartite Structures

Bear in mind, the projection still does not retain node labels.

Diagram showing importance of graph structure over labels.

It is the graph structure, not its labels, that affects the algorithm’s results.

Preserve Labels

For some algorithms, you will want to retain node labels.

Pass a data configuration map as the fourth argument to gds.graph.project() to preserve labels and relationship types:

cypher
MATCH (source:Actor)-[r:ACTED_IN]->(target:Movie)
WITH gds.graph.project(
  'actors-movies-labelled',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(r)
  },
  {}
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

When to Preserve Labels

Preserve labels when:

  • You need to restrict an algorithm to specific node types

  • Node type distinctions affect your analysis

Use default (unlabelled) when:

  • You’re projecting a true monopartite or bipartite subgraph

  • The algorithm ignores node labels (most do)

Node Similarity

Node Similarity compares nodes based on shared neighbours.

It works best on graphs that can be separated into distinct sets of nodes, such as bipartite graphs.

Project User-Movie Bipartite Graph

The following projection creates a bipartite graph with preserved labels.

cypher
MATCH (source:User)-[r:RATED]->(target:Movie)
WITH gds.graph.project(
  'user-rated-movie',
  source, target,
  { sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target) },
  {}
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount

Node Similarity Result

Node Similarity finds Users who rated similar Movies. When run in write or mutate mode, it stores the results as new User-to-User relationships.

Diagram of Node Similarity on bipartite graph with shared neighbours.

The algorithm respects the bipartite structure: it compares nodes on one side based on their connections to the other side.
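A minimal sketch of streaming Node Similarity results from the user-rated-movie projection above. The topK value is chosen only for illustration, and the name property on User nodes is an assumption about the dataset.

```cypher
// Compare Users by the Movies they rated; keep up to 5 similar users per user
CALL gds.nodeSimilarity.stream('user-rated-movie', { topK: 5 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS user1,  // assumes User nodes have a name property
       gds.util.asNode(node2).name AS user2,
       similarity
ORDER BY similarity DESC
LIMIT 10
```

Stream mode only returns the pairs and their scores. It does not change the projection or the database.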

Quick Reference: Choosing Your Projection

Algorithm Type        | Graph Structure           | Projection Strategy
PageRank, Betweenness | Works best on monopartite | Project single node type (e.g., Actor-to-Actor)
Node Similarity       | Designed for bipartite    | Preserve labels, include both types
Community Detection   | Varies by algorithm       | Check documentation for each

Common Terminology

Term          | Meaning
Monopartite   | Nodes cannot be separated into distinct non-overlapping sets
Bipartite     | Exactly two non-overlapping node sets; connections only between sets
Multipartite  | Three or more non-overlapping node sets
Heterogeneous | Multiple node types and/or relationship types (may overlap)
Unlabelled    | GDS doesn’t know node/relationship types (default behaviour)

Lesson Summary

In this lesson, you learned:

  • How to create Cypher projections with gds.graph.project()

  • How to transform graph structures by changing your MATCH pattern

  • Structure (how nodes connect) is separate from labels (what GDS knows about types)

  • GDS strips labels by default but preserves structure

  • Bipartite structures can trap algorithms like PageRank

  • Project true monopartite graphs for algorithms that expect them

  • Preserve labels when using bipartite-aware algorithms like Node Similarity

In the next lesson, you’ll practice projecting different graph types.
