Fraud Detection

Introduction

Traditional fraud detection looks at individual transactions. But organized fraud operates as coordinated networks—multiple actors working together.

Graph algorithms excel at revealing these hidden groups.

What You’ll Learn

By the end of this lesson, you’ll be able to:

Identify fraud patterns that are invisible in tables but obvious in graphs
Explore a real fraud dataset and discover suspicious connections
Design a graph projection strategy for fraud investigation
Choose appropriate algorithms for narrowing search space and ranking suspects

Why Graphs?

Fraudsters actively try to hide. They use:

Multiple accounts
Shared devices and credit cards
Complex transaction chains

Left: a User node connected to multiple Credit Card nodes; Middle: IP and Credit Card nodes connected to many users; Right: A complex transaction chain across multiple fraudulent users.

These connections are invisible in tables but obvious in graphs. A relational database needs a chain of JOINs to discover what a single graph traversal expresses directly.

The Dataset

Let’s see these principles in action with a real fraud network.

You’ll work with an anonymized peer-to-peer (P2P) financial transactions dataset containing:

User nodes, labeled UserP2P in the queries below (some flagged as known fraudsters)
Card, Device, and IP nodes
P2P, HAS_CC, HAS_IP, USED relationships

Schema diagram showing User, Card, Device, and IP nodes with relationships.

mermaid

graph LR
    User1(("User"))
    User2(("User"))
    Card(("Card"))
    Device(("Device"))
    IP(("IP"))
    User1 -- "P2P" --> User2
    User1 -- "HAS_CC" --> Card
    User1 -- "HAS_IP" --> IP
    User1 -- "USED" --> Device

Explore the Schema

Run this to see the data model:

cypher

CALL db.schema.visualization()

Take a moment to understand the structure. Users connect to Cards, Devices, and IPs. Users also connect to other Users via P2P transactions.
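To back up the visual schema with numbers, a query along the following lines tallies which relationship types leave UserP2P nodes and what they point to. It is a minimal sketch that uses only the labels and relationship types named in this lesson; the column aliases are illustrative.

```cypher
// For every relationship that starts at a user, report its type,
// the labels of the node it points to, and how often that combination occurs
MATCH (:UserP2P)-[r]->(target)
RETURN type(r) AS relationship,
       labels(target) AS targetLabels,
       count(*) AS total
ORDER BY total DESC
```

If the schema matches the diagram, you would expect one row each for P2P, HAS_CC, HAS_IP, and USED.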

Node and Relationship Counts

Let’s see the scale of our data:

cypher

MATCH (n)
WHERE n:UserP2P OR n:IP OR n:Card OR n:Device
WITH count(n) AS nodeCount // (1)
MATCH (n)-[r]->() // (2)
WHERE n:UserP2P OR n:IP OR n:Card OR n:Device
RETURN nodeCount AS nodes, count(r) AS relationships

(1) Count all matching nodes first, then pass the total forward
(2) Separate MATCH to count relationships independently

This should return approximately 790,000 nodes and 1.8 million relationships.

This is a large search space—far too big for manual investigation. We need algorithms to help us focus.

Note: counting nodes and relationships usually takes a simpler, unfiltered query. This demo database contains two separate graphs, however, so the counts above are filtered by label.
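For comparison, an unfiltered node count is typically just MATCH (n) RETURN count(n). One way to see why the label filter matters here is to break the node count down by label, as in the sketch below. Which labels appear beyond UserP2P, Card, Device, and IP depends on what else is loaded in the database, so treat the output as exploratory.

```cypher
// Count nodes per label combination across the whole database;
// labels outside UserP2P, Card, Device, and IP belong to the other graph
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
```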

Fraud Flags

Known fraudulent users have been flagged with the property fraudMoneyTransfer = 1.

cypher

MATCH (u:UserP2P)
WHERE u.fraudMoneyTransfer = 1 // (1)
RETURN u
LIMIT 10

(1) The fraudMoneyTransfer property is a pre-labeled ground truth; only a subset of users carry this flag
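To check how small that flagged subset is, you could group users by the flag value. This sketch assumes nothing about how unflagged users are stored: they may carry a 0 or have no fraudMoneyTransfer property at all, in which case they appear under null.

```cypher
// Group users by their fraud flag value and count each group
MATCH (u:UserP2P)
RETURN u.fraudMoneyTransfer AS fraudFlag, count(*) AS users
ORDER BY users DESC
```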

Explore Flagged Users

Run the query above and expand the nodes you find.

What to notice:

How many unflagged users connect to each flagged user?
What types of nodes (Cards, Devices, IPs) appear in the neighborhood?
Do you see any shared infrastructure between flagged users?

The fraud property is only applied to some users—many connected users remain unflagged. These connections are exactly what we want to investigate.
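The first question above can also be answered with a query rather than by expanding nodes by hand. The following is a sketch, assuming unflagged users either have fraudMoneyTransfer = 0 or lack the property; coalesce treats both cases the same way.

```cypher
// For each flagged user, count the distinct unflagged users
// they have exchanged P2P transactions with (in either direction)
MATCH (f:UserP2P)-[:P2P]-(u:UserP2P)
WHERE f.fraudMoneyTransfer = 1
  AND coalesce(u.fraudMoneyTransfer, 0) <> 1
RETURN f, count(DISTINCT u) AS unflaggedNeighbors
ORDER BY unflaggedNeighbors DESC
LIMIT 10
```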

Transfer Chains

Now let’s see how flagged users connect to each other through transaction chains:

cypher

MATCH path = (u1:UserP2P)-[:P2P*5]-(u2:UserP2P) // (1)
WHERE u1.fraudMoneyTransfer = 1
  AND u2.fraudMoneyTransfer = 1
  AND u1 <> u2 // (2)
RETURN path
LIMIT 100

(1) [:P2P*5] follows exactly 5 hops of P2P transactions, long enough to reveal intermediaries between fraudsters
(2) Ensures the two endpoints are distinct users, excluding chains that loop back to the user they started from

What the Chains Reveal

What to notice:

Users in the middle of chains often aren’t flagged
Flagged users at either end suggest the middle users may be involved
Money flows through these intermediaries—intentionally or not

Discovering this pattern in a relational database would mean chaining five copies of the transaction table together with self-JOINs. In a graph, it’s a simple path query.

This is the power of graph-based fraud detection: patterns that are computationally expensive in tables become trivial traversals.
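To turn the chains into a list of people worth a closer look, one option is to pull out the interior nodes of each path and count how often each unflagged user appears. This is a rough sketch rather than a definitive ranking: it samples only the first 100 paths, the undirected pattern can return the same chain in both directions, and the aliases middle and appearances are illustrative.

```cypher
// Sample chains between two distinct flagged users
MATCH path = (u1:UserP2P)-[:P2P*5]-(u2:UserP2P)
WHERE u1.fraudMoneyTransfer = 1
  AND u2.fraudMoneyTransfer = 1
  AND u1 <> u2
WITH path
LIMIT 100
// Drop the two endpoints and keep only the users in the middle of each chain
UNWIND nodes(path)[1..-1] AS middle
WITH middle, count(*) AS appearances
// Keep intermediaries that are not already flagged (assumes 0 or missing means unflagged)
WHERE coalesce(middle.fraudMoneyTransfer, 0) <> 1
RETURN middle, appearances
ORDER BY appearances DESC
LIMIT 10
```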

Shared Infrastructure

Two known fraudsters sharing the same credit card or device is suspicious:

cypher

MATCH path = (u1:UserP2P)-[:HAS_CC|USED]->(shared) // (1)
             <-[:HAS_CC|USED]-(u2:UserP2P)
WHERE u1.fraudMoneyTransfer = 1
  AND u2.fraudMoneyTransfer = 1
  AND u1 <> u2
RETURN path
LIMIT 50

(1) The (shared) node has no label in the pattern, so it matches whatever sits at the other end of HAS_CC or USED: a Card or a Device that both fraudsters converge on

Legitimate users rarely share credit cards or devices. When two flagged users share infrastructure, it suggests coordination—possibly the same person operating multiple accounts.
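The same pattern can be pointed at users who are not yet flagged. The sketch below looks for unflagged users who share a card or device with at least one flagged user; it assumes unflagged users have fraudMoneyTransfer = 0 or no such property, and the aliases are illustrative.

```cypher
// Unflagged users who share a Card or Device with a flagged user
MATCH (f:UserP2P)-[:HAS_CC|USED]->(shared)<-[:HAS_CC|USED]-(u:UserP2P)
WHERE f.fraudMoneyTransfer = 1
  AND coalesce(u.fraudMoneyTransfer, 0) <> 1
RETURN u,
       count(DISTINCT f) AS flaggedContacts,   // how many flagged users they overlap with
       count(DISTINCT shared) AS sharedItems   // how many cards or devices are shared
ORDER BY flaggedContacts DESC, sharedItems DESC
LIMIT 20
```

Sharing infrastructure with a flagged user is a lead, not proof: a result here is a reason to investigate further.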

Fraud Patterns

Finding connections between known fraudsters isn’t remarkable.

Can we use these patterns to find fraudsters we don’t know about yet?

That’s what this module will teach you.

The Challenge

With ~790,000 nodes and ~1.8 million relationships, manual investigation is impossible.

The Strategy

We need algorithms to:

Narrow the search space — Find suspicious communities
Rank suspects — Prioritize who to investigate first

Designing the Projection

Before running algorithms, we need to decide what to project.

This network is heterogeneous. It contains multiple overlapping node and relationship types:

Users, Cards, Devices, IPs
P2P transactions, HAS_CC, HAS_IP, USED relationships

Projection Options

We could project this network in different ways:

Approach	What It Captures	What It Loses
Monopartite (UserP2P → UserP2P)	Direct transactions	Shared infrastructure
Bipartite (UserP2P → Card/Device)	Shared infrastructure	Direct transactions
Heterogeneous (All nodes)	Everything	Nothing (but more complex)

Our Choice: Heterogeneous

For fraud detection, shared infrastructure is critical evidence.

We’ll project Users, Cards, and Devices together.

This allows our algorithms to find communities based on several connection types—not just P2P transactions.

We will exclude IP nodes to avoid noise.

Diagram showing heterogeneous projection of Users

In later lessons, you’ll see how to refine projections for specific investigative questions. For initial exploration, capturing transactions and shared infrastructure together gives us the broadest view.
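As a sketch of what this projection could look like with the Graph Data Science library's native projection, the call below includes the three node labels and the three relationship types that connect them, leaving out IP and HAS_IP. The graph name fraud-hetero is hypothetical, and projecting every relationship type as UNDIRECTED is an assumption made here for illustration, not a setting taken from this lesson.

```cypher
CALL gds.graph.project(
  'fraud-hetero',                    // hypothetical name for the in-memory graph
  ['UserP2P', 'Card', 'Device'],     // IP nodes are deliberately left out
  {
    P2P:    {orientation: 'UNDIRECTED'},  // assumption: ignore transfer direction
    HAS_CC: {orientation: 'UNDIRECTED'},
    USED:   {orientation: 'UNDIRECTED'}
  }
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```

The returned counts give a quick check that the projection is smaller than the full dataset, since IP nodes and their relationships are excluded.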

The Algorithmic Strategy

We’ll use two algorithm families in sequence:

Step	Algorithm Family	Purpose
1	Community Detection	Find groups containing known fraudsters
2	Centrality	Rank users within those groups

Community detection reduces the search space. Centrality ranks the suspects.
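The first step might look roughly like the following, assuming an in-memory projection named fraud-hetero (a hypothetical name) already exists in the GDS graph catalog and contains UserP2P nodes. It streams Louvain communities with default settings and keeps only those containing at least one flagged user. The sum relies on fraudMoneyTransfer being numeric, with 0 or a missing value meaning unflagged.

```cypher
CALL gds.louvain.stream('fraud-hetero')   // 'fraud-hetero' is an assumed graph name
YIELD nodeId, communityId
WITH communityId, gds.util.asNode(nodeId) AS n
WHERE n:UserP2P                           // count users only, not cards or devices
WITH communityId,
     count(n) AS users,
     sum(coalesce(n.fraudMoneyTransfer, 0)) AS flagged
WHERE flagged > 0                         // keep communities with known fraudsters
RETURN communityId, users, flagged
ORDER BY flagged DESC
LIMIT 10
```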

Applying the Framework

Let’s formalize our approach:

Question: Who else is involved in fraud networks?

Projection: Users, Cards, and Devices with their relationships

Algorithm: Community detection, then centrality ranking

Config: Start with defaults; refine based on results
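For the ranking half of the framework, a default-configuration starting point could be degree centrality streamed from the same kind of projection. This sketch again assumes a projected graph named fraud-hetero (hypothetical) and ranks all users rather than users within one community; narrowing to a community would need the community IDs stored or joined in first.

```cypher
CALL gds.degree.stream('fraud-hetero')    // 'fraud-hetero' is an assumed graph name
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score
WHERE n:UserP2P                           // rank users, not cards or devices
RETURN n, score
ORDER BY score DESC
LIMIT 10
```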

What’s Next

In the following lessons, you’ll:

Lesson 2: Learn how Louvain community detection works
Lesson 3: Run Louvain to reduce your search space by 98%
Lesson 4: Learn Degree Centrality and WCC for formal community assignment
Lesson 5: Build fraud communities using entity resolution patterns

Each lesson builds on the previous, taking you from raw data to actionable suspect lists.

Summary

Fraud detection shifts from individual transactions to network analysis:

Graph structure reveals connections fraudsters try to hide
Community detection identifies fraud rings
Centrality ranks suspects within those rings

You’ve explored the dataset and seen how flagged users connect through transactions and shared infrastructure. Now you’re ready to apply algorithms to find the fraudsters hiding in plain sight.
