Patterns Nobody Named

Introduction

Your library now has links between documents (citations and shared-key edges) as well as references from sections to parts and codes. Step back and look at the whole web: the connections are not evenly spread. They clump.

In this lesson, you will learn what community detection does and why running it over shared-key links produces meaningful themes.

Communities are densely linked clusters

A community is a group of nodes more densely connected to each other than to the rest of the graph.
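To make "more densely connected to each other than to the rest" concrete, here is a minimal sketch that counts the edges inside a group and the edges crossing its boundary. The toy graph and the helper name edge_counts are illustrative assumptions, not part of the AutoFix data.

```python
def edge_counts(edges, group):
    """Count edges inside `group` versus edges crossing its boundary."""
    group = set(group)
    internal = sum(1 for a, b in edges if a in group and b in group)
    crossing = sum(1 for a, b in edges if (a in group) != (b in group))
    return internal, crossing


# Toy undirected graph (assumed for illustration): two triangles and one bridge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),  # first triangle
    ("x", "y"), ("y", "z"), ("x", "z"),  # second triangle
    ("c", "x"),                          # single bridge between them
]

internal, crossing = edge_counts(edges, {"a", "b", "c"})
print(internal, crossing)  # 3 1
```

The group {a, b, c} keeps three edges inside and loses only one across its boundary, which is the pattern a community shows. A mixed group such as {a, b, x} gives the opposite picture: one internal edge and five crossing edges.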

In the AutoFix library, the Falcon manual, the misfire bulletin, and the coil recall all keep touching the same parts and codes - IC-2042-A, P0301. The brake documents cluster around different parts. Almost nothing connects an ignition document to a brake document. The parts and codes act as glue nodes: two documents that both reference the same part are connected through it, even if neither cites the other. In citation analysis this pattern is called bibliographic coupling (co-citation is the mirror case, where two documents are both cited by a third), and it is a strong theme signal in a corpus like this one.
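The glue-node idea can be sketched in a few lines: group documents by the part or code they reference, then link every pair of documents that share a key. Which document references which key is assumed here for illustration, and the brake document names and BRAKE-PART-X are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Assumed document -> referenced parts/codes (illustrative, not the real library).
references = {
    "falcon-manual": {"IC-2042-A", "P0301"},
    "TSB-21-114": {"P0301"},
    "RC-2021-04": {"IC-2042-A"},
    "brake-doc-1": {"BRAKE-PART-X"},  # hypothetical brake document and part
    "brake-doc-2": {"BRAKE-PART-X"},  # hypothetical brake document and part
}


def shared_key_edges(references):
    """Map each unordered document pair to the set of keys both reference."""
    docs_by_key = defaultdict(set)
    for doc, keys in references.items():
        for key in keys:
            docs_by_key[key].add(doc)

    edges = defaultdict(set)
    for key, docs in docs_by_key.items():
        for a, b in combinations(sorted(docs), 2):
            edges[frozenset({a, b})].add(key)
    return dict(edges)


edges = shared_key_edges(references)
for pair, keys in edges.items():
    print(sorted(pair), sorted(keys))
```

In this sample the bulletin and the recall each link to the manual through a shared key, and no edge joins an ignition document to a brake document.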

Community detection algorithms find those clusters automatically. You will use Leiden, a Graph Data Science algorithm that assigns each node a community ID so that links inside communities are dense and links between them are sparse.
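The sketch below is not Leiden. It only scores a given assignment of community IDs with modularity, a quality measure that Leiden is commonly set up to maximize, to show what "dense inside, sparse between" means numerically. The edge list, the brake document names, and both assignments are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)    # edge endpoints that fall in the community
    for a, b in edges:
        degree[community_of[a]] += 1
        degree[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Assumed document-to-document edges: two tight groups and one stray link.
edges = [
    ("falcon-manual", "TSB-21-114"),
    ("falcon-manual", "RC-2021-04"),
    ("TSB-21-114", "RC-2021-04"),
    ("brake-doc-1", "brake-doc-2"),  # hypothetical brake documents
    ("brake-doc-2", "brake-doc-3"),
    ("brake-doc-1", "brake-doc-3"),
    ("RC-2021-04", "brake-doc-1"),   # the rare cross-topic link
]

by_theme = {
    "falcon-manual": 0, "TSB-21-114": 0, "RC-2021-04": 0,
    "brake-doc-1": 1, "brake-doc-2": 1, "brake-doc-3": 1,
}
mixed = {
    "falcon-manual": 0, "TSB-21-114": 0, "brake-doc-1": 0,
    "RC-2021-04": 1, "brake-doc-2": 1, "brake-doc-3": 1,
}

themed_score = modularity(edges, by_theme)
mixed_score = modularity(edges, mixed)
print(themed_score > mixed_score)  # True
```

The assignment that follows the repair topics scores higher than the one that mixes ignition and brake documents. A community detection algorithm searches for a high-scoring assignment like this one rather than being handed it.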

Why these clusters mean something

Community detection on arbitrary data produces arbitrary clusters. Yours will not, because of what each link is:

  • Every edge exists because two documents touch the same physical part or the same fault code - or one explicitly cites the other

  • A dense cluster of documents is therefore a set of documents about the same repair topic

  • The cluster spanning the Falcon manual, bulletin TSB-21-114, and recall RC-2021-04 is not a coincidence - it is the ignition misfire theme, assembled from documents that mostly never cite each other

Nobody at AutoFix maintains a list of repair themes. The themes exist anyway - as structure in the data. That is the second shape: patterns nobody named.

Why an agent needs themes

Without themes, an agent answering "where are we seeing repeat problems?" has to read everything - and blows past its context window.

With themes, the agent gets a small, structured view: a handful of evidence blocks, each with its member documents and the parts and codes that define it. That fits in one prompt, and every claim in it traces back to real edges.
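As a rough sketch of what such a view could look like, the code below groups documents by community ID and lists the parts and codes that at least two members share. The field names and the input data are hypothetical; the actual layout is whatever docs/theme-format.md specifies.

```python
from collections import Counter, defaultdict

# Assumed inputs: a community ID per document and the keys each one references.
community_of = {
    "falcon-manual": 0, "TSB-21-114": 0, "RC-2021-04": 0,
    "brake-doc-1": 1, "brake-doc-2": 1,  # hypothetical brake documents
}
references = {
    "falcon-manual": {"IC-2042-A", "P0301"},
    "TSB-21-114": {"P0301"},
    "RC-2021-04": {"IC-2042-A"},
    "brake-doc-1": {"BRAKE-PART-X"},     # hypothetical part
    "brake-doc-2": {"BRAKE-PART-X"},
}


def build_themes(community_of, references):
    """Return one evidence block per community (hypothetical field names)."""
    members = defaultdict(list)
    for doc, community_id in community_of.items():
        members[community_id].append(doc)

    themes = []
    for community_id in sorted(members):
        docs = sorted(members[community_id])
        key_counts = Counter(k for d in docs for k in references.get(d, ()))
        # Keep only keys referenced by at least two member documents.
        shared = sorted(k for k, n in key_counts.items() if n >= 2)
        themes.append(
            {"community": community_id, "documents": docs, "shared_keys": shared}
        )
    return themes


themes = build_themes(community_of, references)
for theme in themes:
    print(theme)
```

Each block stays small however long the underlying documents are, and every key it lists can be traced back to at least two documents that reference it.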

The spec for that view is docs/theme-format.md - read it before the challenge, the same way you read the outline spec.

Summary

In this lesson, you learned the second shape:

  • Community - a group of nodes more densely connected internally than externally

  • Leiden - the Graph Data Science algorithm you will run to assign community IDs

  • Why it works here - shared-key links make clusters correspond to real repair themes, not noise

In the next challenge, you will project the document graph and run Leiden yourself.
