Challenge
The themes shape surfaces what nobody named.
Read its spec first - docs/theme-format.md - then build skill/scripts/themes.py: the pipeline and renderer are given; the projection query is marked BUILD FROM SPEC.
This completes Building Block 2: "Themes nobody named exist as data" ✓
The reasoning before the query
Documents rarely link each other directly.
But the Falcon manual and bulletin TSB-21-114 both reference coil IC-2042-A - they are about the same thing.
The spec’s key move: parts and codes are glue nodes.
Project a document-level graph of documents plus glue, and two documents that co-cite the same part sit two hops apart - close enough to cluster, even with no link between them.
Collapsing section-level edges to their owning documents needs no tree walk - the URIs already encode ownership:
MATCH (d:Document {uri: split(s.uri, '#')[0]})
Hand the projection spec to your agent and review the Cypher: sections' REFERENCES_PART, REFERENCES_CODE, and LINKS_TO edges, collapsed to document pairs and document-glue pairs, undirected, weighted by mention count.
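To make the collapse concrete before reviewing the Cypher, here is a minimal in-memory sketch of the same idea in Python. It is not the projection query from the spec. The URIs, the `owning_document` and `collapse_to_documents` names, and the choice to skip edges that stay inside one document are all assumptions made for illustration; only the part number IC-2042-A and the `split('#')[0]` rule come from the lesson.

```python
from collections import Counter


def owning_document(uri: str) -> str:
    """Return the document URI that owns a section URI (text before '#').

    Identifiers without a '#', such as a part number, pass through unchanged.
    """
    return uri.split("#")[0]


def collapse_to_documents(section_edges):
    """Collapse section-level edges into undirected, weighted pairs.

    section_edges: iterable of (section_uri, target), where target is a
    part or code identifier, or the URI of a linked section or document.
    Returns a Counter keyed by a sorted (node_a, node_b) tuple, so that
    direction is ignored and the value is the mention count.
    """
    weights = Counter()
    for section_uri, target in section_edges:
        a = owning_document(section_uri)
        b = owning_document(target)
        if a == b:
            # Assumption: an edge that stays inside one document carries
            # no clustering signal, so it is dropped here.
            continue
        weights[tuple(sorted((a, b)))] += 1
    return weights


# Hypothetical URIs: two documents that never link each other but
# co-cite the same coil.
edges = [
    ("doc://falcon-manual#ignition", "IC-2042-A"),
    ("doc://falcon-manual#troubleshooting", "IC-2042-A"),
    ("doc://tsb-21-114#summary", "IC-2042-A"),
]
for pair, weight in sorted(collapse_to_documents(edges).items()):
    print(pair, weight)
```

Both documents end up attached to the same glue node, which is the two-hop path the clustering step relies on. The manual's two mentions collapse into one pair with weight 2.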
Run the pipeline
The rest of the script is given and runs the spec’s pipeline: Leiden mutate → per-community conductance (the cohesion words) → write themeId back → renderer queries → drop the projection.
Mutate-before-write matters: the conductance metric reads themeId from the in-memory projection, which never sees database writes.
python skill/scripts/themes.py
Two broad themes appear - electrical-and-brakes and ignition-and-catalyst. Structurally defensible, but coarser than how technicians think.
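The per-community conductance step in the pipeline can be sketched in plain Python. This sketch uses one common formulation: the weight of edges leaving a community divided by the total weight incident to its nodes. The script's actual metric comes from the spec and may differ. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def conductance_by_community(edges, community):
    """Compute conductance for each community of an undirected graph.

    edges: iterable of (u, v, weight) tuples, each edge listed once.
    community: dict mapping node -> community id.
    Lower values mean a community keeps more of its edge weight inside.
    """
    cut = defaultdict(float)     # weight crossing each community's boundary
    volume = defaultdict(float)  # weight incident to each community's nodes
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        volume[cu] += w
        volume[cv] += w
        if cu != cv:
            cut[cu] += w
            cut[cv] += w
    return {c: (cut[c] / volume[c] if volume[c] else 0.0) for c in volume}


# Toy graph: a triangle (community 0) joined by one edge to a pair (community 1).
toy_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("c", "d", 1.0),
    ("d", "e", 1.0),
]
toy_community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(conductance_by_community(toy_edges, toy_community))
```

The triangle scores lower than the pair because a smaller share of its edge weight leaves the group. A renderer could turn that ordering into cohesion words rather than showing raw scores.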
Turn the dial
Leiden’s gamma is the granularity dial - higher favors more, finer themes:
python skill/scripts/themes.py --gamma 2.0
Four themes now - sensors and hoses, ignition, brakes, charging - and the header still reconciles: every document is grouped or honestly ungrouped.
There is no single correct resolution; it depends on the questions you will ask.
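Why a higher gamma favors finer themes can be seen in a small worked example. The sketch below assumes the objective is modularity with a resolution parameter, where gamma scales the penalty on each community's share of total degree. The toy graph and function names are hypothetical and unrelated to the course data.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Resolution-weighted modularity of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping node -> community id.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)    # sum of node degrees in the community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree
    )


def triangle(start):
    """Edges of a triangle on nodes start, start + 1, start + 2."""
    return [(start, start + 1), (start + 1, start + 2), (start, start + 2)]


# Four triangles, paired up by three cross edges each, with one bridge
# edge between the two pairs.
graph = []
for start in (0, 3, 6, 9):
    graph += triangle(start)
graph += [(0, 3), (1, 4), (2, 5)]    # ties triangle 1 to triangle 2
graph += [(6, 9), (7, 10), (8, 11)]  # ties triangle 3 to triangle 4
graph += [(5, 6)]                    # bridge between the two pairs

coarse = {n: n // 6 for n in range(12)}  # two communities of six nodes
fine = {n: n // 3 for n in range(12)}    # four communities of three nodes

for gamma in (1.0, 2.0):
    print(gamma, modularity(graph, coarse, gamma), modularity(graph, fine, gamma))
```

At gamma 1.0 the two-community split scores higher. At gamma 2.0 the penalty on large communities grows enough that the four-community split wins. Neither answer is wrong, which is the point of treating gamma as a dial.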
Note the spec’s stability contract: theme numbers are assigned per run - store URIs, never T<id>.
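One way to honor that contract is to key every stored theme by its member URIs instead of its number. The sketch below is an illustration only: the helper name and the sample URIs are hypothetical, and the spec may prescribe a different storage form.

```python
def themes_as_uri_sets(assignments):
    """Turn {document_uri: run_local_theme_id} into a set of URI groups.

    The run-local ids are discarded, so two runs that group documents the
    same way compare equal even when their theme numbers differ.
    """
    groups = {}
    for uri, theme_id in assignments.items():
        groups.setdefault(theme_id, set()).add(uri)
    return {frozenset(members) for members in groups.values()}


# Hypothetical output of two runs: same grouping, different theme numbers.
run_a = {"doc://falcon-manual": 0, "doc://tsb-21-114": 0, "doc://brake-guide": 1}
run_b = {"doc://falcon-manual": 7, "doc://tsb-21-114": 7, "doc://brake-guide": 3}

print(themes_as_uri_sets(run_a) == themes_as_uri_sets(run_b))
```

Comparing by membership shows that the two runs agree even though no theme number matches between them.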
Stuck or out of sync?
The complete script is in solutions/scripts/themes.py.
Validate the Themes
Once themes.py has written themeIds back, click the Check Database button to verify.
Hint
Only the write-back step changes the database - the projection and Leiden run in memory.
Check:
- The projection query you built returns rows (test it in the sandbox first)
- python skill/scripts/themes.py printed the THEMES header with grouped documents
Solution
Run the complete tool:
python solutions/scripts/themes.py
Then check the assignments:
MATCH (d:Document) WHERE d.themeId IS NOT NULL
RETURN d.themeId AS theme, collect(d.id) AS documents
Most documents should carry a theme, in at least two groups.
If verification fails:
- If the projection already exists from a crashed run, the script drops it automatically - re-run it
- Confirm Module 2’s verification passes first (the references and links must exist)
Summary
You built the themes shape:
- Glue-node projection - documents + parts/codes, co-citation as clustering signal, ownership by URI prefix
- Leiden mutate → conductance → write - granularity on the gamma dial, cohesion as words not scores
- Building Block 2: "Themes nobody named exist as data" ✓
In the next lesson, you will read the theme blocks the way an agent does - and name the themes yourself.