Knowledge Check

Test your understanding of the three shapes and the boundary crossing.

The Context Problem

Sam’s vector search returned passages that were semantically similar to the technician’s question, but the copilot still gave unreliable answers. What is the core reason?

❏ A. The embedding model was too small
✓ B. The answer needs a connected set of documents and records, not a pile of similar paragraphs
❏ C. Vector search cannot index PDF files
❏ D. The PDFs were not chunked at the right size

Hint

The workshop calls this "right meaning, wrong shape" - what does the answer to Dani’s question actually look like?

Solution

B is correct: Dani’s question needs a connected set - the bulletin, the part it names, the work orders that used that part, and the vehicles they were on. Vector search returns disconnected passages ranked by similarity; no amount of embedding quality changes the shape of what it returns.

Why others are wrong:

A: A better model returns better-ranked paragraphs - still paragraphs
C: Vector search indexes parsed PDF text without difficulty
D: Chunking affects retrieval quality, not the fundamental shape mismatch

Recall Module 1: The three-layer wall - vector search is layer one.

The Tree Shape

What does the (Library)-[:HAS*]->(Section) containment tree give an agent that keyword or vector search cannot?

❏ A. Faster text matching
❏ B. Automatic summarization of each document
✓ C. The ability to navigate document structure - tables of contents, chapters, and what a document covers
❏ D. Smaller storage requirements

Hint

Think about the first query you ran against the Falcon manual in Module 2.

Solution

C is correct: The variable-length pattern [:HAS*] walks the library’s structure, producing views like a table of contents or "every section under the Engine chapter". Search tools have no concept of "contains" - they can only rank fragments.
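To make the containment walk concrete, here is a minimal Python sketch of the same idea outside the database. It is an illustration under assumptions, not the workshop's query: the HAS dictionary and every title except the Falcon manual and the Engine chapter are hypothetical, and unlike [:HAS*], which matches one or more hops, this version also lists the starting node.

```python
# Hypothetical containment data: parent -> children, mirroring HAS relationships.
# "Brakes", "Cooling", and "Ignition" are invented titles for illustration.
HAS = {
    "Library": ["Falcon manual"],
    "Falcon manual": ["Engine", "Brakes"],
    "Engine": ["Cooling", "Ignition"],
}


def table_of_contents(node, has, depth=0):
    """Depth-first walk that returns indented titles, one per contained node."""
    lines = ["  " * depth + node]  # the starting node is included at depth 0
    for child in has.get(node, []):
        lines.extend(table_of_contents(child, has, depth + 1))
    return lines
```

Calling table_of_contents("Engine", HAS) lists the Engine chapter and everything under it, which is the kind of "what does this document cover" view that ranking fragments cannot produce.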

Why others are wrong:

A: The tree is about structure, not matching speed
B: Summarization is an LLM task; the tree tells the LLM what to read
D: Storing structure adds nodes and relationships - it does not reduce storage

Recall Module 2: The table-of-contents query over the Falcon manual.

Meaningful Themes

Why did Leiden’s document communities correspond to real repair themes instead of arbitrary clusters?

❏ A. Leiden is guaranteed to find meaningful clusters in any graph
❏ B. The gamma parameter forces communities to match business topics
✓ C. Every edge in the projection exists because two documents touch the same part or fault code (or cite each other), so dense clusters are documents about the same repair topic
❏ D. The sections were manually tagged with topics before the workshop

Hint

The algorithm only sees nodes and links. What made each link exist in the first place?

Solution

C is correct: Community detection finds dense clusters - the meaning comes from the edges. Because the projection’s edges are shared parts, shared codes (glue nodes), and citations, a dense cluster can only mean "documents that keep talking about the same parts and faults" - a real repair theme.
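The point that meaning comes from the edges can be sketched in a few lines of Python. This is a hedged illustration, not the workshop's pipeline: the MENTIONS data is hypothetical, and the grouping step uses plain connected components as a stand-in for Leiden, which optimizes modularity and can split a connected group into several communities.

```python
from itertools import combinations

# Hypothetical documents and the glue nodes (parts, fault codes) each one mentions.
MENTIONS = {
    "doc1": {"part:P-100", "code:E12"},
    "doc2": {"part:P-100"},
    "doc3": {"code:E12", "part:P-100"},
    "doc4": {"part:P-900"},
    "doc5": {"part:P-900", "code:E77"},
}


def projection_edges(mentions):
    """Weight each document pair by the number of glue nodes the pair shares."""
    edges = {}
    for a, b in combinations(sorted(mentions), 2):
        shared = mentions[a] & mentions[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def components(mentions, edges):
    """Stand-in for Leiden: group documents linked by any chain of edges."""
    parent = {doc: doc for doc in mentions}

    def find(doc):
        while parent[doc] != doc:
            doc = parent[doc]
        return doc

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for doc in mentions:
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(sorted(group) for group in groups.values())
```

Every edge here exists only because two documents share a part or a fault code, so any group the algorithm finds is, by construction, a set of documents about the same parts and faults.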

Why others are wrong:

A: On arbitrary links (or random similarity), the clusters would be arbitrary too
B: Gamma controls granularity (how many themes), not meaning
D: Nothing was tagged - surfacing unnamed patterns was the point

Recall Module 3: "Why these clusters mean something."

The Boundary Crossing

The finale answers a question that spans the documents and the warehouse. How does the agent cross that boundary?

❏ A. It merges the warehouse rows into Neo4j and traverses one graph
✓ B. It grounds candidates in the Neo4j document graph, then reads the live warehouse rows from BigQuery on the shared key, and joins the two
❏ C. It hands the whole question to Text2SQL to write one query over everything
❏ D. Vector search retrieves both the documents and the warehouse rows

Hint

Where do the warehouse rows live in this workshop - and did they ever move?

Solution

B is correct: The warehouse rows stay in BigQuery - they fail the four-pains test for migration. The agent grounds in the document graph (which documents cover the code, which parts they name), reads the real outcomes from BigQuery on the shared part number, and joins the two in Python. The connections graph from Module 2 hands it the correct join paths.
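The final join step might look roughly like the Python sketch below. Everything in it is an assumption for illustration: the two lists stand in for results that would come from the Neo4j document graph and from a BigQuery query, and the field names (part_number, resolved, and so on) are hypothetical rather than the workshop's schema.

```python
# Hypothetical graph-side result: parts named by documents that cover a fault code.
graph_candidates = [
    {"part_number": "P-100", "document": "SB-7"},
    {"part_number": "P-200", "document": "Manual ch. 4"},
]

# Hypothetical warehouse-side rows, shaped like the result of a warehouse query.
warehouse_rows = [
    {"part_number": "P-100", "work_order": "WO-1", "resolved": True},
    {"part_number": "P-100", "work_order": "WO-2", "resolved": False},
    {"part_number": "P-200", "work_order": "WO-3", "resolved": True},
]


def join_on_part(candidates, rows):
    """Attach a resolution rate to each candidate part using the shared key."""
    outcomes = {}
    for row in rows:
        outcomes.setdefault(row["part_number"], []).append(row["resolved"])
    joined = []
    for cand in candidates:
        results = outcomes.get(cand["part_number"], [])
        rate = sum(results) / len(results) if results else None
        joined.append({**cand, "work_orders": len(results), "resolution_rate": rate})
    # Candidates with no warehouse history sort last.
    return sorted(joined, key=lambda c: c["resolution_rate"] or 0.0, reverse=True)
```

The rows never leave the warehouse in this design; only the query result for the candidate part numbers is pulled into memory and joined.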

Why others are wrong:

A: That was the original instinct; copying the rows fails sync, performance, modeling, and security
C: Text2SQL alone is layer two of the wall - quietly wrong on the multi-hop join chain
D: Vector search returns passages, not a connected, computed answer

Recall Module 5: "Federate on the shared key."

Why the Money Query Needed a Graph

The final query ranked candidate fixes by real repair outcomes on similar vehicles. Why was Text2SQL the wrong tool for this question?

❏ A. SQL cannot express multi-table joins
❏ B. The data was too large for a SQL engine
✓ C. The question implies a long chain of joins across both halves, where generated SQL tends to fail silently - plausible but subtly wrong
❏ D. Text2SQL cannot read part numbers

Hint

Layer two of Sam’s wall: what happens to generated SQL as the join chain grows - and how would you know?

Solution

C is correct: The question spans documents, sections, references, parts, work orders, and vehicles - five or more joins, half of them against tables derived from PDFs. Text2SQL is nondeterministic on chains like this and fails silently: the query runs and returns plausible rows that are subtly wrong. The connections graph hands the agent the exact join paths, so the SQL it writes against the warehouse is grounded, not guessed.
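One way to picture "the connections graph hands the agent the exact join paths" is a shortest-path lookup over a small graph of joinable entities. The Python sketch below is illustrative only: the CONNECTIONS list is a hypothetical reading of the entity chain named above, not the workshop's actual connections graph, and a real one would also carry the join keys.

```python
from collections import deque

# Hypothetical connections graph: each pair is two entities that can be joined.
CONNECTIONS = [
    ("document", "section"),
    ("section", "reference"),
    ("reference", "part"),
    ("part", "work_order"),
    ("work_order", "vehicle"),
]


def join_path(start, goal, connections):
    """Return the shortest chain of entities linking start to goal, or None."""
    neighbours = {}
    for a, b in connections:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because the path is looked up rather than generated, the same question yields the same five-join chain every time, which is the determinism that free-form Text2SQL lacks on long chains.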

Why others are wrong:

A: SQL can express the joins - the problem is reliably generating the right ones from natural language; the connections graph is what makes them reliable
B: This dataset is tiny; scale was never the issue
D: Part numbers are ordinary strings to any tool

Recall Modules 1 and 5: "Text2SQL is quietly wrong" and the federated finale.

Summary

Congratulations on completing AI on Your Lakehouse: Context Comes in Shapes, Not Queries!

You’ve successfully:

Built a navigable document tree from parsed PDFs with shared-key links
Surfaced repair themes with Leiden community detection
Federated the warehouse with the document graph on shared keys, without copying its rows
Written multi-hop Cypher that crosses the document-to-table boundary
Seen how to port the pattern to your own lakehouse and agents

Continue learning:

Neo4j & GenAI Fundamentals - retrievers and GraphRAG
Community Detection - deeper into Leiden and Louvain
Context Graphs: Agent Memory with Neo4j - persistent, explainable agent memory
