Federate on the Shared Key

Introduction

Your tools so far read only the document half. The agent’s judgment needs the other half: what actually happened in the shop. That record lives in the BigQuery warehouse, and it is staying there.

In this lesson, you will learn the move that crosses the boundary without moving the data: federate on the shared key.

The key is in both halves

A part number like IC-2042-B appears in a bulletin section (the document graph, in Neo4j) and on hundreds of work-order line items (the warehouse, in BigQuery). The same is true of a trouble code. Those shared keys are the link the lakehouse never made.

You could copy the warehouse into Neo4j and traverse one graph - the original instinct. But run it through the four-pains test from Module 2 and the rows fail every one: they churn constantly, there are millions of them, they are already modeled in the warehouse, and they are the sensitive layer. So you do not move them.

Instead, the agent crosses the boundary at query time:

mermaid

graph LR
    Q((work order <br> P0301 on this VIN)) --> N[Neo4j: which docs cover P0301, which parts they name + grounding]
    N -->|candidate parts| B[BigQuery SQL: outcomes for those parts on similar vehicles]
    B -->|joined in Python| R((evidence-backed fix))
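
The final hop in the diagram, the join in Python, can be sketched as follows. This is a minimal illustration, not the course's implementation: the two lists stand in for rows that a Cypher query and a BigQuery query would return, and every field name and value other than IC-2042-B is invented for the example.

```python
# Stand-in for the Neo4j step: parts named by documents that cover one trouble code.
# In practice these rows would come from a Cypher query; the values are invented.
graph_candidates = [
    {"part_number": "IC-2042-B", "section": "bulletin section A"},
    {"part_number": "SP-1180", "section": "manual section B"},
]

# Stand-in for the BigQuery step: aggregated outcomes keyed by the same part number.
warehouse_outcomes = [
    {"part_number": "IC-2042-B", "repairs": 120, "comebacks": 6},
    {"part_number": "FI-0093", "repairs": 40, "comebacks": 10},
]


def federate(candidates, outcomes):
    """Join graph candidates to warehouse outcomes on the shared key, part_number."""
    by_part = {row["part_number"]: row for row in outcomes}
    joined = []
    for cand in candidates:
        facts = by_part.get(cand["part_number"])
        if facts is None:
            continue  # the graph names the part, but the warehouse has no history for it
        success = 1 - facts["comebacks"] / facts["repairs"]
        joined.append({**cand, **facts, "success_rate": round(success, 2)})
    # Best-supported candidate first.
    return sorted(joined, key=lambda r: r["success_rate"], reverse=True)


evidence = federate(graph_candidates, warehouse_outcomes)
```

Only the keys cross the boundary. The candidate list going into BigQuery and the aggregated rows coming back are small, while the millions of work-order rows never leave the warehouse.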

Why the connections graph matters here

The BigQuery step is a multi-table join - work orders to vehicles to parts. Ungrounded Text2SQL guesses those joins and tends to be quietly wrong once a query spans four or more tables. Your agent does not guess: it reads the schema and foreign keys from the connections MCP you built in Module 2, so every join it writes follows a declared key rather than an inferred one.
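
To make "reads the foreign keys instead of guessing" concrete, here is a small sketch that derives a join path from foreign-key metadata. The table and column names are hypothetical, and the tuple format is an assumption about what a schema tool might return, not the actual output of the connections MCP.

```python
from collections import deque

# Hypothetical foreign keys: (child_table, child_column, parent_table, parent_column).
FOREIGN_KEYS = [
    ("work_order_lines", "work_order_id", "work_orders", "id"),
    ("work_order_lines", "part_id", "parts", "id"),
    ("work_orders", "vehicle_id", "vehicles", "id"),
]


def join_path(start, goal, fks):
    """Breadth-first search over foreign keys; returns the join steps from start to goal."""
    neighbours = {}
    for child, ccol, parent, pcol in fks:
        cond = f"{child}.{ccol} = {parent}.{pcol}"
        # A foreign key can be followed in either direction when joining.
        neighbours.setdefault(child, []).append((parent, cond))
        neighbours.setdefault(parent, []).append((child, cond))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for nxt, cond in neighbours.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(nxt, cond)]))
    raise ValueError(f"no foreign-key path from {start} to {goal}")


def build_from_clause(start, goal, fks):
    """Render the join path as a SQL FROM clause with explicit ON conditions."""
    clause = f"FROM {start}"
    for table, cond in join_path(start, goal, fks):
        clause += f"\nJOIN {table} ON {cond}"
    return clause


sql_from = build_from_clause("parts", "vehicles", FOREIGN_KEYS)
```

Every ON condition in the result comes from a declared key, and a missing path raises an error instead of producing a plausible but wrong join.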

That closes the thesis. The document shapes ground the what; the connections shape grounds how to join; federation crosses the boundary on the key - no migration, no guessing.

But can’t a Databricks agent orchestrate this too?

Yes - a Mosaic AI agent with managed MCP can bind vector search and SQL and cross the boundary. The graph’s advantage is not capability but trust. The cross-boundary path is one deterministic traversal you can audit, not a tool chain that re-derives the join on every call and gets it right only roughly 70-80% of the time. And the part of the answer that is not a column - which bulletin applies, and why - lives as a traversable edge, modeled once, instead of being re-inferred per query. You win on determinism and auditability, not on "the other tool cannot."

The trail stays a graph

One thing does get written to Neo4j: the agent’s decision. A Recommendation - the part it chose, the recall it bundled, the sections it cited, the order it placed - is a small graph, and it belongs in one: it is the audit trail, and every claim in it points at evidence.
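
As a rough sketch of what writing that trail could look like, the function below builds a parameterized Cypher statement and its parameters. The labels (Recommendation, Part, Section), relationship types (CHOSE, CITES), property names, and IDs are all assumptions made for illustration, and only the part and the cited sections are covered. Nothing is executed here; the returned pair would be handed to a Neo4j driver.

```python
# Hypothetical graph model for the audit trail: a Recommendation node linked to the
# part it chose and to each section it cited. Not the course's actual schema.
RECOMMENDATION_CYPHER = """
MERGE (r:Recommendation {id: $rec_id})
SET r.createdAt = datetime()
WITH r
MATCH (p:Part {number: $part_number})
MERGE (r)-[:CHOSE]->(p)
WITH r
UNWIND $section_ids AS sid
MATCH (s:Section {id: sid})
MERGE (r)-[:CITES]->(s)
""".strip()


def recommendation_write(rec_id, part_number, section_ids):
    """Return (query, parameters) for recording one recommendation and its evidence."""
    if not section_ids:
        # Every claim should point at evidence, so refuse an uncited recommendation.
        raise ValueError("a recommendation must cite at least one section")
    params = {
        "rec_id": rec_id,
        "part_number": part_number,
        "section_ids": list(section_ids),
    }
    return RECOMMENDATION_CYPHER, params


query, params = recommendation_write("rec-001", "IC-2042-B", ["sec-14-3"])
```

Because the statement uses MERGE and parameters, replaying the same recommendation does not duplicate nodes or relationships, which keeps the trail stable enough to audit.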

So the split is clean. Documents and the decision trail live in the graph (you own their structure); the warehouse rows stay in BigQuery (it owns them); the agent federates across the key.

Summary

In this lesson, you learned the boundary crossing:

Federate, don’t migrate - the warehouse rows fail the four-pains test, so they stay in BigQuery
Cross on the shared key - Neo4j grounds the candidates, BigQuery returns the facts, Python joins them
The connections graph grounds the SQL - the join paths from Module 2 keep Text2SQL honest
The decision trail is a graph - the Recommendation is auditable, so it lives in Neo4j

In the next challenge, you will watch the agent federate its judgment live - retrieving the warehouse schema and writing the SQL itself.
