Lesson 31 · The Reconcile Transport · Quality Internals

The shared apply mechanism

One transport carries every healer's writes — and protects realtime apply from them. ~14 min.

Builds on: L30 · L9 · L7
Anchor: background job vs latency-critical foreground
New: write budget + deferral
New: non-blocking publish

L30 showed the OWNS healer's guards and treated them as the healer's own. They aren't. Those guarantees live one layer down, in a shared transport every healer rides — so each new healer inherits safety for free, and a single config flag takes it from shadow to live. But the transport's deepest job isn't the per-edge guard you've seen; it's protecting the system's primary work — realtime block apply — from the healer's secondary writes.

Your anchor: the background job that mustn't hurt the foreground
Every backend engineer meets this: a cleanup/reconcile job shares a datastore with a latency-critical request path. Run it carelessly and it floods the write queue or holds a lock, and suddenly the foreground stalls. Here the foreground is realtime block apply (L9's single writer); the background is healing. The transport exists so that the second can never starve the first. This is a systems lesson with a DeFi-indexer body.

1 · Where it sits

The transport is the apply half of healing. A per-class op builder renders the verifier's diffs into ops; the transport publishes them safely. The HealerHook (hook.go) wires the two together:

verifier diffs: gaps / excess / drift (L29)
→ BuildOps: per-class WHAT to write
→ Transport.Cycle: shared HOW to write safely
→ single-writer stream: graphwrite → graph-writer (L9)

That split is the whole design: what to reconcile is class-specific (OWNS vs ADMIN_CTRL render different Cypher); how to apply it safely is universal, so it's written once.
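The what/how split above can be sketched as two small interfaces. This is a minimal illustration, not the code in hook.go: the Diff, Op, OpBuilder, Applier, ownsBuilder, countingApplier and runHook names are all hypothetical, and the statement text is a placeholder rather than the real Cypher.

```go
package main

import "fmt"

// Diff is a hypothetical stand-in for one verifier finding.
type Diff struct {
	Kind     string // "gap", "excess", or "drift"
	From, To string
}

// Op is a hypothetical rendered write: statement text plus parameters.
type Op struct {
	Cypher string
	Params map[string]string
}

// OpBuilder is the per-class half: WHAT to write.
type OpBuilder interface {
	BuildOps(diffs []Diff) []Op
}

// Applier is the shared half: HOW to write safely.
type Applier interface {
	Cycle(ops []Op) int // returns how many ops it was handed
}

// ownsBuilder renders placeholder OWNS statements (not the real Cypher).
type ownsBuilder struct{}

func (ownsBuilder) BuildOps(diffs []Diff) []Op {
	ops := make([]Op, 0, len(diffs))
	for _, d := range diffs {
		stmt := "/* OWNS merge */"
		if d.Kind == "excess" {
			stmt = "/* OWNS guarded delete */"
		}
		ops = append(ops, Op{
			Cypher: stmt,
			Params: map[string]string{"from": d.From, "to": d.To},
		})
	}
	return ops
}

// countingApplier stands in for the transport; it only counts ops.
type countingApplier struct{ seen int }

func (c *countingApplier) Cycle(ops []Op) int {
	c.seen += len(ops)
	return len(ops)
}

// runHook wires the two halves together: build per class, apply once.
func runHook(b OpBuilder, a Applier, diffs []Diff) int {
	return a.Cycle(b.BuildOps(diffs))
}

func main() {
	diffs := []Diff{
		{Kind: "gap", From: "0xaaa", To: "0xbbb"},
		{Kind: "excess", From: "0xccc", To: "0xddd"},
	}
	n := runHook(ownsBuilder{}, &countingApplier{}, diffs)
	fmt.Println("ops handed to the transport:", n)
}
```

Adding a second healer in this sketch means writing another OpBuilder; the Applier and runHook stay untouched, which is the property the lesson attributes to the real split.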

2 · Three guarantees the transport owns

# | Guarantee | How
1 | Temporal race guard | Every statement carries WHERE coalesce(r.updated_block,0) <= $cursor_read behind a WITH barrier, bound to the block the healer read. A newer event-written edge survives: the guarded write matches 0 rows.
2 | Single-writer apply | Never touches graphstore directly; every mutation is a graphwrite.Request{Source:"chainref_reconcile"} on the graph-writes stream, applied by the lone writer (L9), on the same MVCC-conflict-free path as the indexer.
3 | Non-blocking publish | Every publish is bounded by a deadline; a blocked or erroring publish is counted and retried next cycle, never allowed to wedge the cycle.

You met guarantees 1 and 2 per-healer in L30; here they're centralized. Guarantee 3 is the new, deep one.

3 · Why non-blocking is the crux

Recall L9: the graph-writer's cursor — the thing that lets realtime apply make progress — advances only on Source=="indexer" entries. Now picture a reconcile write that blocks on the stream:

// the graph-writes Client defaults to BLOCK-FOREVER backpressure (L7).
// if reconcile inherited that, a wedged publish would sit in the stream…
// …and realtime block apply behind it would STALL. (A real incident class: 2026-06-01.)
So the transport refuses to inherit block-forever
Each publish is bounded by defaultPublishTimeout = 2s. A publish that doesn't complete in time is treated as Dropped (counted in the metrics and re-presented next cycle) rather than waited on. Healing is allowed to be late (it retries forever, idempotently); it is never allowed to be blocking. The primary path's liveness always wins over the secondary path's completeness.

4 · The write budget — drip, don't flood

The second protection for the foreground is volume. A healer that suddenly finds thousands of gaps — say after a graph reseed — could dump them all onto the single-writer stream in one cycle and starve realtime apply. So the transport caps writes per cycle (defaultWriteBudget = 500) and defers the rest:

Published

within budget + completed inside the deadline — the write landed on the stream.

Deferred

over the per-cycle budget — counted and re-presented next cycle. Drip, don't flood.

Dropped

publish errored or timed out — counted and retried next cycle. The non-blocking outcome.

A backlog drains over many cycles at a safe rate, never in one starving burst. Because every op is idempotent (MERGE/guarded-DELETE) and the temporal guard makes re-application safe, a Deferred or Dropped op simply tries again — no op is ever lost, only paced.
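The three outcomes can be sketched as a single loop. This is a simplified illustration under stated assumptions, not Transport.Cycle itself: the Stats type, the cycle function and errPublish are hypothetical names, ops are plain strings, and the budget here counts publish attempts (whether a Dropped op consumes budget in the real transport is not stated in the lesson).

```go
package main

import (
	"errors"
	"fmt"
)

// Stats counts the three outcomes for one cycle.
type Stats struct{ Published, Deferred, Dropped int }

// errPublish is a hypothetical publish failure used in examples.
var errPublish = errors.New("publish failed or timed out")

// cycle attempts at most budget publishes. Ops beyond the budget are
// Deferred, ops whose publish fails are Dropped, and both are returned
// so the caller can re-present them next cycle.
func cycle(ops []string, budget int, publish func(string) error) (Stats, []string) {
	var st Stats
	var retry []string
	for i, op := range ops {
		if i >= budget {
			st.Deferred++
			retry = append(retry, op)
			continue
		}
		if err := publish(op); err != nil {
			st.Dropped++
			retry = append(retry, op)
			continue
		}
		st.Published++
	}
	return st, retry
}

func main() {
	// A post-reseed backlog of 1200 gaps against a budget of 500.
	backlog := make([]string, 1200)
	for i := range backlog {
		backlog[i] = fmt.Sprintf("op-%d", i)
	}
	ok := func(string) error { return nil }

	for n := 1; len(backlog) > 0; n++ {
		var st Stats
		st, backlog = cycle(backlog, 500, ok)
		fmt.Printf("cycle %d: %+v\n", n, st)
	}
}
```

With a healthy publisher the 1200-op backlog drains in three cycles (500, 500, then 200), so the stream never sees more than the budget in one burst. Re-presenting the retry slice is only safe because, as the lesson notes, every op is idempotent and temporally guarded.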

5 · Shadow mode is a transport flag, not a healer feature

Now L30's shadow mode clicks into place: it's WithShadowMode() on the transport. In shadow the transport renders every op (exercising the real merge-keys and Cypher, incrementing per-kind counters) but never calls Publish — every op is reported Deferred, zero requests reach the stream:

tr := reconcile.New(pub, cfg, reconcile.WithShadowMode())   // renders + counts, publishes nothing
// going live is ONE change — swap the option for a budget:
tr := reconcile.New(pub, cfg, reconcile.WithWriteBudget(500)) // the builder + hook are untouched
A shadow transport that publishes even once is a bug
The docstring calls a shadow publish a "provenance-corruption escape," pinned by TestShadowMode_PublishesNothing. Two healers ship behind it today — OWNS (L30) and the higher-risk ADMIN_CTRL healer (heal_admin_ctrl.go), which reconciles Ownable/EIP-1967 admin edges and keys every op on a RelKey flavour ({role:'owner'|'proxy_admin'}) so it can never touch a sibling governance/role-hash ADMIN_CTRL edge. Provenance isolation is its central correctness property.
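The "one change to go live" property follows from the functional-options pattern. The sketch below reuses the option names from the lesson (WithShadowMode, WithWriteBudget) but everything else is an assumption: the Transport fields, the simplified New signature that takes a bare publish function, and the string ops are illustrative only.

```go
package main

import "fmt"

// Transport is a simplified, hypothetical stand-in for the real transport.
type Transport struct {
	shadow  bool
	budget  int
	publish func(op string)
}

// Option configures a Transport at construction time.
type Option func(*Transport)

// WithShadowMode makes the transport render and count but never publish.
func WithShadowMode() Option { return func(t *Transport) { t.shadow = true } }

// WithWriteBudget sets the per-cycle cap on published ops.
func WithWriteBudget(n int) Option { return func(t *Transport) { t.budget = n } }

// New builds a transport; the default budget of 500 follows the lesson.
func New(publish func(op string), opts ...Option) *Transport {
	t := &Transport{budget: 500, publish: publish}
	for _, o := range opts {
		o(t)
	}
	return t
}

// Cycle reports every op as deferred in shadow mode and never calls
// publish; otherwise it publishes up to the budget and defers the rest.
func (t *Transport) Cycle(ops []string) (published, deferred int) {
	for i, op := range ops {
		if t.shadow || i >= t.budget {
			deferred++
			continue
		}
		t.publish(op)
		published++
	}
	return published, deferred
}

func main() {
	calls := 0
	pub := func(string) { calls++ }
	ops := []string{"op-1", "op-2", "op-3"}

	shadow := New(pub, WithShadowMode())
	p, d := shadow.Cycle(ops)
	fmt.Println("shadow:", p, "published,", d, "deferred,", calls, "publish calls")

	live := New(pub, WithWriteBudget(2))
	p, d = live.Cycle(ops)
	fmt.Println("live:", p, "published,", d, "deferred,", calls, "publish calls")
}
```

Because the shadow check sits inside Cycle, the builder and the hook never learn which mode they are in. A test in the spirit of TestShadowMode_PublishesNothing would assert that the publish-call counter stays at zero after a shadow cycle.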

6 · One subtlety: the WITH barrier does double duty

The temporal guard's WITH barrier has a second effect worth knowing. In pkg/graphwrite's recorder, a statement containing WITH is non-coalescable — so a guarded reconcile write publishes as a singleton and is never UNWIND-batched with other writes (L9's coalescing). The same clause that enforces newer-block-wins also keeps each guarded mutation independently applied. One idiom, two guarantees.

The reusable safety layer
This is the engineering payoff of the whole quality strand: the dangerous part of self-healing — writing to production — is solved once, in a transport with a budget, a deadline, a temporal guard, single-writer routing, and a shadow switch. Every healer is then just a BuildOps that renders the right Cypher. Safety is infrastructure, not per-feature discipline.

Check yourself

1. How do the op builder and the transport divide responsibility?
2. The graph-writer's cursor advances only on Source=="indexer" entries. Why does that make non-blocking publish essential?
3. A publish doesn't complete within the transport's deadline. What's the outcome?
4. Why does the transport cap writes at a per-cycle budget and defer the rest?
5. A Deferred or Dropped op is re-presented next cycle. Why is re-applying it safe?
6. In shadow mode, what does the transport do with each rendered op?
7. What is the single change required to take a shadow healer live?
8. The temporal guard's WITH barrier has a second effect in the graphwrite recorder. What is it?

What you can now do

Quality subsystem, complete enough to reason about
Harness (L29) measures drift, healers (L30) decide the fix, the transport (L31) applies it without ever endangering realtime apply. You can now trace a discrepancy from a chain re-read all the way to a budgeted, deadline-bounded, single-writer, shadow-gated write — and explain every guard between.

Grounded in: pkg/reconcile/transport.go (three guarantees: temporal WHERE coalesce(r.updated_block,0) <= $cursor_read behind WITH; single-writer graphwrite.Request{Source:"chainref_reconcile"}; non-blocking defaultPublishTimeout=2s → Dropped; defaultWriteBudget=500 → Deferred; WithShadowMode/WithWriteBudget/WithPublishTimeoutMillis options; ChainID-folded IdemKey), hook.go (opBuilder.BuildOps + HealerHook), heal_admin_ctrl.go (AdminCtrlHealer RelKey flavour-isolation, shadow). Verify against source — the code is the truth.