Lesson 31 · The Reconcile Transport · Quality Internals
The shared apply mechanism
One transport carries every healer's writes — and protects realtime apply from them. ~14 min.
Builds on: L30 · L9 · L7
Anchor: background job vs latency-critical foreground
New: write budget + deferral
New: non-blocking publish
L30 showed the OWNS healer's guards and treated them as the healer's own. They aren't. Those guarantees live one
layer down, in a shared transport every healer rides — so each new healer inherits safety for free, and a single
config flag takes it from shadow to live. But the transport's deepest job isn't the per-edge guard you've seen; it's
protecting the system's primary work — realtime block apply — from the healer's secondary writes.
Your anchor: the background job that mustn't hurt the foreground
Every backend engineer meets this: a cleanup/reconcile job shares a datastore with a latency-critical request path. Run
it carelessly and it floods the write queue or holds a lock, and suddenly the foreground stalls. Here the foreground is
realtime block apply (L9's single writer); the background is healing. The transport exists so that the second never starves
the first. This is a systems lesson with a DeFi-indexer body.
1 · Where it sits
The transport is the apply half of healing. A per-class op builder renders the verifier's diffs into ops;
the transport publishes them safely. The HealerHook (hook.go) wires the two together.
That split is the whole design: what to reconcile is class-specific (OWNS vs ADMIN_CTRL render different Cypher);
how to apply it safely is universal, so it's written once.
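The split can be sketched in a few lines of Go. This is a minimal illustration, not the contents of hook.go: the Diff, Op, OpBuilder, and Applier types, the Run method, and the placeholder Cypher strings are all assumptions made for the example. Only the names HealerHook, BuildOps, and Cycle come from the lesson, and their real signatures may differ.

```go
package main

import "fmt"

// Diff is a hypothetical verifier finding for one edge.
type Diff struct {
	From, To string
	Missing  bool // true: the chain says the edge should exist but the graph lacks it
}

// Op is one rendered, ready-to-publish statement.
type Op struct {
	Kind   string
	Cypher string
}

// OpBuilder is the class-specific half: WHAT to reconcile.
type OpBuilder interface {
	BuildOps(diffs []Diff) []Op
}

// Applier is the universal half: HOW ops reach the graph safely.
type Applier interface {
	Cycle(ops []Op) (accepted int)
}

// HealerHook wires one builder to the shared transport (assumed shape).
type HealerHook struct {
	Builder   OpBuilder
	Transport Applier
}

// Run renders the diffs with the class-specific builder, then hands
// every op to the shared transport.
func (h HealerHook) Run(diffs []Diff) int {
	return h.Transport.Cycle(h.Builder.BuildOps(diffs))
}

// ownsBuilder renders placeholder (not real) Cypher for OWNS edges.
type ownsBuilder struct{}

func (ownsBuilder) BuildOps(diffs []Diff) []Op {
	ops := make([]Op, 0, len(diffs))
	for _, d := range diffs {
		kind, verb := "owns_delete", "DELETE"
		if d.Missing {
			kind, verb = "owns_merge", "MERGE"
		}
		ops = append(ops, Op{Kind: kind, Cypher: fmt.Sprintf("/* %s OWNS %s -> %s */", verb, d.From, d.To)})
	}
	return ops
}

// countingTransport stands in for the real transport and only records ops.
type countingTransport struct{ seen []Op }

func (t *countingTransport) Cycle(ops []Op) int {
	t.seen = append(t.seen, ops...)
	return len(ops)
}

func main() {
	tr := &countingTransport{}
	hook := HealerHook{Builder: ownsBuilder{}, Transport: tr}
	n := hook.Run([]Diff{{From: "0xa", To: "0xb", Missing: true}, {From: "0xc", To: "0xd"}})
	fmt.Println(n, tr.seen[0].Kind, tr.seen[1].Kind)
}
```

A second healer would supply only another OpBuilder; the hook and the transport stay as they are.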
2 · Three guarantees the transport owns
1. Temporal race guard. How: every statement carries WHERE coalesce(r.updated_block,0) <= $cursor_read behind a WITH barrier, bound to the block the healer read. A newer event-written edge survives because the guarded write matches 0 rows.
2. Single-writer apply. How: never touches graphstore directly; every mutation is a graphwrite.Request{Source:"chainref_reconcile"} on the graph-writes stream, applied by the lone writer (L9), the same MVCC-conflict-free path as the indexer.
3. Non-blocking publish. How: every publish is bounded by a deadline; a blocked or erroring publish is counted and retried next cycle, never allowed to wedge the cycle.
Guards 1 and 2 you met per-healer in L30 — here they're centralized. Guarantee 3 is the new, deep one.
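The temporal guard's predicate is easy to check in isolation. The sketch below models it in memory rather than in Cypher: the Edge type and guardedDelete function are hypothetical, and the map stands in for the graph. It shows only the comparison the guard performs, with the zero value of UpdatedBlock playing the role of coalesce(r.updated_block, 0).

```go
package main

import "fmt"

// Edge is a hypothetical in-memory stand-in for a graph relationship.
type Edge struct {
	UpdatedBlock uint64 // zero value plays the role of coalesce(r.updated_block, 0)
}

// guardedDelete removes the edge only if it was last written at or before
// the block the healer read. It returns the number of "rows" matched.
func guardedDelete(edges map[string]Edge, key string, cursorRead uint64) int {
	e, ok := edges[key]
	if !ok || e.UpdatedBlock > cursorRead {
		return 0 // a newer event-written edge survives
	}
	delete(edges, key)
	return 1
}

func main() {
	edges := map[string]Edge{
		"stale": {UpdatedBlock: 90},
		"fresh": {UpdatedBlock: 120},
	}
	const cursorRead = 100 // the block the healer read

	fmt.Println(guardedDelete(edges, "stale", cursorRead))
	fmt.Println(guardedDelete(edges, "fresh", cursorRead))
	_, survived := edges["fresh"]
	fmt.Println(survived)
}
```

The edge written at block 120 is left alone because the healer's view of the chain ends at block 100, which is the newer-block-wins rule in miniature.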
3 · Why non-blocking is the crux
Recall L9: the graph-writer's cursor — the thing that lets realtime apply make progress — advances only on
Source=="indexer" entries. Now picture a reconcile write that blocks on the stream:
// the graph-writes Client defaults to BLOCK-FOREVER backpressure (L7).
// if reconcile inherited that, a wedged publish would sit in the stream…
// …and realtime block apply behind it would STALL. (A real incident class: 2026-06-01.)
So the transport refuses to inherit block-forever
Each publish is bounded by defaultPublishTimeout = 2s. A publish that doesn't complete in time is treated as
Dropped (counted in the metrics and re-presented next cycle) rather than waited on. Healing is allowed to be
late (it retries forever, idempotently); it is never allowed to be blocking. The primary path's liveness
always wins over the secondary path's completeness.
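One common way to bound a publish in Go is context.WithTimeout. The sketch below assumes a Publisher interface whose Publish method honours context cancellation; that interface, the Outcome type, and publishBounded are hypothetical names for the example, not the transport's real API. The demo shortens the deadline from the lesson's 2s to 20ms so it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type Outcome string

const (
	Published Outcome = "published"
	Dropped   Outcome = "dropped"
)

// Publisher is an assumed interface; the real client may look different.
type Publisher interface {
	Publish(ctx context.Context, payload string) error
}

// publishBounded never waits longer than timeout: an error or an expired
// deadline both become Dropped, to be re-presented next cycle.
func publishBounded(parent context.Context, p Publisher, payload string, timeout time.Duration) Outcome {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	if err := p.Publish(ctx, payload); err != nil {
		return Dropped
	}
	return Published
}

// wedgedPublisher simulates block-forever backpressure: it returns only
// once the context is cancelled or its deadline passes.
type wedgedPublisher struct{}

func (wedgedPublisher) Publish(ctx context.Context, _ string) error {
	<-ctx.Done()
	return ctx.Err()
}

// okPublisher accepts every write immediately.
type okPublisher struct{}

func (okPublisher) Publish(context.Context, string) error { return nil }

// demo runs one wedged and one healthy publish with a short deadline.
func demo() (wedged, healthy Outcome) {
	const timeout = 20 * time.Millisecond // shortened from 2s for the demo
	wedged = publishBounded(context.Background(), wedgedPublisher{}, "op-1", timeout)
	healthy = publishBounded(context.Background(), okPublisher{}, "op-2", timeout)
	return wedged, healthy
}

func main() {
	w, h := demo()
	fmt.Println(w, h)
}
```

The deadline only helps if the underlying publish call actually returns when its context expires. A client that ignores the context would still wedge the caller, so that property has to hold in the client the transport wraps.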
4 · The write budget — drip, don't flood
The second protection for the foreground is volume. A healer that suddenly finds thousands of gaps — say after a graph
reseed — could dump them all onto the single-writer stream in one cycle and starve realtime apply. So the transport caps
writes per cycle (defaultWriteBudget = 500) and defers the rest:
Published: within budget and completed inside the deadline. The write landed on the stream.
Deferred: over the per-cycle budget. Counted and re-presented next cycle. Drip, don't flood.
Dropped: publish errored or timed out. Counted and retried next cycle. The non-blocking outcome.
A backlog drains over many cycles at a safe rate, never in one starving burst. Because every op is idempotent
(MERGE/guarded-DELETE) and the temporal guard makes re-application safe, a Deferred or Dropped op simply tries again — no
op is ever lost, only paced.
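The pacing arithmetic can be sketched as a small loop. The function and type names here (cycle, drain, failOn, Counts) are hypothetical, and the sketch assumes the budget counts publish attempts, so a Dropped op consumes budget just like a Published one. The real Transport.Cycle may account for this differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Counts tallies the three outcomes of one cycle.
type Counts struct{ Published, Deferred, Dropped int }

// cycle attempts at most budget ops. Ops beyond the budget are Deferred,
// failed attempts are Dropped, and both are returned for the next cycle.
func cycle(ops []string, budget int, publish func(string) error) (Counts, []string) {
	var c Counts
	var retry []string
	for i, op := range ops {
		if i >= budget {
			c.Deferred++
			retry = append(retry, op)
			continue
		}
		if err := publish(op); err != nil {
			c.Dropped++
			retry = append(retry, op)
			continue
		}
		c.Published++
	}
	return c, retry
}

// drain reports how many cycles a backlog of n ops needs when every
// publish succeeds. It returns -1 for a non-positive budget.
func drain(n, budget int) int {
	if budget <= 0 {
		return -1
	}
	pending := make([]string, n)
	ok := func(string) error { return nil }
	cycles := 0
	for len(pending) > 0 {
		_, pending = cycle(pending, budget, ok)
		cycles++
	}
	return cycles
}

// failOn returns a publish function that errors for exactly one op.
func failOn(bad string) func(string) error {
	return func(op string) error {
		if op == bad {
			return errors.New("publish failed")
		}
		return nil
	}
}

func main() {
	c, retry := cycle([]string{"a", "b", "c", "d", "e"}, 3, failOn("b"))
	fmt.Println(c, retry)
	fmt.Println(drain(1200, 500)) // backlog of 1200 at 500 per cycle
}
```

With a budget of 500, a backlog of 1200 ops drains in three cycles (500, 500, 200). Nothing is discarded along the way, because every Deferred or Dropped op is handed back for the next cycle.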
5 · Shadow mode is a transport flag, not a healer feature
Now L30's shadow mode clicks into place: it's WithShadowMode() on the transport. In shadow the transport
renders every op (exercising the real merge-keys and Cypher, incrementing per-kind counters) but never calls
Publish — every op is reported Deferred, zero requests reach the stream:
tr := reconcile.New(pub, cfg, reconcile.WithShadowMode()) // renders + counts, publishes nothing
// going live is ONE change: swap the option for a budget
tr := reconcile.New(pub, cfg, reconcile.WithWriteBudget(500)) // the builder + hook are untouched
A shadow transport that publishes even once is a bug
The docstring calls a shadow publish a "provenance-corruption escape," pinned by
TestShadowMode_PublishesNothing. Two healers ship behind it today — OWNS (L30) and the higher-risk
ADMIN_CTRL healer (heal_admin_ctrl.go), which reconciles Ownable/EIP-1967 admin edges and keys
every op on a RelKey flavour ({role:'owner'|'proxy_admin'}) so it can never touch a sibling
governance/role-hash ADMIN_CTRL edge. Provenance isolation is its central correctness property.
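A shadow switch of this kind is typically built with Go's functional-options pattern. The sketch below is a local toy, not the reconcile package: its Transport, New, and Cycle are simplified assumptions (New takes a plain publish function rather than pub and cfg), and only the option names WithShadowMode and WithWriteBudget are taken from the lesson. It also includes a check in the spirit of TestShadowMode_PublishesNothing.

```go
package main

import "fmt"

// Op is one rendered statement (hypothetical shape).
type Op struct{ Kind, Cypher string }

// Transport is a toy stand-in for the real reconcile transport.
type Transport struct {
	publish  func(Op) error
	shadow   bool
	budget   int
	Rendered map[string]int // per-kind counters, incremented even in shadow
}

// Option configures a Transport at construction time.
type Option func(*Transport)

func WithShadowMode() Option       { return func(t *Transport) { t.shadow = true } }
func WithWriteBudget(n int) Option { return func(t *Transport) { t.budget = n } }

// New applies options over an assumed default budget of 500.
func New(publish func(Op) error, opts ...Option) *Transport {
	t := &Transport{publish: publish, budget: 500, Rendered: map[string]int{}}
	for _, opt := range opts {
		opt(t)
	}
	return t
}

// Cycle counts every op as rendered. In shadow mode it never calls
// publish and reports every op as deferred.
func (t *Transport) Cycle(ops []Op) (published, deferred int) {
	attempts := 0
	for _, op := range ops {
		t.Rendered[op.Kind]++
		if t.shadow || attempts >= t.budget {
			deferred++
			continue
		}
		attempts++
		if err := t.publish(op); err != nil {
			continue // Dropped; its accounting is omitted in this sketch
		}
		published++
	}
	return published, deferred
}

// run builds a transport with the given options and reports how many
// times the publisher was actually called.
func run(opts ...Option) (calls, published, deferred int) {
	tr := New(func(Op) error { calls++; return nil }, opts...)
	published, deferred = tr.Cycle([]Op{{Kind: "owns_merge"}, {Kind: "owns_merge"}, {Kind: "owns_delete"}})
	return calls, published, deferred
}

func main() {
	fmt.Println(run(WithShadowMode()))
	fmt.Println(run(WithWriteBudget(500)))
}
```

Because the switch lives in the transport's constructor, the builder and hook code paths are identical in shadow and live runs, so the shadow counters measure exactly what a live run would attempt.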
6 · One subtlety: the WITH barrier does double duty
The temporal guard's WITH barrier has a second effect worth knowing. In pkg/graphwrite's
recorder, a statement containing WITH is non-coalescable — so a guarded reconcile write publishes as a
singleton and is never UNWIND-batched with other writes (L9's coalescing). The same clause that enforces
newer-block-wins also keeps each guarded mutation independently applied. One idiom, two guarantees.
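The coalescing rule can be pictured with a toy classifier. This is not the recorder in pkg/graphwrite, whose actual test is not shown in this lesson: hasWithBarrier and partition are hypothetical, and the whitespace-token scan is a deliberately naive assumption (it would also flag Cypher's STARTS WITH and ENDS WITH operators).

```go
package main

import (
	"fmt"
	"strings"
)

// hasWithBarrier reports whether a statement contains a standalone WITH
// token. A real recorder would need a sturdier test than this.
func hasWithBarrier(stmt string) bool {
	for _, tok := range strings.Fields(stmt) {
		if strings.EqualFold(tok, "WITH") {
			return true
		}
	}
	return false
}

// partition separates statements that may be batched together from
// those that must be published one at a time.
func partition(stmts []string) (batch, singletons []string) {
	for _, s := range stmts {
		if hasWithBarrier(s) {
			singletons = append(singletons, s)
		} else {
			batch = append(batch, s)
		}
	}
	return batch, singletons
}

func main() {
	stmts := []string{
		"MERGE (a:Address {id: $id})",
		"MATCH (a)-[r:OWNS]->(b) WITH r WHERE coalesce(r.updated_block,0) <= $cursor_read DELETE r",
	}
	batch, singletons := partition(stmts)
	fmt.Println(len(batch), len(singletons))
}
```

Under this model every guarded reconcile write lands in the singleton group, which is the behaviour the section describes.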
The reusable safety layer
This is the engineering payoff of the whole quality strand: the dangerous part of self-healing — writing to production —
is solved once, in a transport with a budget, a deadline, a temporal guard, single-writer routing, and a shadow
switch. Every healer is then just a BuildOps that renders the right Cypher. Safety is infrastructure, not
per-feature discipline.
Check yourself
1. How do the op builder and the transport divide responsibility?
2. The graph-writer's cursor advances only on Source=="indexer" entries. Why does that make non-blocking publish essential?
3. A publish doesn't complete within the transport's deadline. What's the outcome?
4. Why does the transport cap writes at a per-cycle budget and defer the rest?
5. A Deferred or Dropped op is re-presented next cycle. Why is re-applying it safe?
6. In shadow mode, what does the transport do with each rendered op?
7. What is the single change required to take a shadow healer live?
8. The temporal guard's WITH barrier has a second effect in the graphwrite recorder. What is it?
↳ Ask your teacher
Try: "Show me Transport.Cycle's budget-then-publish loop." ·
"How does the idem-key fold in ChainID, and what window does it dedup?" ·
"What makes a statement coalescable in the recorder (L9)?" ·
"How does ADMIN_CTRL's RelKey flavour-isolation render in Cypher?" ·
"What was the 2026-06-01 stall, concretely?"
What you can now do
Explain the builder/transport split: class-specific BuildOps vs universal safe-apply, wired by HealerHook.
Recite the transport's three guarantees: temporal race guard, single-writer apply, non-blocking publish.
Explain why non-blocking is essential (the indexer-only cursor) and how the 2s deadline + Dropped/retry enforces it.
Explain the write budget + Deferred/Dropped/Published outcomes, and why deferral is safe (idempotent ops).
Describe shadow mode as a transport flag, the one-option path to live, and the WITH-barrier's non-coalescable double duty.
Quality subsystem, complete enough to reason about
Harness (L29) measures drift, healers (L30) decide the fix, the transport (L31) applies it without ever endangering
realtime apply. You can now trace a discrepancy from a chain re-read all the way to a budgeted, deadline-bounded,
single-writer, shadow-gated write — and explain every guard between.