A periodic, two-tier data-quality monitor — and how it differs from chainref. ~13 min.
Builds on: L29 · L32 · L8
Anchor: usd_value arithmetic, Safe signers
New: fast / slow tiers
New: self-consistency vs chain-truth
I pitched this lesson as "validation — the inline cousin of chainref." Reading the code corrects that, and the
correction is the lesson: pkg/validation isn't inline at all. It's a second periodic monitor, running
its own loop, asking a different question than chainref. Two quality systems, two altitudes — and knowing which answers
which is the point.
Two questions, two subsystems
chainref (L29–33) asks "does the graph match the blockchain?" — it re-reads on-chain truth, and it
acts (heals, tickets). validation asks "does the graph obey its own rules and stay internally
coherent?" — mostly self-consistency checks, and it only reports (logs + metrics, no auto-fix). One guards reality;
the other guards coherence. They overlap deliberately at the edges, which we'll see.
1 · A catalog of numbered checks
Validation is a flat registry of small CheckFuncs, each with a stable ID, a category, and a graded severity. A
check returns Finding{Severity, Category, CheckID, Message, Count, Examples}. The catalog reads like a linter for
the graph:
Category | Examples | Asks
Schema (S) | S02 every node has id+graph_id, S03 IDs lowercase-hex, S04 no self-loops, S05 no duplicate edges, S06 no orphaned edges | structural well-formedness
Classification (C) | C01 valid behavior_class, C03 no junk protocol names, C04 multisig has signers + safe_threshold, C06 no type downgrades |
(R) | R07 focus-token count under a warn threshold, plus cross-source price agreement | derived data is in range
Severity is graded per finding, not binary: a check computes info, warn, or error from how bad the result is (e.g. B01 is info when clean, warn
on minor drift, error on a real mismatch). The suite is a dashboard of dozens of these, each a fresh gauge every cycle.
2 · The two-tier cadence — separate by cost
The core design decision: checks are split into two independent tiers by what they touch, each with its own loop
and cadence, so a cheap check never waits on an expensive one.
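Fast tier
Neo4j + Redis only. Cadence ~10 min, 2-min per-check timeout. The structural/consistency checks (most of S, C, R).
Slow tier
Checks that need RPC. Slow and rate-limited, so they run on their own roughly hourly cadence.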
Two registries, two goroutines (RunFast / RunSlow). The reasoning is exactly L24/L29's RPC-budget tension:
RPC checks are slow and rate-limited, so isolating them means the cheap graph checks give fast feedback every few minutes
instead of being dragged to the hourly RPC cadence. Cost dictates cadence; cadence dictates the loop.
3 · Per-check isolation — a check that breaks is itself a finding
Here's the robustness move. Each check runs in its own goroutine under a per-check timeout, and the runner converts its
own failures into findings:
go func() {
    defer func() { if r := recover(); r != nil { out.panicVal = r } }() // a panic is caught…
    out.findings = check(checkCtx, deps)
}()
select {
case out := <-done: // panic → emit a CHECK_PANIC finding (SeverityError), keep going
case <-checkCtx.Done(): // timeout → emit a CHECK_TIMEOUT finding (SeverityError), keep going
}
The monitor monitors itself
A panicking or hung check doesn't crash the suite or wedge the cycle — it's recovered and emitted as a synthetic
CHECK_PANIC / CHECK_TIMEOUT finding, then the next check runs. This is L8's fail-loop discipline pushed to
per-check granularity, with a twist: the suite's own breakage becomes first-class data on the same dashboard as the
graph's. You can't have a silently-dead check — a dead check reports itself.
4 · Monitor, not control loop
The deepest contrast with chainref: validation has no actuator. Findings flow to telemetry gauges (per check,
per category, per severity) and structured logs — and stop there. No streak, no Linear ticket, no healer. Humans read the
dashboard and decide.
A subtle gauge detail worth stealing
Every check records its count each cycle, including 0 for passing checks. Why emit a zero? So a check that flips
failing → passing doesn't leave a stale non-zero series lingering on the dashboard. A monitor that only reports
problems can't show "the problem went away" — recording the zero is what makes the green state visible. Small habit, big
difference in an observability system.
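The stale-series problem is easy to reproduce with a toy gauge store. In this sketch the gauges map is a hypothetical in-memory stand-in for the telemetry backend, and both recorder functions are illustrative names rather than the suite's actual API.

```go
package main

import "fmt"

// gauges holds the last value written per check ID, like a gauge series.
type gauges map[string]int

// recordProblemsOnly writes a gauge only when a check found something.
func recordProblemsOnly(g gauges, counts map[string]int) {
	for id, n := range counts {
		if n > 0 {
			g[id] = n
		}
	}
}

// recordAll writes every check's count each cycle, zeros included.
func recordAll(g gauges, counts map[string]int) {
	for id, n := range counts {
		g[id] = n
	}
}

func main() {
	cycles := []map[string]int{
		{"S04": 7, "S05": 0}, // cycle 1: S04 is failing
		{"S04": 0, "S05": 0}, // cycle 2: S04 has been fixed
	}
	stale, fresh := gauges{}, gauges{}
	for _, counts := range cycles {
		recordProblemsOnly(stale, counts)
		recordAll(fresh, counts)
	}
	fmt.Println("problems-only S04:", stale["S04"]) // prints 7: the old failure lingers
	fmt.Println("record-all    S04:", fresh["S04"]) // prints 0: the recovery is visible
}
```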
5 · The deliberate overlap with chainref
Notice B03 here — Σ HOLDS ≤ totalSupply — is the same invariant as L32's
BalanceConservationVerifier. That's not duplication by accident; it's the same fact checked at two altitudes:
 | validation B03 | chainref BalanceConservation (L32)
Role | a coarse data-quality alarm (warn finding on a gauge) | a precise audited drift check with an asymmetric band
Acts? | no, reports only | yes, feeds streak → ticket / heal
Question | "is the graph internally coherent right now?" | "does it match the chain, within policy?"
Defense in depth: the cheap monitor flags it fast; the precise harness confirms, grades, and acts. Real systems check
the same critical invariant in more than one place, on purpose.
The full self-checking picture
The indexer guards itself three ways: chainref (does it match the chain? — audited + actuated, L29–33),
validation (does it obey its own rules? — monitored, here), and parity (does Go match Python batch? — L23).
Three lenses on "is the graph any good," each catching what the others can't.
Check yourself
1. What question does the validation suite answer, versus chainref?
2. Why are checks split into a fast tier and a slow tier with separate loops?
3. A validation check panics mid-cycle. What happens?
4. What does it mean that validation is a "monitor, not a control loop"?
5. Why does each check record a count every cycle, including 0 for passing checks?
6. B02 validates usd_value = quantity_raw × price / 10^decimals. What category of error does that catch?
7. Validation's B03 and chainref's BalanceConservation verifier check the same Σ HOLDS ≤ totalSupply invariant. Why have both?
8. A check's severity is graded (info / warn / error) rather than a pass/fail boolean. What does that buy?
↳ Ask your teacher
Try: "Show me S05's per-edge-type duplicate query and its memory cap." ·
"How does C06 detect a type downgrade (pool → token)?" ·
"Where is the Validator wired up — which binary runs it?" ·
"How do nil rdb / rpcPool make checks skip gracefully?" ·
"How does the /quality dashboard combine validation + chainref signals?"
What you can now do
State validation's question (internal coherence / own-rules) versus chainref's (chain truth) and parity's (Python match).
Read the numbered-check catalog (S / C / B / R) and the graded-severity Finding shape.
Explain the fast/slow tier split as cost-dictates-cadence, and why two independent loops matter.
Explain per-check isolation and how a panic/timeout becomes a first-class self-reported finding.
Explain monitor-vs-control-loop, the zero-count gauge habit, and the deliberate B03 ↔ chainref overlap.
Self-checking, fully mapped
With chainref (L29–33) and validation (here), alongside parity (L23), you've seen all three of the indexer's self-watching
systems: one that audits against the chain and acts, one that monitors its own coherence and reports, and one that checks Go
against the Python batch. A production data system that's trusted with billions doesn't assume it's correct; it continuously
proves it, from several angles.