Lesson 23 · at_risk Lifecycle · Deeper Track

Trigger, ordinary paths, and proof

The three loose ends: what fires a run, the cells you haven't seen, and how it's all kept honest. ~14 min.

Builds on: L19 · L20 · L9
New: the scheduler · HOLDS / vault paths · the parity harness

You've seen how at_risk computes (L19–L22). Three operational questions remain: What kicks off a run? What do the boring, non-oracle cells look like? And how does anyone know the answer is right? They're the bookends of one lifecycle — trigger → compute → proof — so we'll do all three here and close the book on at_risk.

① TRIGGER: scheduler · 30 min
② DECOMPOSE: 6 paths incl. HOLDS/vault
aggregate + caps: L19
write-back: Memgraph attrs + edges
③ PROVE: parity diff + invariants

1. What fires a run: the scheduler

AtRiskScheduler (at_risk_scheduler.go) drives the metric on a periodic cadence: load the full risk graph, run EnrichAtRisk over the live focus-token list, write the per-token at_risk_summary / at_risk_cells / at_risk_updated_at back to Memgraph. Default tick: 30 min.
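The cycle described above (one full graph load, one enrichment pass over every focus token, one write-back) can be sketched as a ticker loop. This is a minimal illustration, not the code in at_risk_scheduler.go: the `cycleDeps` struct, its three function signatures, `TokenResult`, and the demo data are all hypothetical; only the 30-minute default and the load → enrich → write-back order come from the lesson.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TokenResult is a hypothetical stand-in for the per-token output the lesson
// says is written back (at_risk_summary, at_risk_cells, at_risk_updated_at).
type TokenResult struct {
	Token     string
	NCells    int
	UpdatedAt time.Time
}

// cycleDeps bundles the three steps of one cycle. The signatures are
// assumptions made for this sketch, not the real scheduler API.
type cycleDeps struct {
	loadGraph func(ctx context.Context) (graph map[string][]string, focus []string, err error)
	enrich    func(graph map[string][]string, focus []string) []TokenResult
	writeBack func(ctx context.Context, results []TokenResult) error
}

// runCycle does one full recompute: a single graph load, every focus token,
// then a single write-back. It returns how many tokens were stamped.
func runCycle(ctx context.Context, d cycleDeps) (int, error) {
	graph, focus, err := d.loadGraph(ctx)
	if err != nil {
		return 0, fmt.Errorf("load graph: %w", err)
	}
	results := d.enrich(graph, focus)
	if err := d.writeBack(ctx, results); err != nil {
		return 0, fmt.Errorf("write back: %w", err)
	}
	return len(results), nil
}

// runScheduler runs one cycle per tick until ctx is cancelled.
func runScheduler(ctx context.Context, tick time.Duration, d cycleDeps) {
	t := time.NewTicker(tick)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if n, err := runCycle(ctx, d); err != nil {
				fmt.Println("cycle failed:", err)
			} else {
				fmt.Println("cycle stamped tokens:", n)
			}
		}
	}
}

// demoDeps wires in-memory fakes so the sketch runs without a database.
func demoDeps() cycleDeps {
	return cycleDeps{
		loadGraph: func(ctx context.Context) (map[string][]string, []string, error) {
			graph := map[string][]string{"USDC": {"vaultA", "poolB"}, "WETH": {"bridgeC"}}
			return graph, []string{"USDC", "WETH"}, nil
		},
		enrich: func(graph map[string][]string, focus []string) []TokenResult {
			out := make([]TokenResult, 0, len(focus))
			for _, tok := range focus {
				out = append(out, TokenResult{Token: tok, NCells: len(graph[tok]), UpdatedAt: time.Now()})
			}
			return out
		},
		writeBack: func(ctx context.Context, results []TokenResult) error { return nil },
	}
}

// runDemoCycle runs a single cycle against the in-memory fakes.
func runDemoCycle() (int, error) {
	return runCycle(context.Background(), demoDeps())
}

func main() {
	n, err := runDemoCycle()
	fmt.Println(n, err)
	// A production-style call would be: runScheduler(ctx, 30*time.Minute, deps)
}
```

Keeping the whole cycle inside one function called from one loop is what makes "one load per cycle" easy to see: nothing else in the sketch can trigger a load or a write.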

A nuance worth correcting from L6/L16
L16's exposure-BFS was delta-driven: it recomputed only what a graph delta touched. The at_risk metric is different: its scheduler recomputes all focus tokens, full-graph, on a timer. Why? "at_risk consumes the full graph and stamps all focus tokens in one cycle — splitting into per-token wakes would require N full loads where we want one." The cadence (30 min) was picked from a measured 16.2-min cycle on a 1.5M-node stage graph (PR #474). There is also an opt-in per-token partial-load mode (AT_RISK_PARTIAL_LOAD), but it is meant to be switched on only after parity has been verified. (Don't over-generalize "the risk engine is incremental": at_risk's flagship cells are a periodic full recompute.)

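Three properties tie straight back to earlier lessons: the single-goroutine writer (Run() is the only caller that writes at_risk), the heavy gate (HeavyMu / WithHeavy in heavy_gate.go), and the fail-loop with AtRiskSchedulerStalled.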

2. The ordinary cells: Path 3 & 3b (HOLDS / vault)

L20 showed the gnarliest path. These are the opposite: the plain majority. Path 3 (emitHoldsBasedCells) walks every venue that HOLDS the focus token and emits a cell, with target_role read straight off the venue's subtype:

switch vKind {
case "bridge": targetRole = TargetRoleBridge
case "pool":   targetRole = TargetRolePool
default:       targetRole = TargetRoleVault   // catch-all
}
// at_stake = focusUSD (USD of T this venue holds); + the usual admin & contract cells

The exclusion that defines at_risk's scope
Path 3 skips user / custody holders — EOAs, exchange hot wallets, ops multisigs holding tokens (userHolderSubtypes). The comment says it best: "a hack of Bitfinex hot wallet is not an at_risk attack surface on the token." at_risk measures risk in the protocol contracts that hold T (where a contract/admin compromise drains value systemically), not wherever tokens happen to sit. This single filter is the line between "protocol risk" and "someone got phished."
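Putting the role switch and the user-holder exclusion together, a Path 3 style walk might look like the sketch below. It is illustrative only: the `Holder` and `Cell` types, the string values of the role constants, and the members of `userHolderSubtypes` are assumptions (the lesson names the set but does not list its contents), and the sketch emits one cell per venue rather than the admin + contract pair the real path produces.

```go
package main

import "fmt"

// Role names come from the lesson's switch; the string values are assumed.
const (
	TargetRoleBridge = "bridge"
	TargetRolePool   = "pool"
	TargetRoleVault  = "vault"
)

// userHolderSubtypes mirrors the exclusion set named in the lesson.
// The member strings are illustrative guesses, not the real list.
var userHolderSubtypes = map[string]bool{
	"eoa":                 true,
	"exchange_hot_wallet": true,
	"ops_multisig":        true,
}

// Holder is a hypothetical view of one HOLDS edge into the focus token.
type Holder struct {
	Venue    string
	Subtype  string
	FocusUSD float64 // USD of the focus token this venue holds
}

// Cell is a trimmed-down, hypothetical cell: just enough to show the filter.
type Cell struct {
	Venue      string
	TargetRole string
	AtStakeUSD float64
}

// targetRoleFor reads the role off the venue subtype, with vault as catch-all.
func targetRoleFor(subtype string) string {
	switch subtype {
	case "bridge":
		return TargetRoleBridge
	case "pool":
		return TargetRolePool
	default:
		return TargetRoleVault
	}
}

// holdsCells skips user/custody holders and emits one cell per remaining venue.
func holdsCells(holders []Holder) []Cell {
	var cells []Cell
	for _, h := range holders {
		if userHolderSubtypes[h.Subtype] {
			continue // not protocol risk: tokens merely sit here
		}
		cells = append(cells, Cell{
			Venue:      h.Venue,
			TargetRole: targetRoleFor(h.Subtype),
			AtStakeUSD: h.FocusUSD,
		})
	}
	return cells
}

func main() {
	cells := holdsCells([]Holder{
		{Venue: "bridgeA", Subtype: "bridge", FocusUSD: 100},
		{Venue: "alice", Subtype: "eoa", FocusUSD: 50},
		{Venue: "lendX", Subtype: "lending_market", FocusUSD: 25},
	})
	for _, c := range cells {
		fmt.Printf("%s role=%s at_stake=%.2f\n", c.Venue, c.TargetRole, c.AtStakeUSD)
	}
}
```

Note that the filter runs before the role switch. Without that ordering, an EOA would fall into the `default` branch and be counted as a vault.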

Path 3b (emitVaultAssetOnlyCells) is a small but instructive fix-up: MetaMorpho-style vaults dropped their (phantom) HOLDS edge in PR #168 but still carry a VAULT_ASSET edge. Path 3 would miss them, so 3b catches vaults reachable only via VAULT_ASSET, using vault.attrs.tvl_usd as the balance proxy and skipping any vault Path 3 already covered. Same admin + contract-failure cell pairing as everywhere else.
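The 3b rule (take vaults that have a VAULT_ASSET edge, skip those Path 3 already covered, use tvl_usd as the balance) reduces to a set difference. The sketch below assumes a hypothetical `VaultAssetEdge` type and a `coveredByHolds` set handed over from Path 3; the lesson does not show how the real code tracks coverage, and the "vault" role string is assumed.

```go
package main

import "fmt"

// VaultAssetEdge is a hypothetical view of one VAULT_ASSET edge.
type VaultAssetEdge struct {
	VaultID string
	TVLUSD  float64 // vault.attrs.tvl_usd, used as the balance proxy
}

// Cell is a trimmed-down, hypothetical cell.
type Cell struct {
	Venue      string
	TargetRole string
	AtStakeUSD float64
}

// vaultAssetOnlyCells emits cells for vaults reachable only via VAULT_ASSET.
// coveredByHolds holds the venue IDs Path 3 already emitted cells for.
func vaultAssetOnlyCells(edges []VaultAssetEdge, coveredByHolds map[string]bool) []Cell {
	seen := make(map[string]bool)
	var cells []Cell
	for _, e := range edges {
		if coveredByHolds[e.VaultID] || seen[e.VaultID] {
			continue // already covered by Path 3, or a duplicate edge
		}
		seen[e.VaultID] = true
		cells = append(cells, Cell{
			Venue:      e.VaultID,
			TargetRole: "vault", // assumed role for 3b cells
			AtStakeUSD: e.TVLUSD,
		})
	}
	return cells
}

func main() {
	covered := map[string]bool{"vaultA": true}
	edges := []VaultAssetEdge{
		{VaultID: "vaultA", TVLUSD: 500},
		{VaultID: "metaMorphoB", TVLUSD: 1200},
		{VaultID: "metaMorphoB", TVLUSD: 1200},
	}
	for _, c := range vaultAssetOnlyCells(edges, covered) {
		fmt.Printf("%s at_stake=%.2f\n", c.Venue, c.AtStakeUSD)
	}
}
```

The skip matters for the rollup: a vault that carries both edges must not be counted twice.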

Why this matters for your mental model
Recall L19's "84,802 contract-failure cells vs 7,390 admin cells." The vast majority are these mundane HOLDS/vault cells — a venue holds T, so its contract failing puts T's value at stake. The oracle and lending paths (L20) are the spicy minority. Now you've seen both ends of the distribution.

3. How it's kept honest: the parity harness

Every lesson since L6 has dropped phrases like "mirrors at_risk.py:2689," "any change here is a parity break," and the KeyedSum/sorted-iteration ulp discipline. Here's the machine behind those words. The entire Go at_risk engine is a port of a Python original (at_risk.py), and correctness is defined as matching Python's output. Two mechanisms enforce it:

A · The cross-implementation diff (DiffEnrichedGraphs)

Run both implementations on the same graph, compare per-token, field by field. USD floats compare within a tolerance; counts must match exactly:

USDRelativeTolerance: 1e-4   // 0.01% relative drift allowed on $ fields
USDAbsoluteTolerance: 1e-2   // or 1 cent absolute, whichever is kinder
// count fields (n_cells, n_anchor, …) — exact match required

This is exactly why the engine is so fussy about float determinism: a ulp of drift from an unsorted map walk or a naive sum could push a token past tolerance and fail the harness. cmd/at-risk-diff is the CLI front end; the tolerances track docs/at_risk_io_schema.md §9.3.
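To make the tolerance rule and the determinism concern concrete, here is a small sketch. `usdWithinTolerance` reads "whichever is kinder" as "pass if either bound holds", and scales the relative bound by the larger of the two magnitudes; both readings are assumptions, since the lesson does not show the real comparison. `sortedKeySum` is a hypothetical stand-in for the sorted-iteration discipline (it is not the engine's KeyedSum): Go randomizes map iteration order and float addition is not associative, so summing in sorted key order is one way to get the same bits on every run.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Tolerances quoted in the lesson.
const (
	USDRelativeTolerance = 1e-4 // 0.01% relative drift
	USDAbsoluteTolerance = 1e-2 // 1 cent absolute
)

// usdWithinTolerance passes when the absolute OR the relative bound holds.
// Using max(|a|, |b|) as the relative scale is an assumption of this sketch.
func usdWithinTolerance(goUSD, pyUSD float64) bool {
	diff := math.Abs(goUSD - pyUSD)
	if diff <= USDAbsoluteTolerance {
		return true
	}
	scale := math.Max(math.Abs(goUSD), math.Abs(pyUSD))
	return diff <= USDRelativeTolerance*scale
}

// sortedKeySum adds map values in sorted key order so the result does not
// depend on Go's randomized map iteration order.
func sortedKeySum(m map[string]float64) float64 {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	sum := 0.0
	for _, k := range keys {
		sum += m[k]
	}
	return sum
}

func main() {
	fmt.Println(usdWithinTolerance(1_000_000, 1_000_050)) // $50 drift on $1M
	fmt.Println(usdWithinTolerance(1_000_000, 1_000_200)) // $200 drift on $1M
	fmt.Println(sortedKeySum(map[string]float64{"a": 1e16, "b": 1, "c": -1e16}))
}
```

Count fields get no such helper: they are compared with plain equality, which is why an off-by-one cell fails the diff no matter how small the USD difference is.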

B · The 4 sanity invariants (CheckInvariants) — independent of Python

This is the satisfying capstone. Four invariants run against the Go output alone — and every one of them is a fact you learned in a previous lesson, now encoded as a machine-checked guarantee:

# | Invariant | You learned it in…
1 | extractable_usd ≤ at_stake_usd × 1.001 | L19: the extractable ≤ at_stake cap (and the 0.1% slack).
2 | no oracle cell with outcome_class != "trigger" | L20: oracle attacks trigger a mispricing, a distinct outcome class.
3 | no role_class == "dos" cell with extractable > 0 | L20/L21: a denial-of-service drains nothing; extractable must be 0.
4 | no deployer_fallback_no_admin cell with extractable > 0 | L19/L20: the $8B deployer-fallback guard.
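The four rules in the table can be written as a single pass over the emitted cells. The sketch below is an assumption-heavy illustration, not CheckInvariants itself: the `Cell` fields `IsOracle` and `AdminSource` are hypothetical ways of identifying an oracle cell and a deployer_fallback_no_admin cell, and the violation strings are invented.

```go
package main

import "fmt"

// Cell carries only the fields the four rules need. IsOracle and AdminSource
// are hypothetical; the lesson does not say how the real cell marks these.
type Cell struct {
	ID             string
	IsOracle       bool
	OutcomeClass   string
	RoleClass      string
	AdminSource    string
	AtStakeUSD     float64
	ExtractableUSD float64
}

// extractableSlack is the 0.1% headroom from invariant 1.
const extractableSlack = 1.001

// checkInvariants returns one message per violated rule per cell.
func checkInvariants(cells []Cell) []string {
	var violations []string
	for _, c := range cells {
		// 1: extractable must not exceed at_stake beyond the 0.1% slack.
		if c.ExtractableUSD > c.AtStakeUSD*extractableSlack {
			violations = append(violations, c.ID+": extractable exceeds at_stake")
		}
		// 2: every oracle cell must have outcome_class "trigger".
		if c.IsOracle && c.OutcomeClass != "trigger" {
			violations = append(violations, c.ID+": oracle cell is not a trigger")
		}
		// 3: a denial-of-service cell extracts nothing.
		if c.RoleClass == "dos" && c.ExtractableUSD > 0 {
			violations = append(violations, c.ID+": dos cell has extractable > 0")
		}
		// 4: the deployer-fallback guard.
		if c.AdminSource == "deployer_fallback_no_admin" && c.ExtractableUSD > 0 {
			violations = append(violations, c.ID+": deployer fallback has extractable > 0")
		}
	}
	return violations
}

func main() {
	cells := []Cell{
		{ID: "ok", RoleClass: "drain", AtStakeUSD: 100, ExtractableUSD: 100.05},
		{ID: "bad", IsOracle: true, OutcomeClass: "drain", RoleClass: "dos",
			AdminSource: "deployer_fallback_no_admin", AtStakeUSD: 100, ExtractableUSD: 200},
	}
	for _, v := range checkInvariants(cells) {
		fmt.Println(v)
	}
}
```

Because every rule reads only the Go output, a check like this can run in CI or after each cycle without a Python reference graph on hand.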
Parity + invariants = the whole quality story
The diff catches drift from Python (did the port change behaviour?); the invariants catch internal nonsense (did we emit something physically impossible?) without needing Python at all. Together they're why the risk engine is flagged "NOT a first-PR area" in your mission notes — the parity bar is unforgiving, and now you know precisely what bar that is.
at_risk: the book is closed
Trigger (scheduler, periodic full recompute, single writer) → decompose (six paths — oracle in L20, the HOLDS/vault majority here) → extractable + exit-liquidity ceiling (L20/L21) → multisig fan-out (L22) → value-pool collapse + caps (L19) → write-back → proven by parity diff + invariants. There is no part of at_risk left as a black box.

Check yourself

1. How is the at_risk metric (cells/summary) triggered?
2. Why full-graph-per-cycle instead of per-token incremental for at_risk?
3. The scheduler's Run() is the only caller that writes at_risk. Which earlier principle is that?
4. HeavyMu / WithHeavy exists because…
5. Path 3 (emitHoldsBasedCells) deliberately SKIPS EOAs, exchange hot wallets, and ops multisigs. Why?
6. What does Path 3b (emitVaultAssetOnlyCells) catch that Path 3 misses?
7. In DiffEnrichedGraphs, USD fields compare within a tolerance but count fields must match exactly. The tolerance is roughly…
8. CheckInvariants runs 4 checks against the Go output WITHOUT Python. One is "no dos cell with extractable > 0." What kind of guarantee is this?
9. Invariant #1 is extractable ≤ at_stake × 1.001. Where have you seen that exact rule before?
↳ Ask your teacher
Try: "Walk the scheduler step() function cycle by cycle." · "What's the difference between full-load and partial-load (AT_RISK_PARTIAL_LOAD)?" · "Show me writeBack's UNWIND batch." · "How does cmd/at-risk-diff get a Python reference graph to compare against?" · "Are there invariants beyond these 4 elsewhere (sanity.go)?"

What you can now do

at_risk is exhaustively covered — the deep dive is complete
Across L6, L13, L16, and L19–L23 you now hold the entire at_risk subsystem end to end: what triggers it, all six construction paths, the cell anatomy, the extractable math and its exit-liquidity ceiling, multisig fan-out, the value-pool collapse + caps rollup, exposure propagation, the per-field math, and the parity + invariant machinery that proves it correct. Every number in the AT_RISK output is yours to derive.

Grounded in: pkg/risk/at_risk_scheduler.go (periodic full-graph cadence 30 min, single-goroutine writer, fail-loop + AtRiskSchedulerStalled, AT_RISK_PARTIAL_LOAD, PR #474 16.2-min measure), heavy_gate.go (HeavyMu/WithHeavy), at_risk_cells.go (emitHoldsBasedCells Path 3 + user-holder exclusion, emitVaultAssetOnlyCells Path 3b / PR #168 / tvl_usd), at_risk_diff.go (DiffEnrichedGraphs tol 1e-4/1e-2, CheckInvariants 4 rules §9.4), cmd/at-risk-diff. Verify against source — the code is the truth.