Lesson 23 · at_risk Lifecycle · Deeper Track

Trigger, ordinary paths, and proof

The three loose ends: what fires a run, the cells you haven't seen, and how it's all kept honest. ~14 min.

Builds on: L19 · L20 · L9 · New: the scheduler · New: HOLDS / vault paths · New: the parity harness

You've seen how at_risk computes (L19–L22). Three operational questions remain: What kicks off a run? What do the boring, non-oracle cells look like? And how does anyone know the answer is right? They're the bookends of one lifecycle — trigger → compute → proof — so we'll do all three here and close the book on at_risk.

① TRIGGER (scheduler · 30 min) → ② DECOMPOSE (6 paths incl. HOLDS/vault) → aggregate + caps (L19) → write-back (Memgraph attrs + edges) → ③ PROVE (parity diff + invariants)

1. What fires a run: the scheduler

AtRiskScheduler (at_risk_scheduler.go) drives the metric on a periodic cadence: load the full risk graph, run EnrichAtRisk over the live focus-token list, write the per-token at_risk_summary / at_risk_cells / at_risk_updated_at back to Memgraph. Default tick: 30 min.

A nuance worth correcting from L6/L16

L16's exposure-BFS was delta-driven — recompute only what a graph delta touched. The at_risk metric is different: its scheduler recomputes all focus tokens, full-graph, on a timer. Why? "at_risk consumes the full graph and stamps all focus tokens in one cycle — splitting into per-token wakes would require N full loads where we want one." The cadence (30 min) was picked from a measured 16.2-min cycle on a 1.5M-node stage graph (PR #474). There's an opt-in per-token partial-load mode (AT_RISK_PARTIAL_LOAD), but only after parity is verified. (Don't over-generalize "the risk engine is incremental" — at_risk's flagship cells are a periodic full recompute.)

Three properties tie straight back to earlier lessons:

Single-goroutine writer (L9). The Run() loop is the only caller, so there are never concurrent at_risk writes. The scheduler is off by default (AT_RISK_SCHEDULER_ENABLED=false), and cycles are sequential: a slow cycle delays the next tick and never stacks.
Fail-loop, don't fail-stop (L8). A bad cycle increments at_risk.cycle.errors at WARN and the loop continues; three consecutive failures trip the AtRiskSchedulerStalled alarm at 90 min (3× the 30-min cadence), the observability you met in L11.
One heavy job at a time. HeavyMu / WithHeavy (heavy_gate.go) is a process-wide mutex — at_risk shares the box with centrality, DebtRank, and other heavy computers, so the gate serializes them instead of letting 7+ fan out and thrash memory.
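The three properties above can be sketched in one small loop. This is a minimal illustration, not the code in at_risk_scheduler.go or heavy_gate.go: the `scheduler` type, `step`, `run`, `withHeavy`, `heavyMu`, `stallThreshold`, and `errSimulated` are hypothetical names, and resetting the failure streak on a successful cycle is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errSimulated is a hypothetical error used to drive the failure path.
var errSimulated = errors.New("simulated cycle failure")

// heavyMu stands in for the process-wide HeavyMu gate (assumed shape).
var heavyMu sync.Mutex

// withHeavy runs fn while holding the gate, so only one heavy job runs at a time.
func withHeavy(fn func() error) error {
	heavyMu.Lock()
	defer heavyMu.Unlock()
	return fn()
}

// scheduler is a hypothetical stand-in for AtRiskScheduler.
type scheduler struct {
	tick           time.Duration
	stallThreshold int          // consecutive failures before the stall flag is set
	cycle          func() error // load graph, enrich, write back
	consecutive    int
	stalled        bool
}

// step runs one cycle and updates the failure bookkeeping. It returns nothing:
// a bad cycle is counted and the loop carries on (fail-loop, not fail-stop).
func (s *scheduler) step() {
	if err := withHeavy(s.cycle); err != nil {
		s.consecutive++
		if s.consecutive >= s.stallThreshold {
			s.stalled = true
		}
		return
	}
	s.consecutive = 0
	s.stalled = false
}

// run is the only caller of step, so cycles are sequential and writes never overlap.
func (s *scheduler) run(ctx context.Context) {
	t := time.NewTicker(s.tick)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			s.step() // a slow step delays the next tick; the ticker drops missed ticks
		}
	}
}

func main() {
	calls := 0
	s := &scheduler{
		tick:           10 * time.Millisecond, // 30 min in the lesson, shortened here
		stallThreshold: 3,
		cycle: func() error {
			calls++
			return errSimulated
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	s.run(ctx)
	fmt.Println("cycles:", calls, "stalled:", s.stalled)
}
```

Because `step` is called from a single goroutine and takes the gate itself, no extra locking is needed around the scheduler's own counters in this sketch.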

2. The ordinary cells: Path 3 & 3b (HOLDS / vault)

L20 showed the gnarliest path. These are the opposite: the plain majority. Path 3 (emitHoldsBasedCells) walks every venue that HOLDS the focus token and emits a cell, with target_role read straight off the venue's subtype:

switch vKind {
case "bridge": targetRole = TargetRoleBridge
case "pool":   targetRole = TargetRolePool
default:       targetRole = TargetRoleVault   // catch-all
}
// at_stake = focusUSD (USD of T this venue holds); + the usual admin & contract cells

The exclusion that defines at_risk's scope

Path 3 skips user / custody holders — EOAs, exchange hot wallets, ops multisigs holding tokens (userHolderSubtypes). The comment says it best: "a hack of Bitfinex hot wallet is not an at_risk attack surface on the token." at_risk measures risk in the protocol contracts that hold T (where a contract/admin compromise drains value systemically), not wherever tokens happen to sit. This single filter is the line between "protocol risk" and "someone got phished."
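A rough sketch of how the exclusion and the role switch might fit together. Everything here is illustrative: the `venue` and `cell` types, the string role values, and the contents of `userHolderSubtypes` (the subtype strings "eoa", "exchange_hot_wallet", "ops_multisig") are assumptions, not the definitions in at_risk_cells.go.

```go
package main

import "fmt"

// venue is a hypothetical holder of the focus token.
type venue struct {
	ID       string
	Subtype  string
	FocusUSD float64 // USD of the focus token this venue holds
}

// cell is a hypothetical, trimmed-down at_risk cell.
type cell struct {
	VenueID    string
	TargetRole string
	AtStakeUSD float64
}

// userHolderSubtypes lists user / custody holders to skip (assumed values).
var userHolderSubtypes = map[string]bool{
	"eoa":                 true,
	"exchange_hot_wallet": true,
	"ops_multisig":        true,
}

// targetRoleFor mirrors the switch in the lesson, with vault as the catch-all.
func targetRoleFor(subtype string) string {
	switch subtype {
	case "bridge":
		return "bridge"
	case "pool":
		return "pool"
	default:
		return "vault"
	}
}

// emitHoldsCells emits one cell per protocol venue and skips user / custody holders.
func emitHoldsCells(holders []venue) []cell {
	var out []cell
	for _, v := range holders {
		if userHolderSubtypes[v.Subtype] {
			continue // custody risk, not protocol risk
		}
		out = append(out, cell{
			VenueID:    v.ID,
			TargetRole: targetRoleFor(v.Subtype),
			AtStakeUSD: v.FocusUSD,
		})
	}
	return out
}

func main() {
	holders := []venue{
		{ID: "bridge-1", Subtype: "bridge", FocusUSD: 5_000_000},
		{ID: "hot-wallet", Subtype: "exchange_hot_wallet", FocusUSD: 90_000_000},
		{ID: "pool-1", Subtype: "pool", FocusUSD: 1_200_000},
	}
	for _, c := range emitHoldsCells(holders) {
		fmt.Printf("%s role=%s at_stake=%.0f\n", c.VenueID, c.TargetRole, c.AtStakeUSD)
	}
}
```

Note that the hot wallet is dropped even though it holds the largest balance: the filter keys on holder type, not on size.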

Path 3b (emitVaultAssetOnlyCells) is a small but instructive fix-up: MetaMorpho-style vaults dropped their (phantom) HOLDS edge in PR #168 but still carry a VAULT_ASSET edge. Path 3 would miss them, so 3b catches vaults reachable only via VAULT_ASSET, using vault.attrs.tvl_usd as the balance proxy and skipping any vault Path 3 already covered. Same admin + contract-failure cell pairing as everywhere else.
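The 3b fix-up can be pictured as a second pass with a "covered" set. This is a sketch under assumptions: the `vault` and `cell` types, the `coveredByHolds` set, and the cell kind strings "admin" and "contract_failure" are hypothetical, and the real emitVaultAssetOnlyCells may structure this differently.

```go
package main

import "fmt"

// vault is a hypothetical vault reachable via a VAULT_ASSET edge.
type vault struct {
	ID     string
	TVLUSD float64 // stands in for vault.attrs.tvl_usd
}

// cell is a hypothetical, trimmed-down at_risk cell.
type cell struct {
	VenueID    string
	Kind       string
	AtStakeUSD float64
}

// emitVaultAssetOnlyCells emits the admin + contract-failure pair for every
// vault that the HOLDS-based pass did not already cover, using TVL as the
// balance proxy.
func emitVaultAssetOnlyCells(vaults []vault, coveredByHolds map[string]bool) []cell {
	var out []cell
	for _, v := range vaults {
		if coveredByHolds[v.ID] {
			continue // Path 3 already emitted cells for this vault
		}
		for _, kind := range []string{"admin", "contract_failure"} {
			out = append(out, cell{VenueID: v.ID, Kind: kind, AtStakeUSD: v.TVLUSD})
		}
	}
	return out
}

func main() {
	vaults := []vault{
		{ID: "vault-with-holds", TVLUSD: 3_000_000},
		{ID: "vault-asset-only", TVLUSD: 7_500_000},
	}
	covered := map[string]bool{"vault-with-holds": true}
	for _, c := range emitVaultAssetOnlyCells(vaults, covered) {
		fmt.Printf("%s kind=%s at_stake=%.0f\n", c.VenueID, c.Kind, c.AtStakeUSD)
	}
}
```

Tracking coverage by vault ID keeps the two passes from double-counting a vault that has both edges.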

Why this matters for your mental model

Recall L19's "84,802 contract-failure cells vs 7,390 admin cells." The vast majority are these mundane HOLDS/vault cells — a venue holds T, so its contract failing puts T's value at stake. The oracle and lending paths (L20) are the spicy minority. Now you've seen both ends of the distribution.

3. How it's kept honest: the parity harness

Every lesson since L6 has dropped phrases like "mirrors at_risk.py:2689," "any change here is a parity break," and the KeyedSum/sorted-iteration ulp discipline. Here's the machine behind those words. The entire Go at_risk engine is a port of a Python original (at_risk.py), and correctness is defined as matching Python's output. Two mechanisms enforce it:

A · The cross-implementation diff (`DiffEnrichedGraphs`)

Run both implementations on the same graph, compare per-token, field by field. USD floats compare within a tolerance; counts must match exactly:

USDRelativeTolerance: 1e-4   // 0.01% relative drift allowed on $ fields
USDAbsoluteTolerance: 1e-2   // or 1 cent absolute, whichever is kinder
// count fields (n_cells, n_anchor, …) — exact match required

This is exactly why the engine is so fussy about float determinism: a ulp of drift from an unsorted map walk or a naive sum could push a token past tolerance and fail the harness. cmd/at-risk-diff is the CLI front end; the tolerances track docs/at_risk_io_schema.md §9.3.
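To make the tolerance rule and the determinism concern concrete, here is a small sketch. The two constants come from the lesson; the function names `usdWithinTolerance` and `sortedSum` are hypothetical, scaling the relative bound by the larger of the two magnitudes is an assumption, and `sortedSum` only illustrates the sorted-iteration idea rather than reproducing the real KeyedSum.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Tolerances quoted in the lesson.
const (
	usdRelativeTolerance = 1e-4 // 0.01% relative
	usdAbsoluteTolerance = 1e-2 // 1 cent absolute
)

// usdWithinTolerance passes if either the absolute or the relative bound holds
// ("whichever is kinder"). The choice of scale is an assumption.
func usdWithinTolerance(goUSD, pyUSD float64) bool {
	diff := math.Abs(goUSD - pyUSD)
	if diff <= usdAbsoluteTolerance {
		return true
	}
	scale := math.Max(math.Abs(goUSD), math.Abs(pyUSD))
	return diff <= usdRelativeTolerance*scale
}

// sortedSum adds map values in sorted-key order, so the result does not depend
// on Go's randomized map iteration order.
func sortedSum(m map[string]float64) float64 {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	total := 0.0
	for _, k := range keys {
		total += m[k]
	}
	return total
}

func main() {
	fmt.Println(usdWithinTolerance(1_000_000, 1_000_050)) // inside the 0.01% relative bound
	fmt.Println(usdWithinTolerance(1.00, 1.02))           // outside both bounds
	fmt.Println(sortedSum(map[string]float64{"a": 0.1, "b": 0.2, "c": 0.3}))
}
```

Floating-point addition is not associative, so summing the same values in a different order can change the last bits of the result; fixing the order removes that source of run-to-run drift before the tolerance check ever sees it.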

B · The 4 sanity invariants (`CheckInvariants`) — independent of Python

This is the satisfying capstone. Four invariants run against the Go output alone — and every one of them is a fact you learned in a previous lesson, now encoded as a machine-checked guarantee:

#	Invariant	You learned it in…
1	`extractable_usd ≤ at_stake_usd × 1.001`	L19 — the extractable≤at_stake cap (and the 0.1% slack).
2	no oracle cell with `outcome_class != "trigger"`	L20 — oracle attacks trigger a mispricing, a distinct outcome class.
3	no `role_class == "dos"` cell with `extractable > 0`	L20/L21 — a denial-of-service drains nothing; extractable must be 0.
4	no `deployer_fallback_no_admin` cell with `extractable > 0`	L19/L20 — the $8B deployer-fallback guard.
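The four rules in the table translate almost directly into predicates over a cell. The sketch below is illustrative only: the `cell` struct, its field names (`Path`, `AdminSource`), and the `checkInvariants` signature are assumptions, and where the real engine records "oracle" or "deployer_fallback_no_admin" on a cell may differ.

```go
package main

import "fmt"

// cell is a hypothetical, trimmed-down at_risk cell.
type cell struct {
	Path           string // e.g. "oracle" (assumed field)
	OutcomeClass   string
	RoleClass      string
	AdminSource    string // assumed home of "deployer_fallback_no_admin"
	AtStakeUSD     float64
	ExtractableUSD float64
}

// capSlack is the 0.1% slack on the extractable <= at_stake cap.
const capSlack = 1.001

// checkInvariants returns one message per violated rule, using only the cells
// themselves (no reference output needed).
func checkInvariants(cells []cell) []string {
	var violations []string
	for i, c := range cells {
		if c.ExtractableUSD > c.AtStakeUSD*capSlack {
			violations = append(violations, fmt.Sprintf("cell %d: extractable exceeds at_stake cap", i))
		}
		if c.Path == "oracle" && c.OutcomeClass != "trigger" {
			violations = append(violations, fmt.Sprintf("cell %d: oracle cell is not outcome_class trigger", i))
		}
		if c.RoleClass == "dos" && c.ExtractableUSD > 0 {
			violations = append(violations, fmt.Sprintf("cell %d: dos cell has extractable > 0", i))
		}
		if c.AdminSource == "deployer_fallback_no_admin" && c.ExtractableUSD > 0 {
			violations = append(violations, fmt.Sprintf("cell %d: deployer fallback cell has extractable > 0", i))
		}
	}
	return violations
}

func main() {
	cells := []cell{
		{Path: "holds", OutcomeClass: "drain", AtStakeUSD: 100, ExtractableUSD: 100.05},
		{Path: "oracle", OutcomeClass: "drain", AtStakeUSD: 100, ExtractableUSD: 10},
		{Path: "holds", RoleClass: "dos", AtStakeUSD: 100, ExtractableUSD: 5},
	}
	for _, v := range checkInvariants(cells) {
		fmt.Println(v)
	}
}
```

Returning every violation, rather than stopping at the first, makes a failing run easier to triage when one bug trips several rules at once.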

Parity + invariants = the whole quality story

The diff catches drift from Python (did the port change behaviour?); the invariants catch internal nonsense (did we emit something physically impossible?) without needing Python at all. Together they're why the risk engine is flagged "NOT a first-PR area" in your mission notes — the parity bar is unforgiving, and now you know precisely what bar that is.

at_risk: the book is closed

Trigger (scheduler, periodic full recompute, single writer) → decompose (six paths — oracle in L20, the HOLDS/vault majority here) → extractable + exit-liquidity ceiling (L20/L21) → multisig fan-out (L22) → value-pool collapse + caps (L19) → write-back → proven by parity diff + invariants. There is no part of at_risk left as a black box.

Check yourself

1. How is the at_risk metric (cells/summary) triggered?

2. Why full-graph-per-cycle instead of per-token incremental for at_risk?

3. The scheduler's Run() is the only caller that writes at_risk. Which earlier principle is that?

4. HeavyMu / WithHeavy exists because…

5. Path 3 (emitHoldsBasedCells) deliberately SKIPS EOAs, exchange hot wallets, and ops multisigs. Why?

6. What does Path 3b (emitVaultAssetOnlyCells) catch that Path 3 misses?

7. In DiffEnrichedGraphs, USD fields compare within a tolerance but count fields must match exactly. The tolerance is roughly…

8. CheckInvariants runs 4 checks against the Go output WITHOUT Python. One is "no dos cell with extractable > 0." What kind of guarantee is this?

9. Invariant #1 is extractable ≤ at_stake × 1.001. Where have you seen that exact rule before?

↳ Ask your teacher

Try: "Walk the scheduler step() function cycle by cycle." · "What's the difference between full-load and partial-load (AT_RISK_PARTIAL_LOAD)?" · "Show me writeBack's UNWIND batch." · "How does cmd/at-risk-diff get a Python reference graph to compare against?" · "Are there invariants beyond these 4 elsewhere (sanity.go)?"

What you can now do

Explain that at_risk's cells are a periodic full-graph recompute (scheduler, 30 min), distinct from L16's delta-driven exposure — and not over-generalize "the engine is incremental."
Connect the scheduler to single-writer (L9), fail-loop + alarms (L8/L11), and the HeavyMu serialization gate.
Describe Path 3 / 3b (HOLDS & VAULT_ASSET cells), and articulate the user/custody exclusion as the line between protocol risk and a phished wallet.
Explain the parity harness: DiffEnrichedGraphs (tolerance on $, exact on counts) and why float determinism matters to it.
Recite the 4 sanity invariants and trace each back to the lesson whose fact it encodes.

at_risk is exhaustively covered — the deep dive is complete

Across L6, L13, L16, and L19–L23 you now hold the entire at_risk subsystem end to end: what triggers it, all six construction paths, the cell anatomy, the extractable math and its exit-liquidity ceiling, multisig fan-out, the value-pool collapse + caps rollup, exposure propagation, the per-field math, and the parity + invariant machinery that proves it correct. Every number in the AT_RISK output is yours to derive.


Grounded in: pkg/risk/at_risk_scheduler.go (periodic full-graph cadence 30 min, single-goroutine writer, fail-loop + AtRiskSchedulerStalled, AT_RISK_PARTIAL_LOAD, PR #474 16.2-min measure), heavy_gate.go (HeavyMu/WithHeavy), at_risk_cells.go (emitHoldsBasedCells Path 3 + user-holder exclusion, emitVaultAssetOnlyCells Path 3b / PR #168 / tvl_usd), at_risk_diff.go (DiffEnrichedGraphs tol 1e-4/1e-2, CheckInvariants 4 rules §9.4), cmd/at-risk-diff. Verify against source — the code is the truth.