Lesson 47 · Cross-Cutting Synthesis · Deeper Track

Why a single ulp matters

KeyedSum, ×1.001, round2, big.Int — one discipline, five lessons. ~13 min.

Synthesizes: L19 · L29 · L32 · L39
Anchor: a test green locally, red in CI
New: exact-in, tolerant-out

Five lessons hit the same odd details: a sum routed through floats.KeyedSum, a comparison with a ×1.001 slack, outputs round2'd, balances summed in big.Int. In most software a 17th-decimal wobble is noise you'd never think about. Here it's load-bearing — and the reason ties together into one discipline you can state in a sentence: be exact where error accumulates, tolerant where you compare.

Your anchor: the test that's green locally and red in CI

Every engineer has hit it: a calculation passes on your machine, fails in CI, passes again on rerun. The usual culprit is non-determinism: the same inputs produce a different answer from run to run, by a hair. In a system that values billions and must match a reference implementation bit-for-bit, that hair is the whole problem. This lesson covers the codebase's defenses against it.

1 · The root cause: float math isn't associative, and maps aren't ordered

Two facts combine into a bug:

// (1) float64 addition is non-associative: rounding makes order matter.
(a + b) + c   !=   a + (b + c)        // can differ in the last bit(s)
// (2) Go map iteration order is unspecified and deliberately randomized;
//     it is not guaranteed to repeat, even between two loops in the same run.
total := 0.0
for _, v := range someMap { total += v }   // summation order can differ from run to run

Put them together and a plain += over a map can produce a different last-ulp answer from one run to the next, from identical inputs. Most of the time nobody notices. In this system, three things make it matter.
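A minimal sketch of the two facts in isolation, with values chosen so the order effect shows up in the first digit rather than the last. The helper name sumInOrder is illustrative and does not come from the codebase.

```go
package main

import "fmt"

// sumInOrder adds the values left to right with a plain +=,
// which is what a naive loop does for whatever order it is handed.
func sumInOrder(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	// Same three numbers, two orders. Near 1e16 adjacent float64 values
	// are 2 apart, so adding 1 to 1e16 is rounded away entirely.
	orderA := []float64{1e16, 1, -1e16} // the 1 is absorbed, then the big terms cancel
	orderB := []float64{1e16, -1e16, 1} // the big terms cancel first, the 1 survives

	fmt.Println(sumInOrder(orderA)) // 0
	fmt.Println(sumInOrder(orderB)) // 1

	// A map yields its entries in an unspecified order, so a raw +=
	// over it can take either path on any given run.
	m := map[string]float64{"x": 1e16, "y": 1, "z": -1e16}
	total := 0.0
	for _, v := range m {
		total += v
	}
	fmt.Println(total) // 0 or 1, depending on iteration order
}
```

Real inputs rarely differ this dramatically; the usual symptom is a difference in the last bit or two, which is exactly what makes it hard to spot.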

2 · Why it's load-bearing here (not aesthetics)

Pressure	Why a ulp flips an outcome
Byte-parity vs Python (L23/L29)	the engine is a port; the diff harness compares to Python within `1e-4`/`1e-2` tolerance — but a drift that crosses a threshold or equality flips a token to "mismatch" and fails CI
Outputs feed decisions	a value is compared to a rule threshold (fire or not, L45), a cap (capped or not, L19), a changed-only guard (write or not, L38) — equality/threshold checks turn a ulp into a different action
Determinism = reproducibility (L8)	replay, debugging, and parity all assume same-inputs→same-output; a non-reproducible number can't be diffed, replayed, or trusted

The reframe: a flipped ulp isn't "slightly wrong"

It's wrong by being non-reproducible. A value that's off by 1e-15 but stable is fine — tolerances absorb it. A value that changes run-to-run can't be matched against Python, can't be replayed to the same state, and can flip a threshold inconsistently. The enemy isn't inaccuracy; it's non-determinism.

3 · The discipline — two complementary halves

Exact / deterministic where error ACCUMULATES

Summing many values is where order-dependence bites, so make it order-independent or exact.

floats.KeyedSum · big.Int · sorted iteration

Tolerant where you COMPARE

Comparing a result to a bound or another value is where you must not let float noise trip a flag.

×1.001 · 1e-4 / 1e-2 · scoreEpsilon

The exact-in half — the gallery

Tool	What it does	Seen in
`floats.KeyedSum`	sort the map keys, then Neumaier (compensated) sum — deterministic order and recovers lost low bits	at_risk rollup (L19), HHI numerator + denominator (L39), exit-liquidity total (L21)
`big.Int` sums	don't use float at all where you can stay exact; float only for the final coarse ratio	conservation Σ HOLDS (L32), the balance cache (L36)
sorted iteration	walk derivatives / rows in `sort.Strings` order before accumulating, for run-stable totals and JSON blobs	oracle-bridger derivatives, exit-liquidity bridge pass (L21/L27)

The tolerant-out half — the gallery

Slack	Purpose	Seen in
`capSlackTolerance = 1.001`	a cap "fired" only if the raw aggregate exceeds the bound by >0.1% — so rounding noise doesn't raise a spurious cap flag	at_risk caps (L19)
`1e-4` rel / `1e-2` abs	two USD values "match" if within tolerance — float drift below this isn't a parity failure	parity diff (L29), conservation band (L32)
`scoreEpsilon = 5e-5`	half the 4-dp rounding quantum: a recomputed score within it is "unchanged" → skip the write	node_risk_score changed-only writes (L38)
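A rough sketch of the three slacks as comparison helpers. The constant values come from the table; everything else is an assumption for illustration: the function names, the strictness of each inequality, and in particular the rule that two USD values match when either the absolute or the relative tolerance is satisfied. The actual comparison logic lives in the cited source files.

```go
package main

import (
	"fmt"
	"math"
)

const (
	capSlackTolerance = 1.001 // a cap fires only above bound + 0.1%
	relTol            = 1e-4  // relative tolerance for USD parity
	absTol            = 1e-2  // absolute tolerance for USD parity
	scoreEpsilon      = 5e-5  // half of the 4-dp rounding quantum (1e-4)
)

// capFired reports whether the raw aggregate exceeds the bound by more
// than the slack, so rounding noise at the bound does not raise a flag.
func capFired(raw, bound float64) bool {
	return raw > bound*capSlackTolerance
}

// usdMatch treats two values as equal if they are close in absolute OR
// relative terms (the OR is an assumption of this sketch).
func usdMatch(a, b float64) bool {
	diff := math.Abs(a - b)
	if diff <= absTol {
		return true
	}
	return diff <= relTol*math.Max(math.Abs(a), math.Abs(b))
}

// scoreUnchanged reports whether a recomputed score is within half a
// rounding quantum of the stored one, in which case the write is skipped.
func scoreUnchanged(stored, recomputed float64) bool {
	return math.Abs(stored-recomputed) < scoreEpsilon
}

func main() {
	fmt.Println(capFired(1000.5, 1000))         // false: inside the 0.1% slack
	fmt.Println(capFired(1002, 1000))           // true: clearly over the cap
	fmt.Println(usdMatch(1000000, 1000050))     // true: 50 USD is within 1e-4 relative
	fmt.Println(usdMatch(100, 101))             // false: outside both tolerances
	fmt.Println(scoreUnchanged(0.1234, 0.12342)) // true: differs by 2e-5
	fmt.Println(scoreUnchanged(0.1234, 0.1236))  // false: differs by 2e-4
}
```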

And the output quantum: round2 / round4

Outputs are rounded to a fixed number of decimals (2 for USD, 4 for ratios) before they're stored. That makes the stored value stable (a sub-quantum recompute lands on the same figure) and gives the tolerant-out slacks a clean grid to work against — scoreEpsilon is literally half the rounding quantum. Rounding is the bridge between the exact-in computation and the tolerant-out comparison.
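One common way to write such rounding helpers is shown below, as a sketch only: the codebase's round2 and round4 may be implemented differently (for example around ties or very large magnitudes). It shows how two recomputations that differ below the quantum land on the same stored figure.

```go
package main

import (
	"fmt"
	"math"
)

// round2 rounds to 2 decimals (the USD quantum, 0.01).
func round2(x float64) float64 {
	return math.Round(x*100) / 100
}

// round4 rounds to 4 decimals (the ratio quantum, 0.0001).
func round4(x float64) float64 {
	return math.Round(x*10000) / 10000
}

func main() {
	// Two runs that disagree below the quantum store the same value.
	fmt.Println(round2(1234.5601), round2(1234.5649)) // 1234.56 1234.56
	fmt.Println(round4(0.12341), round4(0.123449))    // 0.1234 0.1234
}
```

Values that straddle a rounding boundary can still land on adjacent grid points, which is why a half-quantum epsilon is applied when comparing instead of relying on rounding alone.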

4 · It's enforced, not trusted to memory

This discipline isn't a code-review hope — it's mechanized at both ends:

A lint guard forbids the footgun. A raw += accumulating over a map/slice fails lint unless it carries a // floats:ok (…deterministic order…) annotation justifying why the order is fixed. So you can't merge a non-deterministic sum without either using KeyedSum or proving the iteration order is stable.
The parity harness is the gate. DiffEnrichedGraphs (L29) runs the Go output against Python every cycle; a determinism slip that drifts a token past tolerance shows up as a mismatch finding. The discipline is tested, continuously, against the reference.

One sentence, five lessons

Be exact (or order-deterministic) where error accumulates; be tolerant where you compare; round to a fixed quantum in between; and let the linter and the parity harness enforce it. Every KeyedSum, every ×1.001, every big.Int sum, every round2 you saw was one move in that single discipline — the price of being a billions-valuing engine that must reproduce a reference bit-for-bit.

Check yourself

1. What's the root cause that makes a plain += over a Go map non-deterministic?

2. Why is a non-deterministic sum a real problem here, when in most apps it's ignorable noise?

3. The lesson reframes the danger as non-determinism, not inaccuracy. What follows from that?

4. What does floats.KeyedSum do, and why both parts?

5. Conservation (L32) sums HOLDS in big.Int but forms the final ratio in float64. Which half of the discipline is each?

6. The capSlackTolerance = 1.001 in the at_risk caps (L19) is which half of the discipline?

7. Why does the codebase round outputs to a fixed quantum (round2 / round4)?

8. How is the "use KeyedSum, not a raw +=" rule actually enforced?

↳ Ask your teacher

Try: "Show me a real // floats:ok annotation and what justifies it." · "What exactly is Neumaier summation vs naive Kahan?" · "Could integer/decimal types eliminate the whole problem — why not use them everywhere?" · "How does the parity harness pick the 1e-4 / 1e-2 tolerances?" · "Where would a determinism bug most plausibly still slip through today?"

What you can now do

Explain the root cause: non-associative float addition × randomized Go map order ⇒ non-deterministic sums.
Say why it's load-bearing here: byte-parity vs Python, outputs feeding threshold/equality checks, replay determinism.
State the discipline in one line — exact/deterministic where error accumulates, tolerant where you compare.
Place each tool: KeyedSum / big.Int / sorted iteration (in) vs ×1.001 / 1e-4 / scoreEpsilon (out), with round2 bridging.
Recognize the floats:ok lint guard and the parity harness as the enforcement, and spot a raw map-+= as a parity bug.

Two syntheses down

The combinator family (L46) and the float-determinism discipline (here) are the two big cross-cutting lenses on the risk engine: what to combine (collapse vs add, worst vs best) and how to combine it safely (deterministic, tolerant, enforced). Together they're most of what separates "reads the risk code" from "could change it without breaking parity."


Synthesizes code already cited in: pkg/floats (KeyedSum sorted-key Neumaier sum), at_risk_aggregate.go (KeyedSum + capSlackTolerance=1.001, L19), at_risk_diff.go (1e-4/1e-2 tolerances, L29), verify_balance_conservation.go (big.Int sum + float ratio, L32), vault_concentration.go (KeyedSum ×2 + floats:ok, L39), node_risk_score.go (round + scoreEpsilon=5e-5, L38), the floats:ok lint guard + parity harness gate. Verify against source — the code is the truth.