Lesson 47 · Cross-Cutting Synthesis · Deeper Track

Why a single ulp matters

KeyedSum, ×1.001, round2, big.Int — one discipline, five lessons. ~13 min.

Synthesizes: L19 · L29 · L32 · L39
Anchor: a test green locally, red in CI
New: exact-in, tolerant-out

Five lessons hit the same odd details: a sum routed through floats.KeyedSum, a comparison with a ×1.001 slack, outputs round2'd, balances summed in big.Int. In most software a wobble in the 17th significant digit is noise you'd never think about. Here it's load-bearing, and the reasons tie together into one discipline you can state in a sentence: be exact where error accumulates, tolerant where you compare.

Your anchor: the test that's green locally and red in CI
Every engineer has hit it: a calculation passes on your machine, fails in CI, then passes again on rerun. The usual culprit is non-determinism: the same inputs produce a different answer run-to-run, by a hair. In a system that values billions and must match a reference implementation bit-for-bit, that hair is the whole problem. This lesson covers the codebase's defenses against it.

1 · The root cause: float math isn't associative, and maps aren't ordered

Two facts combine into a bug:

// (1) float64 addition is non-associative: every + rounds, so grouping changes the result.
a, b, c := 0.1, 0.2, 0.3
fmt.Println((a+b)+c == a+(b+c))   // false: 0.6000000000000001 vs 0.6
// (2) Go map iteration order is unspecified, and the runtime randomizes it on every range loop.
total := 0.0
for _, v := range someMap { total += v }   // may add in a different order each time

Put them together and a plain += over a map can produce a different answer in the last ulp from run to run, from identical inputs. Most of the time nobody notices. In this system, three things make it matter.
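Here is a small, self-contained Go sketch of both facts at once. The three balances are illustrative, not taken from the codebase: at magnitude 1e16 the spacing between adjacent float64 values is 2, so 1e16 + 1 rounds back to 1e16, and the 1 survives only if the two large terms cancel before it is added. sumInOrder and sumAllOrders are hypothetical helpers; the second enumerates every addition order to show the full set of possible answers.

```go
package main

import (
	"fmt"
	"sort"
)

// sumInOrder adds values left to right with a plain float64 +=.
func sumInOrder(vals []float64) float64 {
	total := 0.0
	for _, v := range vals {
		total += v
	}
	return total
}

// sumAllOrders returns the distinct totals produced by every permutation of vals.
func sumAllOrders(vals []float64) []float64 {
	seen := map[float64]bool{}
	var permute func(k int)
	permute = func(k int) {
		if k == len(vals) {
			seen[sumInOrder(vals)] = true
			return
		}
		for i := k; i < len(vals); i++ {
			vals[k], vals[i] = vals[i], vals[k] // choose position k
			permute(k + 1)
			vals[k], vals[i] = vals[i], vals[k] // undo the swap
		}
	}
	permute(0)
	out := make([]float64, 0, len(seen))
	for t := range seen {
		out = append(out, t)
	}
	sort.Float64s(out)
	return out
}

func main() {
	balances := map[string]float64{"a": 1e16, "b": 1, "c": -1e16}

	// Same map, same values: only the randomized iteration order changes.
	counts := map[float64]int{}
	for run := 0; run < 100; run++ {
		total := 0.0
		for _, v := range balances {
			total += v
		}
		counts[total]++
	}
	fmt.Println("totals seen over 100 map walks:", counts)

	// Deterministic view: every possible order, every distinct answer.
	fmt.Println("all possible totals:", sumAllOrders([]float64{1e16, 1, -1e16})) // [0 1]
}
```

Because map iteration is randomized, the first print varies from run to run and will usually show both totals; that is the green-locally, red-in-CI pattern in miniature.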

2 · Why it's load-bearing here (not aesthetics)

| Pressure | Why a ulp flips an outcome |
| --- | --- |
| Byte-parity vs Python (L23/L29) | The engine is a port; the diff harness compares it to Python within a 1e-4 relative / 1e-2 absolute tolerance, but a drift that crosses a threshold or breaks an equality flips a token to "mismatch" and fails CI |
| Outputs feed decisions | A value is compared to a rule threshold (fire or not, L45), a cap (capped or not, L19), or a changed-only guard (write or not, L38); equality and threshold checks turn a ulp into a different action |
| Determinism = reproducibility (L8) | Replay, debugging, and parity all assume same inputs → same output; a non-reproducible number can't be diffed, replayed, or trusted |
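The "outputs feed decisions" row is easy to reproduce. In this sketch, ruleThreshold is a made-up rule bound and plainSum a hypothetical naive accumulator; the same three exposures, added in two orders, land on opposite sides of a strict > check.

```go
package main

import "fmt"

// ruleThreshold is a hypothetical rule bound: the rule fires if total > 0.6.
const ruleThreshold = 0.6

// plainSum accumulates with a naive +=, in whatever order it is given.
func plainSum(vals []float64) float64 {
	total := 0.0
	for _, v := range vals {
		total += v
	}
	return total
}

// fires is the threshold check that turns a last-bit difference into an action.
func fires(total float64) bool { return total > ruleThreshold }

func main() {
	asc := plainSum([]float64{0.1, 0.2, 0.3})  // 0.6000000000000001
	desc := plainSum([]float64{0.3, 0.2, 0.1}) // 0.6
	fmt.Println(asc, fires(asc))               // 0.6000000000000001 true
	fmt.Println(desc, fires(desc))             // 0.6 false
}
```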
The reframe: a flipped ulp isn't "slightly wrong"
It's wrong by being non-reproducible. A value that's off by 1e-15 but stable is fine — tolerances absorb it. A value that changes run-to-run can't be matched against Python, can't be replayed to the same state, and can flip a threshold inconsistently. The enemy isn't inaccuracy; it's non-determinism.

3 · The discipline — two complementary halves

Exact / deterministic where error ACCUMULATES

Summing many values is where order-dependence bites, so make it order-independent or exact.

floats.KeyedSum · big.Int · sorted iteration

Tolerant where you COMPARE

Comparing a result to a bound or another value is where you must not let float noise trip a flag.

×1.001 · 1e-4 / 1e-2 · scoreEpsilon

The exact-in half — the gallery

| Tool | What it does | Seen in |
| --- | --- | --- |
| floats.KeyedSum | Sorts the map keys, then takes a Neumaier (compensated) sum: deterministic order, and the low bits lost to rounding are recovered | at_risk rollup (L19), HHI numerator + denominator (L39), exit-liquidity total (L21) |
| big.Int sums | Avoids float entirely where values can stay exact; float appears only for the final coarse ratio | conservation Σ HOLDS (L32), the balance cache (L36) |
| sorted iteration | Walks derivatives / rows in sort.Strings order before accumulating, for run-stable totals and JSON blobs | oracle-bridger derivatives, exit-liquidity bridge pass (L21/L27) |

The tolerant-out half — the gallery

| Slack | Purpose | Seen in |
| --- | --- | --- |
| capSlackTolerance = 1.001 | A cap "fires" only if the raw aggregate exceeds the bound by more than 0.1%, so rounding noise can't raise a spurious cap flag | at_risk caps (L19) |
| 1e-4 rel / 1e-2 abs | Two USD values "match" if they are within tolerance; float drift below it isn't a parity failure | parity diff (L29), conservation band (L32) |
| scoreEpsilon = 5e-5 | Half the 4-dp rounding quantum (1e-4 / 2): a recomputed score within it is "unchanged", so the write is skipped | node_risk_score changed-only writes (L38) |
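The first two slacks can be sketched as plain comparison helpers. capSlackTolerance = 1.001 and the 1e-4 / 1e-2 values come from the table; capFired and usdMatch are hypothetical names, and the rule for combining the relative and absolute tolerances (match if within either) is an assumption that the real harness may not share.

```go
package main

import (
	"fmt"
	"math"
)

// capSlackTolerance mirrors the lesson's value: a cap only "fires" past +0.1%.
const capSlackTolerance = 1.001

// capFired (hypothetical) reports a cap only when the raw aggregate exceeds
// the bound by more than the slack, so rounding noise can't trip the flag.
func capFired(raw, bound float64) bool {
	return raw > bound*capSlackTolerance
}

// Parity tolerances from the lesson. ASSUMPTION: a pair matches if it is
// within the absolute OR the relative tolerance; the real harness may differ.
const (
	relTol = 1e-4
	absTol = 1e-2
)

// usdMatch (hypothetical) decides whether engine and reference values agree.
func usdMatch(engine, reference float64) bool {
	diff := math.Abs(engine - reference)
	scale := math.Max(math.Abs(engine), math.Abs(reference))
	return diff <= absTol || diff <= relTol*scale
}

func main() {
	bound := 1_000_000.0
	fmt.Println(capFired(bound+1e-9, bound))     // false: float noise, not a breach
	fmt.Println(capFired(bound*1.002, bound))    // true: a real 0.2% overshoot
	fmt.Println(usdMatch(123456.789, 123456.79)) // true: drift below tolerance
	fmt.Println(usdMatch(100.00, 100.50))        // false: a genuine mismatch
}
```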
And the output quantum: round2 / round4
Outputs are rounded to a fixed number of decimals (2 for USD, 4 for ratios) before they're stored. That makes the stored value stable (a sub-quantum recompute lands on the same figure) and gives the tolerant-out slacks a clean grid to work against — scoreEpsilon is literally half the rounding quantum. Rounding is the bridge between the exact-in computation and the tolerant-out comparison.
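Rounding and the changed-only guard fit together roughly as below. round2, round4 and shouldWrite are hypothetical stand-ins; the rounding mode (Go's math.Round, half away from zero) is an assumption, and the guard compares the raw recomputed score with the stored 4-dp value using the lesson's scoreEpsilon = 5e-5.

```go
package main

import (
	"fmt"
	"math"
)

// round2 / round4 are hypothetical helpers: round half away from zero
// to 2 decimals (USD) or 4 decimals (ratios).
func round2(x float64) float64 { return math.Round(x*100) / 100 }
func round4(x float64) float64 { return math.Round(x*1e4) / 1e4 }

// scoreEpsilon is half the 4-dp quantum (1e-4 / 2), as in the lesson.
const scoreEpsilon = 5e-5

// shouldWrite sketches a changed-only guard: a recomputed score within
// scoreEpsilon of the stored one counts as unchanged, so the write is skipped.
func shouldWrite(stored, recomputed float64) bool {
	return math.Abs(recomputed-stored) > scoreEpsilon
}

func main() {
	stored := round4(0.123456) // the value persisted last time
	fmt.Println(stored)                              // 0.1235
	fmt.Println(shouldWrite(stored, 0.123456+1e-15)) // false: same score, last-bits drift
	fmt.Println(shouldWrite(stored, 0.1237))         // true: a real change
	fmt.Println(round2(1234.5678))                   // 1234.57
}
```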

4 · It's enforced, not trusted to memory

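This discipline isn't a code-review hope; it's mechanized at both ends: the floats:ok lint guard at the source, where a raw float accumulation must carry an explicit // floats:ok annotation to pass, and the parity harness, which gates CI on the outputs.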
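A guard of this kind can be approximated with the standard go/ast package. The sketch is hypothetical and deliberately crude: it works on syntax alone, flagging every += inside a range loop whose line lacks a // floats:ok comment, whereas a real guard would presumably use go/types to restrict the check to float accumulators. Only the floats:ok spelling comes from the lesson.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// findRawAccumulations (hypothetical) returns the line numbers of `x += ...`
// statements inside range loops that are not annotated with floats:ok.
func findRawAccumulations(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}

	// Lines that carry an explicit opt-out annotation.
	allowed := map[int]bool{}
	for _, group := range file.Comments {
		for _, c := range group.List {
			if strings.Contains(c.Text, "floats:ok") {
				allowed[fset.Position(c.Pos()).Line] = true
			}
		}
	}

	var flagged []int
	ast.Inspect(file, func(n ast.Node) bool {
		loop, ok := n.(*ast.RangeStmt)
		if !ok {
			return true
		}
		// Walk the whole loop body, including nested loops, for += statements.
		ast.Inspect(loop.Body, func(m ast.Node) bool {
			if as, ok := m.(*ast.AssignStmt); ok && as.Tok == token.ADD_ASSIGN {
				if line := fset.Position(as.Pos()).Line; !allowed[line] {
					flagged = append(flagged, line)
				}
			}
			return true
		})
		return false // the inner walk already covered this loop's body
	})
	return flagged, nil
}

const sample = `package p

func totals(m map[string]float64) (float64, float64) {
	a, b := 0.0, 0.0
	for _, v := range m {
		a += v
	}
	for _, v := range m {
		b += v // floats:ok: illustrative opt-out
	}
	return a, b
}
`

func main() {
	lines, err := findRawAccumulations(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines) // [6]
}
```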

One sentence, five lessons
Be exact (or order-deterministic) where error accumulates; be tolerant where you compare; round to a fixed quantum in between; and let the linter and the parity harness enforce it. Every KeyedSum, every ×1.001, every big.Int sum, every round2 you saw was one move in that single discipline — the price of being a billions-valuing engine that must reproduce a reference bit-for-bit.

Check yourself

1. What's the root cause that makes a plain += over a Go map non-deterministic?
2. Why is a non-deterministic sum a real problem here, when in most apps it's ignorable noise?
3. The lesson reframes the danger as non-determinism, not inaccuracy. What follows from that?
4. What does floats.KeyedSum do, and why both parts?
5. Conservation (L32) sums HOLDS in big.Int but forms the final ratio in float64. Which half of the discipline is each?
6. The capSlackTolerance = 1.001 in the at_risk caps (L19) is which half of the discipline?
7. Why does the codebase round outputs to a fixed quantum (round2 / round4)?
8. How is the "use KeyedSum, not a raw +=" rule actually enforced?

What you can now do

Two syntheses down
The combinator family (L46) and the float-determinism discipline (here) are the two big cross-cutting lenses on the risk engine: what to combine (collapse vs add, worst vs best) and how to combine it safely (deterministic, tolerant, enforced). Together they're most of what separates "reads the risk code" from "could change it without breaking parity."

Synthesizes code already cited in: pkg/floats (KeyedSum sorted-key Neumaier sum), at_risk_aggregate.go (KeyedSum + capSlackTolerance=1.001, L19), at_risk_diff.go (1e-4/1e-2 tolerances, L29), verify_balance_conservation.go (big.Int sum + float ratio, L32), vault_concentration.go (KeyedSum ×2 + floats:ok, L39), node_risk_score.go (round + scoreEpsilon=5e-5, L38), the floats:ok lint guard + parity harness gate. Verify against source — the code is the truth.