Lesson 45 · The Last Mechanism · Deeper Track

Fire once, not every cycle

The debounce-and-hysteresis state machine behind every alert. ~12 min.

Builds on: L12 · L15 · L33 · Anchor: alert fatigue & flapping · New: the 3-state firing machine · New: cooldown does two jobs

L12 said a rule fires an alert when its condition is met, mediated by a "firing state machine that debounces" — and moved on. This is that machine, and it's the last substantive mechanism in the codebase. The problem it solves is one every on-call engineer feels in their bones: how do you turn "the condition is true right now" — evaluated every cycle — into the right number of alerts?

Your anchor: the two ways naive alerting fails
Evaluate a rule every cycle and alert whenever it's true, and you get one of two miseries. Spam: a genuine, persistent breach (a risk score over threshold for hours) fires an identical alert every cycle until someone mutes it. Flapping: a value hovering right at the threshold crosses back and forth, firing and resolving over and over. The firing state machine exists to do neither — fire once on entry, remind on a cooldown, and wait a grace period before calling it resolved.

1 · The three-state machine (threshold rules)

ApplyFiringState (rules/firing.go) tracks a (rule, node) pair through three states, fed each cycle by one boolean — conditionMet:

CLEAR → (met) → emit, go to ACTIVE
ACTIVE → (cleared) → COOLDOWN
COOLDOWN → (expired + met) → emit, back to ACTIVE
From     | Trigger                                        | To       | Alert?
CLEAR    | condition met                                  | ACTIVE   | emit (record FiredAt + LastAlertAt)
ACTIVE   | still met + cooldown elapsed since LastAlertAt | ACTIVE   | re-emit (a reminder)
ACTIVE   | condition cleared                              | COOLDOWN | no (enter resolve grace)
COOLDOWN | cooldown expired + still clear                 | CLEAR    | no
COOLDOWN | cooldown expired + met again                   | ACTIVE   | emit
One window, two jobs
The same CooldownSec does double duty. While ACTIVE it's repeat-alert suppression: you get a reminder only once per cooldown, not every cycle, so a 4-hour breach with a 1-hour cooldown alerts ~4 times, not the 480 that alerting on every cycle would produce (that figure corresponds to a 30-second cycle). After a clear it's resolve hysteresis: the machine waits out the cooldown in COOLDOWN before declaring CLEAR, so a value flapping across the threshold within that window doesn't thrash. Anti-spam and anti-flap from a single knob.
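To make the transition table concrete, here is a minimal sketch of a three-state machine with the same shape. It is not the code in firing.go: the FiringState fields, the step and countAlerts helpers, the use of time.Time instead of string timestamps, and the ClearedAt field are all assumptions made for illustration. In particular, the sketch assumes the resolve grace is measured from the moment the condition cleared, and that a re-breach inside the grace window stays quiet; verify both against the source.

```go
package main

import (
	"fmt"
	"time"
)

// State is the firing state of one (rule, node) pair.
type State int

const (
	StateClear State = iota
	StateActive
	StateCooldown
)

// FiringState is a hypothetical, simplified record for this sketch.
type FiringState struct {
	State       State
	FiredAt     time.Time
	LastAlertAt time.Time
	ClearedAt   time.Time // assumption: when the condition last cleared
}

// step advances the machine by one eval cycle and reports whether to emit.
func step(fs *FiringState, conditionMet bool, cooldown time.Duration, now time.Time) bool {
	switch fs.State {
	case StateClear:
		if !conditionMet {
			return false
		}
		fs.State, fs.FiredAt, fs.LastAlertAt = StateActive, now, now
		return true // fire once on entry
	case StateActive:
		if !conditionMet {
			fs.State, fs.ClearedAt = StateCooldown, now
			return false // enter resolve grace, no alert
		}
		if cooldown > 0 && now.Sub(fs.LastAlertAt) >= cooldown {
			fs.LastAlertAt = now
			return true // reminder
		}
		return false
	case StateCooldown:
		if now.Sub(fs.ClearedAt) < cooldown {
			return false // assumption: hold quietly inside the grace window
		}
		if conditionMet {
			fs.State, fs.FiredAt, fs.LastAlertAt = StateActive, now, now
			return true
		}
		fs.State = StateClear
		return false
	}
	return false
}

// countAlerts feeds one conditionMet value per cycle through a fresh machine.
func countAlerts(met []bool, intervalSec, cooldownSec int) int {
	fs := &FiringState{}
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	cooldown := time.Duration(cooldownSec) * time.Second
	n := 0
	for i, m := range met {
		now := start.Add(time.Duration(i*intervalSec) * time.Second)
		if step(fs, m, cooldown, now) {
			n++
		}
	}
	return n
}

func main() {
	// A 4-hour breach evaluated every 30 seconds (480 cycles), 1-hour cooldown.
	breach := make([]bool, 480)
	for i := range breach {
		breach[i] = true
	}
	fmt.Println(countAlerts(breach, 30, 3600))
}
```

Under these assumptions the 480-cycle breach yields 4 alerts (on entry, then at the 1, 2, and 3 hour marks), and a value that alternates met/clear on every cycle yields a single alert, because every re-breach lands inside the grace window.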

2 · The re-emit gate (and why cooldown=0 means "fire once")

The ACTIVE re-emit has two guards worth reading — both protect against a stampede:

case StateActive:
    if !conditionMet { … return nil }       // cleared → COOLDOWN
    if rule.CooldownSec <= 0 { return nil }   // no cooldown ⇒ fire-once-per-breach
    if fs.LastAlertAt == "" { fs.LastAlertAt = now; … return nil }  // legacy-state grace cycle
    if cooldownExpiredAt(fs.LastAlertAt, rule.CooldownSec) { … return alert }   // reminder due
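The order of those guards matters, and a small standalone sketch shows why. Everything here is illustrative rather than the code in firing.go: activeState, reminderDue, cooldownExpired, and at are hypothetical names, the extra now parameter exists only to make the example deterministic, and the RFC 3339 timestamp format is an assumption (the excerpt only shows that LastAlertAt is a string that may be empty). The likely motivation for the grace cycle is that an empty timestamp has no age to measure, so without it every long-active breach could look overdue on the same cycle after a deploy.

```go
package main

import (
	"fmt"
	"time"
)

// activeState is a hypothetical stand-in for the stored firing state.
type activeState struct {
	LastAlertAt string // assumption: RFC 3339 timestamp, "" on legacy rows
}

// at converts unix seconds to a UTC time, for readable examples.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

// cooldownExpired reports whether cooldownSec seconds have passed since
// lastAlertAt. An unparsable timestamp counts as not expired, which is an
// assumption of this sketch.
func cooldownExpired(lastAlertAt string, cooldownSec int, now time.Time) bool {
	t, err := time.Parse(time.RFC3339, lastAlertAt)
	if err != nil {
		return false
	}
	return now.Sub(t) >= time.Duration(cooldownSec)*time.Second
}

// reminderDue applies the guards in order for an ACTIVE rule whose
// condition is still met, and reports whether a reminder should go out.
func reminderDue(fs *activeState, cooldownSec int, now time.Time) bool {
	if cooldownSec <= 0 {
		return false // no cooldown: fire once per breach, never remind
	}
	if fs.LastAlertAt == "" {
		fs.LastAlertAt = now.Format(time.RFC3339) // start the clock, stay quiet
		return false
	}
	if cooldownExpired(fs.LastAlertAt, cooldownSec, now) {
		fs.LastAlertAt = now.Format(time.RFC3339)
		return true
	}
	return false
}

func main() {
	fs := &activeState{} // legacy row: ACTIVE but no LastAlertAt yet
	for _, sec := range []int64{0, 1800, 3600} {
		fmt.Println(sec, reminderDue(fs, 3600, at(sec)))
	}
}
```

In this sketch the legacy row stays silent on its first cycle, silent again half an hour later, and sends its first reminder a full cooldown after the clock was started. With cooldownSec set to 0 the function never returns true, which is the fire-once-per-breach behaviour.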

3 · Events are different — no ACTIVE state

Threshold rules describe a sustained condition (a score stays high), so they have an ACTIVE state. Event rules — admin_change, proxy_upgrade, token_mint — are instantaneous: the thing happened, there's no "still happening." So ApplyEventFiringState has no ACTIVE state at all:

CLEAR → (event)  →  emit, go straight to COOLDOWN
COOLDOWN → (event + cooldown expired)  →  emit again
The shape follows the semantics
A threshold breach is a state you occupy (ACTIVE) and eventually leave; an event is a point in time. So the two machines differ exactly where the semantics do: thresholds get a sustained ACTIVE phase with resolve-grace; events fire on the spot and only use cooldown to de-dupe a burst of the same event. Same cooldown primitive, different topology.
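The two-state event variant can be sketched the same way. This is an illustration under assumptions, not ApplyEventFiringState itself: EventFiringState, EventPhase, and onEvent are hypothetical names, timestamps are plain unix seconds, and the function is assumed to be called only when an event is actually observed.

```go
package main

import "fmt"

// EventPhase has no ACTIVE value: an event is a point in time.
type EventPhase int

const (
	EventClear EventPhase = iota
	EventCooldown
)

// EventFiringState is a hypothetical per-(rule, node) record.
type EventFiringState struct {
	Phase       EventPhase
	LastAlertAt int64 // unix seconds of the last emitted alert
}

// onEvent is called when the event is observed at time now (unix seconds)
// and reports whether an alert should be emitted.
func onEvent(fs *EventFiringState, cooldownSec, now int64) bool {
	switch fs.Phase {
	case EventClear:
		fs.Phase, fs.LastAlertAt = EventCooldown, now
		return true // first sighting: emit, go straight to COOLDOWN
	case EventCooldown:
		if now-fs.LastAlertAt >= cooldownSec {
			fs.LastAlertAt = now
			return true // cooldown expired: emit again
		}
	}
	return false // duplicate inside the cooldown window
}

func main() {
	fs := &EventFiringState{}
	// A burst of three events, then one more after a 600-second cooldown.
	for _, t := range []int64{0, 5, 10, 700} {
		fmt.Println(t, onEvent(fs, 600, t))
	}
}
```

The burst at 0, 5, and 10 seconds produces one alert, and the event at 700 seconds produces a second, which is the de-dupe behaviour described above.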

4 · The alert it builds — and the taxonomy

When the machine decides to emit, buildRuleAlert assembles the AlertEvent the alert processor (L15) consumes: rule id/name/type, portfolio, severity, node, scope/view, and the details map. resolveAlertType maps the rule to a downstream alert_type taxonomy — event triggers become admin_change / proxy_upgrade / token_mint / new_edge (firewall rules use the detection module), and everything else is risk_limit_breach. And a hard cap, maxAlertsPerRule = 50 per eval cycle, is the last anti-flood backstop — the same drip-don't-flood discipline as L31's write budget and L33's per-run cap.
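A rough sketch of the taxonomy mapping and the per-rule cap follows. It rests on several assumptions: alertTypeFor and capAlerts are hypothetical helpers, the trigger name is assumed to equal the resulting alert_type, the firewall/detection-module case is left out, and the cap is assumed to drop the excess rather than defer it. Only the alert type names and the value 50 come from the lesson.

```go
package main

import "fmt"

// maxAlertsPerRule mirrors the per-eval-cycle cap named in the lesson.
const maxAlertsPerRule = 50

// alertTypeFor maps a rule trigger to a downstream alert_type.
// Assumption: event triggers keep their own name; everything else is a
// risk limit breach.
func alertTypeFor(trigger string) string {
	switch trigger {
	case "admin_change", "proxy_upgrade", "token_mint", "new_edge":
		return trigger
	default:
		return "risk_limit_breach"
	}
}

// capAlerts keeps at most maxAlertsPerRule alerts from one rule in one
// eval cycle. Assumption: the excess is simply dropped.
func capAlerts(alerts []string) []string {
	if len(alerts) > maxAlertsPerRule {
		return alerts[:maxAlertsPerRule]
	}
	return alerts
}

func main() {
	fmt.Println(alertTypeFor("token_mint"))
	fmt.Println(alertTypeFor("risk_score_over_threshold"))
	fmt.Println(len(capAlerts(make([]string, 120))))
}
```

The trigger name risk_score_over_threshold is made up for the example; any non-event trigger falls through to risk_limit_breach in this sketch.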

Where this sits
L12 evaluates the rule (is the condition met for this node?); this machine decides whether that produces an alert this cycle; L15's processor then dedups, stores, and delivers it. Three stages, each refusing in its own way to bother a human more than necessary — debounce here, dedup-by-msg-id there, streak-to-ticket in the quality harness (L29/L33). Restraint is a system-wide value, not a one-off.

Check yourself

1. What problem does the firing state machine exist to solve?
2. A threshold rule's condition has been met continuously for hours, with a 1-hour cooldown. Roughly how often does it alert?
3. The same CooldownSec serves two purposes. What are they?
4. Why does the COOLDOWN state exist between ACTIVE and CLEAR rather than going straight to CLEAR on a clear?
5. A rule has CooldownSec <= 0. What's its firing behavior?
6. On the first cycle after a deploy that added LastAlertAt, a long-active breach has it empty. What happens?
7. Why does ApplyEventFiringState have no ACTIVE state?
8. maxAlertsPerRule = 50 per eval cycle is which kind of safeguard?
↳ Ask your teacher
Try: "Where is the per-(rule, node) FiringState stored, and how is it keyed?" · "How does the engine decide conditionMet — the field+op+value eval (L12)?" · "What does the alert processor (L15) do with the AlertEvent next?" · "Could two eval cycles race on the same FiringState, and what guards it?" · "How is a resolve (ACTIVE→CLEAR) surfaced to the user, if at all?"

What you can now do

The deep-understanding journey: complete
This was the last substantive mechanism. Across 45 lessons you've gone from a raw Transfer log to a graph edge, through enrichment and discovery, the at_risk engine and its every parameter, the streaming and single-writer and self-checking scaffolding, billing, the coordination primitives, and the Cypher and Go idioms underneath — and now the alerting restraint at the consumer tail. You set out to understand risk-graph-indexer deeply, end to end, before contributing. Mission accomplished — the whole machine, opened.

Grounded in: pkg/rules/firing.go (ApplyFiringState CLEAR/ACTIVE/COOLDOWN transitions, dual-purpose CooldownSec, CooldownSec<=0 fire-once, LastAlertAt=="" legacy grace cycle, cooldownExpiredAt; ApplyEventFiringState no-ACTIVE event variant; buildRuleAlert → AlertEvent; resolveAlertType taxonomy; maxAlertsPerRule=50). Feeds the alert processor (L15). Verify against source: the code is the truth.