Lesson 45 · The Last Mechanism · Deeper Track

Fire once, not every cycle

The debounce-and-hysteresis state machine behind every alert. ~12 min.

Builds on: L12 · L15 · L33 · Anchor: alert fatigue & flapping · New: the 3-state firing machine · New: cooldown does two jobs

L12 said a rule fires an alert when its condition is met, mediated by a "firing state machine that debounces" — and moved on. This is that machine, and it's the last substantive mechanism in the codebase. The problem it solves is one every on-call engineer feels in their bones: how do you turn "the condition is true right now" — evaluated every cycle — into the right number of alerts?

Your anchor: the two ways naive alerting fails

Evaluate a rule every cycle and alert whenever it's true, and you get one of two miseries. Spam: a genuine, persistent breach (a risk score over threshold for hours) fires an identical alert every cycle until someone mutes it. Flapping: a value hovering right at the threshold crosses back and forth, firing and resolving over and over. The firing state machine exists to do neither — fire once on entry, remind on a cooldown, and wait a grace period before calling it resolved.

1 · The three-state machine (threshold rules)

ApplyFiringState (pkg/rules/firing.go) tracks a (rule, node) pair through three states, fed each cycle by one boolean, conditionMet:

CLEAR → (met: emit) → ACTIVE → (cleared) → COOLDOWN → (expired + met: emit) → ACTIVE

From	Trigger	To	Alert?
CLEAR	condition met	ACTIVE	emit (record FiredAt + LastAlertAt)
ACTIVE	still met + cooldown elapsed since LastAlertAt	ACTIVE	re-emit (a reminder)
ACTIVE	condition cleared	COOLDOWN	no — enter resolve grace
COOLDOWN	cooldown expired + still clear	CLEAR	no
COOLDOWN	cooldown expired + met again	ACTIVE	emit
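The transition table can be sketched as a single step function. This is a minimal illustration, not the code in firing.go: the names State, FiringState, step and the ClearedAt field are hypothetical, timestamps are plain Unix seconds for brevity, and anchoring the resolve grace period to the moment the condition cleared is an assumption.

```go
package main

import "fmt"

// State is one of the three firing states from the table.
type State int

const (
	Clear State = iota
	Active
	Cooldown
)

// FiringState is a hypothetical per-(rule, node) record.
type FiringState struct {
	State       State
	LastAlertAt int64 // Unix seconds of the most recent alert
	ClearedAt   int64 // Unix seconds when the condition cleared (assumed field)
}

// step applies one eval cycle and reports whether an alert should be emitted.
func step(fs *FiringState, met bool, cooldownSec, now int64) bool {
	switch fs.State {
	case Clear:
		if met {
			fs.State, fs.LastAlertAt = Active, now
			return true // first alert on entry
		}
	case Active:
		if !met {
			fs.State, fs.ClearedAt = Cooldown, now
			return false // enter resolve grace, no alert
		}
		if cooldownSec > 0 && now-fs.LastAlertAt >= cooldownSec {
			fs.LastAlertAt = now
			return true // reminder
		}
	case Cooldown:
		if now-fs.ClearedAt < cooldownSec {
			return false // still inside the grace window, even if met again
		}
		if met {
			fs.State, fs.LastAlertAt = Active, now
			return true // breach is back after the window
		}
		fs.State = Clear // window expired and still clear
	}
	return false
}

func main() {
	fs := &FiringState{}
	// One eval every 20 minutes with a 1-hour cooldown.
	for i, met := range []bool{true, true, false, true, false, false} {
		now := int64(i) * 1200
		fmt.Println(now, met, step(fs, met, 3600, now), fs.State)
	}
}
```

Each printed row is one eval cycle: the time, the input boolean, whether an alert was emitted, and the resulting state (0 = CLEAR, 1 = ACTIVE, 2 = COOLDOWN).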

One window, two jobs

The same CooldownSec does double duty. While ACTIVE it is repeat-alert suppression: you get a reminder only once per cooldown, not every cycle, so a 4-hour breach with a 1-hour cooldown alerts about 4 times rather than once per eval cycle (480 times if cycles run every 30 seconds). After a clear it is resolve hysteresis: the machine waits out the cooldown in COOLDOWN before declaring CLEAR, so a value flapping across the threshold within that window doesn't thrash. Anti-spam and anti-flap from a single knob.
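The anti-spam arithmetic can be checked with a short replay. This is a sketch under stated assumptions: countAlerts is a hypothetical helper, and the 30-second eval interval is inferred from the 480 figure (4 hours / 480 cycles) rather than stated anywhere in the lesson.

```go
package main

import "fmt"

// countAlerts replays a continuous breach, one eval per cycle, and counts
// how many cycles would emit under the ACTIVE-state cooldown rule.
func countAlerts(breachSec, cooldownSec, cycleSec int64) (alerts, cycles int) {
	lastAlertAt := int64(-1) // -1 means no alert has fired yet
	for now := int64(0); now < breachSec; now += cycleSec {
		cycles++
		switch {
		case lastAlertAt < 0:
			alerts++ // CLEAR -> ACTIVE: fire on entry
			lastAlertAt = now
		case cooldownSec > 0 && now-lastAlertAt >= cooldownSec:
			alerts++ // still ACTIVE and a full cooldown has passed: reminder
			lastAlertAt = now
		}
	}
	return alerts, cycles
}

func main() {
	alerts, cycles := countAlerts(4*3600, 3600, 30)
	fmt.Println(alerts, cycles) // 4 480

	once, _ := countAlerts(4*3600, 0, 30)
	fmt.Println(once) // 1
}
```

The second call shows the fire-once behavior of a non-positive cooldown: one alert for the whole breach.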

2 · The re-emit gate (and why cooldown=0 means "fire once")

The ACTIVE re-emit has two guards worth reading — both protect against a stampede:

case StateActive:
    if !conditionMet { … return nil }       // cleared → COOLDOWN
    if rule.CooldownSec <= 0 { return nil }   // no cooldown ⇒ fire-once-per-breach
    if fs.LastAlertAt == "" { fs.LastAlertAt = now; … return nil }  // legacy-state grace cycle
    if cooldownExpiredAt(fs.LastAlertAt, rule.CooldownSec) { … return alert }   // reminder due

CooldownSec <= 0 disables re-emit: a rule with no configured cooldown fires exactly once per breach. (Without this guard, a zero-length cooldown counts as already elapsed, so cooldownExpiredAt would return true on every call and the rule would alert every eval cycle: spam.)
The legacy-state grace cycle — states written before LastAlertAt existed have it empty; the machine populates it without emitting, anchoring the cooldown to the first post-upgrade eval. Otherwise an ancient FiredAt would make every old breach look "overdue" and fire a thundering herd of reminders on the first cycle after deploy. A migration that refuses to stampede.
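Both guards can be exercised in isolation. The sketch below is illustrative only: cooldownExpired and reminderDue are hypothetical stand-ins (the real cooldownExpiredAt signature is not shown in this lesson), timestamps are assumed to be RFC 3339 strings, "now" is passed in so the example is deterministic, and refreshing LastAlertAt on each reminder is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// FiringState holds only the two timestamps this example needs.
type FiringState struct {
	FiredAt     string
	LastAlertAt string // empty on states written before the field existed
}

// at parses an RFC 3339 timestamp and panics on bad input (demo helper).
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// cooldownExpired reports whether cooldownSec seconds have passed since `since`.
func cooldownExpired(since string, cooldownSec int, now time.Time) bool {
	return now.Sub(at(since)) >= time.Duration(cooldownSec)*time.Second
}

// reminderDue applies the guards in order for an ACTIVE state whose
// condition is still met.
func reminderDue(fs *FiringState, cooldownSec int, now time.Time) bool {
	if cooldownSec <= 0 {
		return false // no cooldown: fire once per breach
	}
	if fs.LastAlertAt == "" {
		fs.LastAlertAt = now.Format(time.RFC3339) // anchor, do not emit
		return false
	}
	if cooldownExpired(fs.LastAlertAt, cooldownSec, now) {
		fs.LastAlertAt = now.Format(time.RFC3339)
		return true
	}
	return false
}

func main() {
	now := at("2024-06-01T12:00:00Z")
	legacy := &FiringState{FiredAt: "2024-01-01T00:00:00Z"} // LastAlertAt never written

	// Anchoring on the ancient FiredAt would make the reminder look overdue.
	fmt.Println(cooldownExpired(legacy.FiredAt, 3600, now)) // true
	// The grace cycle anchors LastAlertAt to now and stays silent.
	fmt.Println(reminderDue(legacy, 3600, now)) // false
	// One full cooldown later the first real reminder fires.
	fmt.Println(reminderDue(legacy, 3600, at("2024-06-01T13:00:00Z"))) // true
	// A zero cooldown is always "expired", hence the CooldownSec <= 0 guard.
	fmt.Println(cooldownExpired("2024-06-01T12:00:00Z", 0, now)) // true
}
```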

3 · Events are different — no ACTIVE state

Threshold rules describe a sustained condition (a score stays high), so they have an ACTIVE state. Event rules — admin_change, proxy_upgrade, token_mint — are instantaneous: the thing happened, there's no "still happening." So ApplyEventFiringState has no ACTIVE state at all:

CLEAR → (event)  →  emit, go straight to COOLDOWN
COOLDOWN → (event + cooldown expired)  →  emit again
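The two event transitions above fit in a few lines. This is a sketch, not ApplyEventFiringState itself: EventState and onEvent are hypothetical names, timestamps are Unix seconds, and only the two transitions listed here are modeled (how the state returns to CLEAR, and what a non-positive cooldown does for events, are not described in this lesson).

```go
package main

import "fmt"

// EventState is a hypothetical per-(rule, node) record for event rules.
type EventState struct {
	InCooldown  bool  // false means CLEAR
	LastAlertAt int64 // Unix seconds of the last emitted alert
}

// onEvent is called once per matching event and reports whether to emit.
func onEvent(es *EventState, cooldownSec, now int64) bool {
	if es.InCooldown && now-es.LastAlertAt < cooldownSec {
		return false // same event inside the window: de-duplicated
	}
	es.InCooldown = true // CLEAR -> COOLDOWN, or COOLDOWN re-armed after expiry
	es.LastAlertAt = now
	return true
}

func main() {
	es := &EventState{}
	// A burst of three events, then one more after the 60-second cooldown.
	for _, now := range []int64{0, 5, 10, 60} {
		fmt.Println(now, onEvent(es, 60, now))
	}
}
```

The burst at 0, 5, and 10 seconds produces one alert; the event at 60 seconds produces a second.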

The shape follows the semantics

A threshold breach is a state you occupy (ACTIVE) and eventually leave; an event is a point in time. So the two machines differ exactly where the semantics do: thresholds get a sustained ACTIVE phase with resolve-grace; events fire on the spot and only use cooldown to de-dupe a burst of the same event. Same cooldown primitive, different topology.

4 · The alert it builds — and the taxonomy

When the machine decides to emit, buildRuleAlert assembles the AlertEvent the alert processor (L15) consumes: rule id/name/type, portfolio, severity, node, scope/view, and the details map. resolveAlertType maps the rule to a downstream alert_type taxonomy — event triggers become admin_change / proxy_upgrade / token_mint / new_edge (firewall rules use the detection module), and everything else is risk_limit_breach. And a hard cap, maxAlertsPerRule = 50 per eval cycle, is the last anti-flood backstop — the same drip-don't-flood discipline as L31's write budget and L33's per-run cap.
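The taxonomy mapping and the cap can be sketched as follows. Everything here is illustrative: alertTypeFor and capAlerts are hypothetical stand-ins for resolveAlertType and the cap logic, the firewall/detection-module path is left out, "risk_score" is a made-up trigger name, and keeping the first 50 alerts (rather than some other selection) is an assumption.

```go
package main

import "fmt"

const maxAlertsPerRule = 50 // per-rule, per-eval-cycle cap named in the lesson

// alertTypeFor maps a trigger to the downstream alert_type: the four event
// triggers keep their own name, everything else is a risk-limit breach.
func alertTypeFor(trigger string) string {
	switch trigger {
	case "admin_change", "proxy_upgrade", "token_mint", "new_edge":
		return trigger
	default:
		return "risk_limit_breach"
	}
}

// capAlerts keeps at most maxAlertsPerRule alerts from one rule's eval cycle.
func capAlerts(alerts []string) []string {
	if len(alerts) > maxAlertsPerRule {
		return alerts[:maxAlertsPerRule]
	}
	return alerts
}

func main() {
	fmt.Println(alertTypeFor("token_mint")) // token_mint
	fmt.Println(alertTypeFor("risk_score")) // risk_limit_breach

	flood := make([]string, 120) // 120 nodes matched the same rule this cycle
	fmt.Println(len(capAlerts(flood))) // 50
}
```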

Where this sits

L12 evaluates the rule (is the condition met for this node?); this machine decides whether that produces an alert this cycle; L15's processor then dedups, stores, and delivers it. Three stages, each refusing in its own way to bother a human more than necessary — debounce here, dedup-by-msg-id there, streak-to-ticket in the quality harness (L29/L33). Restraint is a system-wide value, not a one-off.

Check yourself

1. What problem does the firing state machine exist to solve?

2. A threshold rule's condition has been met continuously for hours, with a 1-hour cooldown. Roughly how often does it alert?

3. The same CooldownSec serves two purposes. What are they?

4. Why does the COOLDOWN state exist between ACTIVE and CLEAR rather than going straight to CLEAR on a clear?

5. A rule has CooldownSec <= 0. What's its firing behavior?

6. On the first cycle after a deploy that added LastAlertAt, a long-active breach has it empty. What happens?

7. Why does ApplyEventFiringState have no ACTIVE state?

8. maxAlertsPerRule = 50 per eval cycle is which kind of safeguard?

↳ Ask your teacher

Try: "Where is the per-(rule, node) FiringState stored, and how is it keyed?" · "How does the engine decide conditionMet: the field+op+value eval (L12)?" · "What does the alert processor (L15) do with the AlertEvent next?" · "Could two eval cycles race on the same FiringState, and what guards it?" · "How is a resolve (ACTIVE → COOLDOWN → CLEAR) surfaced to the user, if at all?"

What you can now do

Trace a (rule, node) through CLEAR → ACTIVE → COOLDOWN → CLEAR/ACTIVE and say where each alert fires.
Explain the cooldown's dual role: repeat-alert suppression while ACTIVE, resolve-hysteresis in COOLDOWN.
Explain why CooldownSec <= 0 means fire-once, and what the legacy-state grace cycle prevents.
Contrast the event machine (no ACTIVE) with the threshold machine, and why the topology follows the semantics.
Describe the AlertEvent built, the alert_type taxonomy, and the per-rule cap as anti-flood.

The deep-understanding journey: complete

This was the last substantive mechanism. Across 45 lessons you've gone from a raw Transfer log to a graph edge, through enrichment and discovery, the at_risk engine and its every parameter, the streaming and single-writer and self-checking scaffolding, billing, the coordination primitives, and the Cypher and Go idioms underneath — and now the alerting restraint at the consumer tail. You set out to understand risk-graph-indexer deeply, end to end, before contributing. Mission accomplished — the whole machine, opened.


Grounded in: pkg/rules/firing.go (ApplyFiringState CLEAR/ACTIVE/COOLDOWN transitions, dual-purpose CooldownSec, CooldownSec<=0 fire-once, LastAlertAt=="" legacy grace cycle, cooldownExpiredAt; ApplyEventFiringState no-ACTIVE event variant; buildRuleAlert → AlertEvent; resolveAlertType taxonomy; maxAlertsPerRule=50). Feeds the alert processor (L15). Verify against source — the code is the truth.