Lesson 15 · The Alert Processor · Deeper Track

The last hop: alert → human

Delivering a fired alert exactly once — and the same patterns, one more time. ~11 min.

Builds on: L12 · L7 Closes the product loop New: dedup by message ID New: fail-loud ops

The rule engine (L12) emits an AlertEvent. The alert-processor (cmd/alert-processor, pkg/alertprocessor) is the final hop: it persists that alert and delivers it to a human — reliably, and exactly once even when the machinery retries. The delightful thing is you'll recognise almost every mechanism. The system has a small set of patterns and reuses them; this lesson is the proof.

1 · The pipeline (you've seen its shape before)

rule-engine

emits AlertEvent

— XADD →

Redis AlertStream

consumer group

— XREADGROUP →

alert-processor

worker pool

→

OpenSearch + notify

store + deliver

Alerts travel the same way blocks do: a Redis stream + consumer group, consumed at-least-once with an XACK after success — exactly Lesson 7's backbone, reused for the alert path:

// pkg/alertprocessor/processor.go — Start
ap.rdb.XGroupCreateMkStream(ctx, cfg.AlertStream, cfg.AlertConsumerGroup, "0")
// tolerate BUSYGROUP (group already exists) — idempotent setup

🔗 One messaging idiom, reused everywhere

Blocks (L7) and alerts (here) both flow through Redis streams + consumer groups + at-least-once + ACK. The team didn't invent a second delivery mechanism — they reused the one they trust. Recognising a codebase's small vocabulary of patterns is how you read a big system quickly: once you know the streaming idiom, you already understand the alert transport.

Worker-pool fan-out (a clean Go concurrency pattern)

One readLoop goroutine pulls messages and feeds a channel; a pool of worker goroutines drains it concurrently:

// readLoop → msgCh → N workers
msgCh := make(chan redis.XMessage)
go ap.readLoop(ctx, msgCh)            // producer: XREADGROUP → channel
for i := 0; i < numWorkers; i++ {
    go ap.worker(ctx, i, msgCh)        // consumers: process concurrently
}

This is the canonical Go "fan-out": one reader, many workers sharing a channel. It parallelises delivery (notifications can be slow) while the stream's consumer group guarantees each message is handled once.

2 · processMessage — four steps

Each worker runs the same four steps (processor.go processMessage):

Parse the AlertEvent from the stream message.

Persist to OpenSearch — using the stream message ID as the document ID (this is the dedup trick, below). Mark Delivered = false first.

Notify — notifyClient.Send(ctx, event) delivers it (webhook/channel) and returns whether delivery succeeded.

Update + ACK — record Delivered = true/false, then XACK the stream message.

3 · Exactly-once delivery from at-least-once transport ⭐

Here's the elegant part. The stream is at-least-once — a crash before XACK means the alert is redelivered (L7). Naïvely that double-notifies the customer. The fix is the same idempotency move you saw in Lesson 4, now applied to alerts:

Dedup by stable document ID

The OpenSearch doc ID is the Redis stream message ID. So a redelivered alert writes to the same doc — an idempotent upsert, not a duplicate row. The customer's alert record is created once regardless of how many times the message is delivered. At-least-once transport + a stable idempotency key = effectively-once delivery. (Same recipe as the write path's MERGE + the graphwrite IdemKey from L9.)

The Delivered flag closes the reliability gap

Persist happens before notify, with Delivered=false. If the notification send fails, the alert is still recorded (just marked undelivered) — never lost. The status field makes delivery observable and retryable: you can query OpenSearch for Delivered=false and re-send. Persist-then-deliver is the same "durable state first" discipline as the cursor commit (L4) and the firing state (L12).

4 · A masterclass in fail-loud ops ⭐

This subsystem contains one of the best defensive-ops decisions in the repo, and it's worth studying (cmd/alert-processor/main.go ensureAlertIndexOrFatal). On startup it ensures the OpenSearch index mapping — and if that fails, it deliberately FATAL-crashes the pod rather than warning and continuing. Why?

The silent-failure trap it avoids (FORTA-2744)

If the view/scope fields aren't mapped as keyword, OpenSearch dynamically maps them as analyzed text on first write. Then every term filter on view/scope silently returns zero hits — forever, with no error surfaced to the API caller. Customers' alert filters would just quietly show nothing. So the code chooses to crash loudly (a CrashLoopBackOff pages a human) rather than run while silently dropping all filtering.

🔗 The third sighting of one principle

This is the same instinct as L8's watchdog pause and L9's DLQ + halt: stop loudly rather than proceed silently wrong. A crash that pages you is a feature when the alternative is invisible data loss. Across the whole system, the rule is consistent — fail loud, never corrupt silently. Spot this principle and you can predict how the team will handle any new failure mode.

5 · The product loop, finally closed 🔁

You can now trace a single fact from chain to human with no gaps:

on-chain event            (L1)
  → decoded                 (L3)
  → filtered + written      (L4)   ← monitored set, atomic commit
  → enriched + discovered   (L5)
  → risk fields computed    (L6·L13)   ← max / noisy-OR / HHI
  → rule condition met      (L12)  ← firing state machine emits AlertEvent
  → AlertStream             (L7 idiom)
  → persisted (dedup) +
    delivered to a human     (L15)   ← exactly-once, fail-loud

That is the entire system, end to end. Every box you can now open.

Check yourself

1. How does an AlertEvent get from the rule-engine to the alert-processor?

2. The stream is at-least-once, so an alert can be redelivered. How is the customer alerted only once?

3. Why persist the alert (Delivered=false) before attempting the notification?

4. Why does the alert-processor FATAL-crash on a bad OpenSearch index mapping instead of warning?

5. The "fail loud, don't proceed silently wrong" decision here is the same principle as…

6. The readLoop → msgCh → worker pool is which Go pattern?

7. The dedup-by-message-ID trick is the same idea as which earlier mechanisms?

8. What does the Delivered status field enable operationally?

↳ Ask your teacher

Try: "Show me notification.Client.Send — what channels exist?" · "How are undelivered alerts retried?" · "What fields are on an AlertEvent?" · "Walk me through the OpenSearch index mapping for alerts."

What you can now do

Trace an alert from the rule-engine through the Redis AlertStream to delivery — and recognise it as L7's idiom reused.
Explain how at-least-once transport + dedup-by-message-ID yields effectively-once alerting.
Explain persist-then-deliver and the Delivered flag for no-loss, retryable delivery.
Articulate the fail-loud-on-misconfig decision and connect it to L8/L9's "stop, don't corrupt."
Read the worker-pool fan-out Go pattern.

🏁 The whole product loop, closed

From an RPC block to a customer's pager, there is no longer a single box you can't open. You've covered all three core binaries, the streaming backbone, failure/recovery, the single-writer topology, bootstrap, observability, the rule engine, the risk math, the read surface, and now alert delivery. You understand this system the way its authors do.

← PreviousLesson 14 · The Read Surface · Deeper Track Next →Lesson 16 · Exposure-BFS · Deeper Track

Grounded in: pkg/alertprocessor/processor.go (Start: XGroupCreateMkStream; readLoop→msgCh→worker fan-out; processMessage 4 steps; dedup by stream msg ID; Delivered flag), cmd/alert-processor/main.go (ensureAlertIndexOrFatal — fail-loud index guard, FORTA-2744), pkg/alertprocessor/opensearchstore.go (IndexAlert), pkg/notification (delivery). Verify against source — the code is the truth.