Delivering a fired alert exactly once — and the same patterns, one more time. ~11 min.
Builds on: L12 · L7Closes the product loopNew: dedup by message IDNew: fail-loud ops
The rule engine (L12) emits an AlertEvent. The
alert-processor (cmd/alert-processor, pkg/alertprocessor)
is the final hop: it persists that alert and delivers it to a human — reliably, and exactly once even when
the machinery retries. The delightful thing is you'll recognise almost every mechanism. The system has a small set
of patterns and reuses them; this lesson is the proof.
1 · The pipeline (you've seen its shape before)
rule-engine
emits AlertEvent
— XADD →
Redis AlertStream
consumer group
— XREADGROUP →
alert-processor
worker pool
→
OpenSearch + notify
store + deliver
Alerts travel the same way blocks do: a Redis stream + consumer group, consumed
at-least-once with an XACK after success — exactly Lesson 7's
backbone, reused for the alert path:
Blocks (L7) and alerts (here) both flow through Redis streams + consumer groups + at-least-once + ACK. The
team didn't invent a second delivery mechanism — they reused the one they trust. Recognising a
codebase's small vocabulary of patterns is how you read a big system quickly: once you know the streaming idiom,
you already understand the alert transport.
Worker-pool fan-out (a clean Go concurrency pattern)
One readLoop goroutine pulls messages and feeds a channel; a pool of
worker goroutines drains it concurrently:
// readLoop → msgCh → N workers
msgCh := make(chan redis.XMessage)
go ap.readLoop(ctx, msgCh) // producer: XREADGROUP → channelfor i := 0; i < numWorkers; i++ {
go ap.worker(ctx, i, msgCh) // consumers: process concurrently
}
This is the canonical Go "fan-out": one reader, many workers sharing a channel. It parallelises delivery
(notifications can be slow) while the stream's consumer group guarantees each message is handled once.
2 · processMessage — four steps
Each worker runs the same four steps (processor.goprocessMessage):
Parse the AlertEvent from the stream message.
Persist to OpenSearch — using the stream message ID as the document ID (this is the dedup trick, below). Mark Delivered = false first.
Notify — notifyClient.Send(ctx, event) delivers it (webhook/channel) and returns whether delivery succeeded.
Update + ACK — record Delivered = true/false, then XACK the stream message.
3 · Exactly-once delivery from at-least-once transport ⭐
Here's the elegant part. The stream is at-least-once — a crash before XACK means the
alert is redelivered (L7). Naïvely that double-notifies the customer. The fix is the same idempotency move you
saw in Lesson 4, now applied to alerts:
Dedup by stable document ID
The OpenSearch doc ID is the Redis stream message ID. So a redelivered alert writes to the
same doc — an idempotent upsert, not a duplicate row. The customer's alert record is created once
regardless of how many times the message is delivered. At-least-once transport + a stable idempotency key
= effectively-once delivery. (Same recipe as the write path's MERGE + the graphwrite
IdemKey from L9.)
The Delivered flag closes the reliability gap
Persist happens before notify, with Delivered=false. If the notification send fails, the alert
is still recorded (just marked undelivered) — never lost. The status field makes delivery
observable and retryable: you can query OpenSearch for Delivered=false and re-send. Persist-then-deliver
is the same "durable state first" discipline as the cursor commit (L4) and the firing state (L12).
4 · A masterclass in fail-loud ops ⭐
This subsystem contains one of the best defensive-ops decisions in the repo, and it's worth studying
(cmd/alert-processor/main.goensureAlertIndexOrFatal). On startup it ensures the
OpenSearch index mapping — and if that fails, it deliberately FATAL-crashes the pod rather than
warning and continuing. Why?
The silent-failure trap it avoids (FORTA-2744)
If the view/scope fields aren't mapped as keyword, OpenSearch dynamically maps
them as analyzed text on first write. Then every term filter on view/scope
silently returns zero hits — forever, with no error surfaced to the API caller. Customers' alert filters
would just quietly show nothing. So the code chooses to crash loudly (a CrashLoopBackOff
pages a human) rather than run while silently dropping all filtering.
🔗 The third sighting of one principle
This is the same instinct as L8's watchdog pause and L9's DLQ + halt:
stop loudly rather than proceed silently wrong. A crash that pages you is a feature when the
alternative is invisible data loss. Across the whole system, the rule is consistent — fail loud, never corrupt
silently. Spot this principle and you can predict how the team will handle any new failure mode.
5 · The product loop, finally closed 🔁
You can now trace a single fact from chain to human with no gaps:
on-chain event (L1)
→ decoded (L3)
→ filtered + written (L4) ← monitored set, atomic commit
→ enriched + discovered (L5)
→ risk fields computed (L6·L13) ← max / noisy-OR / HHI
→ rule condition met (L12) ← firing state machine emits AlertEvent
→ AlertStream (L7 idiom)
→ persisted (dedup) +
delivered to a human (L15) ← exactly-once, fail-loud
That is the entire system, end to end. Every box you can now open.
Check yourself
1. How does an AlertEvent get from the rule-engine to the alert-processor?
2. The stream is at-least-once, so an alert can be redelivered. How is the customer alerted only once?
3. Why persist the alert (Delivered=false) before attempting the notification?
4. Why does the alert-processor FATAL-crash on a bad OpenSearch index mapping instead of warning?
5. The "fail loud, don't proceed silently wrong" decision here is the same principle as…
6. The readLoop → msgCh → worker pool is which Go pattern?
7. The dedup-by-message-ID trick is the same idea as which earlier mechanisms?
8. What does the Delivered status field enable operationally?
↳ Ask your teacher
Try: "Show me notification.Client.Send — what channels exist?" ·
"How are undelivered alerts retried?" ·
"What fields are on an AlertEvent?" ·
"Walk me through the OpenSearch index mapping for alerts."
What you can now do
Trace an alert from the rule-engine through the Redis AlertStream to delivery — and recognise it as L7's idiom reused.
Explain how at-least-once transport + dedup-by-message-ID yields effectively-once alerting.
Explain persist-then-deliver and the Delivered flag for no-loss, retryable delivery.
Articulate the fail-loud-on-misconfig decision and connect it to L8/L9's "stop, don't corrupt."
Read the worker-pool fan-out Go pattern.
🏁 The whole product loop, closed
From an RPC block to a customer's pager, there is no longer a single box you can't open. You've covered all three
core binaries, the streaming backbone, failure/recovery, the single-writer topology, bootstrap, observability,
the rule engine, the risk math, the read surface, and now alert delivery. You understand this system the way its authors do.