Lesson 18 · The Delivery Boundary · Final Deep-Cut

Where the system ends

The last detail of the alert path — and a lesson about boundaries. ~9 min.

Builds on: L15 · New: system boundary / adapter · New: ctx-cancellable backoff

The alert-processor (L15) calls notifyClient.Send(event) to deliver an alert. Open pkg/notification expecting Slack/email/PagerDuty routing… and you find one small file. That's the lesson. The actual channels live in a separate service; this package is just a resilient HTTP client to it. The final thing to understand about risk-graph-indexer is where it stops.

1 · The boundary

risk-graph-indexer (this system)
decides what to alert: builds + maintains the graph, computes risk, evaluates rules, fires an AlertEvent.
POST →
forta-attester (external service)
decides how to deliver: Slack, email, webhooks, on-call routing, templating.

pkg/notification is the seam. Its entire job is to hand a correctly-shaped message across that wall over HTTP. The code even says so: NotificationRequest "matches the ChainWorkerMessage schema from forta-attester … fields and JSON tags must match exactly for the notification service to accept it."
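To see why exact tags matter, here is a minimal sketch of a strict receiver. Everything named in it is an assumption made for illustration: the wireMessage type, its JSON tag names (risk, ffr_id), and the idea that the receiving side rejects unknown keys. The real schema is whatever client.go and forta-attester define.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// wireMessage is a HYPOTHETICAL two-field stand-in for NotificationRequest.
// The tag names below are invented for this sketch, not taken from client.go.
type wireMessage struct {
	Risk  uint8  `json:"risk"`
	FFRId string `json:"ffr_id"`
}

// strictDecode mimics a receiver that refuses payloads containing keys it
// does not know (an assumption about how a strict service might behave).
func strictDecode(payload []byte) (wireMessage, error) {
	var m wireMessage
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.DisallowUnknownFields()
	err := dec.Decode(&m)
	return m, err
}

func main() {
	// A sender using the agreed tags produces a payload the receiver accepts.
	good, _ := json.Marshal(wireMessage{Risk: 100, FFRId: "abc"})
	_, err := strictDecode(good)
	fmt.Println("matching tags accepted:", err == nil)

	// A sender whose tag drifted to "alert_id" emits a key the receiver rejects.
	_, err = strictDecode([]byte(`{"risk":100,"alert_id":"abc"}`))
	fmt.Println("drifted tag rejected:", err != nil)
}
```

The point of the sketch is only that the wire keys come from the struct tags, so renaming a tag on one side silently changes the contract for the other.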

The mature idea: know where your system ends
A big system isn't responsible for everything. risk-graph-indexer's bounded context is risk detection; notification delivery is someone else's bounded context. Connecting them with a narrow HTTP contract (instead of cramming Slack SDKs into the risk engine) keeps each side independently changeable. Recognising and respecting boundaries is a senior-level instinct — and it's the right place to end a tour of the system.

2 · It's an adapter (anti-corruption layer)

Send doesn't just forward the event — it translates the system's internal AlertEvent into the external service's schema (client.go):

Internal (our model) → External (their schema)
event.Severity ("high"/"medium"/"low") → Risk uint8 (100 / 50 / 25)
NodeID, AlertType, Timestamp → FFRId = sha256(NodeID:AlertType:Timestamp)
event.Details, Label, AlertType → Failure.Metadata (from/addresses/label/detection_module…)

This translation layer is an anti-corruption layer: the external service's wire format never leaks into the risk engine, and a change on either side is absorbed here. The two domains stay decoupled.
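The severity row of that mapping can be sketched as a small pure function. The 100 / 50 / 25 values come from the lesson; the function name riskFor and the choice to return an error for an unrecognised severity are assumptions, since the lesson does not say what client.go does in that case.

```go
package main

import "fmt"

// riskFor translates the internal severity label into the external numeric
// risk. The three values are from the lesson; rejecting anything else with
// an error is an assumption made for this sketch.
func riskFor(severity string) (uint8, error) {
	switch severity {
	case "high":
		return 100, nil
	case "medium":
		return 50, nil
	case "low":
		return 25, nil
	default:
		return 0, fmt.Errorf("unknown severity %q", severity)
	}
}

func main() {
	for _, s := range []string{"high", "medium", "low", "critical"} {
		risk, err := riskFor(s)
		fmt.Println(s, risk, err)
	}
}
```

Keeping the translation in one function like this is what lets a change to either vocabulary be absorbed in a single place.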

🔗 The idempotency key, one final time
Look at FFRId = sha256(NodeID:AlertType:Timestamp) — a stable, content-derived ID for the alert. It lets the external service dedup too. This is the same discipline you've now seen four times: the write path's MERGE (L4), graphwrite's IdemKey (L9), the alert-processor's doc-id = msg-id (L15), and now FFRId across the system boundary. "Make replays safe with a stable key" is woven through the entire codebase — even at its edge.
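A content-derived key of this shape can be sketched in a few lines. The colon-joined input follows the formula above; rendering the timestamp as Unix seconds and hex-encoding the digest are assumptions, as is the function name ffrID. Check client.go for the exact encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// ffrID derives a stable identifier from the alert's identifying fields.
// Assumptions for this sketch: the timestamp is rendered as Unix seconds
// and the SHA-256 digest is returned as lowercase hex.
func ffrID(nodeID, alertType string, unixSeconds int64) string {
	input := nodeID + ":" + alertType + ":" + strconv.FormatInt(unixSeconds, 10)
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:])
}

func main() {
	first := ffrID("node-1", "exposure", 1700000000)
	replay := ffrID("node-1", "exposure", 1700000000)
	later := ffrID("node-1", "exposure", 1700000001)

	fmt.Println("replay yields the same ID:", first == replay)
	fmt.Println("a different alert yields a different ID:", first != later)
}
```

Because the ID depends only on the alert's content, a retried or replayed send carries the same key, which is what allows the receiving side to deduplicate.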

3 · Resilient by construction

The send is small but production-hardened — a compact tour of the patterns you've met:

// pkg/notification/client.go — Send (the retry loop)
for attempt := range c.maxRetries {
    if ctx.Err() != nil { return false }       // honour cancellation/shutdown
    if err := c.doPost(ctx, body); err == nil {
        return true                          // delivered → L15 marks Delivered=true
    }
    if attempt < c.maxRetries-1 {
        backoff := time.Duration(1<<uint(attempt)) * time.Second  // 1s, 2s, 4s, 8s…
        select {
        case <-time.After(backoff):                // wait…
        case <-ctx.Done(): return false            // …unless cancelled mid-wait
        }
    }
}
return false  // all retries exhausted → L15 marks Delivered=false (recoverable)

Detail | What it gives you (and where you saw it)
Exponential backoff (1<<attempt s) | Don't hammer a struggling service: 1s, 2s, 4s… (the resilience instinct of L5/L7).
Context-cancellable select{…ctx.Done()} | A shutdown signal aborts a backoff wait instantly: graceful shutdown done right (the proper Go ctx idiom).
Returns bool | Feeds L15's Delivered flag; false persists as undelivered → queryable + re-sendable. No alert lost.
Instrumented client + 10s timeout + bearer auth | telemetry.InstrumentedHTTPClient: even the outbound call is traced/metered (L11) and authenticated.

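To experiment with the loop outside the codebase, here is a self-contained sketch of the same pattern. It is not the code from client.go: the post function and the base delay are passed in as parameters (both hypothetical) so the behaviour can be exercised in milliseconds, and the helper names runDemo and runCancelMidWait exist only for this demo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sendWithRetry calls post up to maxRetries times, doubling the wait after
// each failure (base, 2*base, 4*base, ...) and giving up early if ctx ends.
func sendWithRetry(ctx context.Context, maxRetries int, base time.Duration, post func(context.Context) error) bool {
	for attempt := 0; attempt < maxRetries; attempt++ {
		if ctx.Err() != nil {
			return false // already cancelled: do not even try
		}
		if err := post(ctx); err == nil {
			return true
		}
		if attempt < maxRetries-1 { // no point waiting after the last attempt
			timer := time.NewTimer(base << uint(attempt))
			select {
			case <-timer.C:
			case <-ctx.Done():
				timer.Stop()
				return false // cancelled mid-wait
			}
		}
	}
	return false
}

// runDemo uses a fake post that fails `failures` times and then succeeds.
func runDemo(failures, maxRetries int) (ok bool, calls int) {
	post := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("simulated failure")
		}
		return nil
	}
	ok = sendWithRetry(context.Background(), maxRetries, time.Millisecond, post)
	return ok, calls
}

// runCancelMidWait cancels the context right after the first failure, so the
// one-hour backoff must be abandoned immediately.
func runCancelMidWait() (ok bool, calls int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	post := func(context.Context) error {
		calls++
		cancel()
		return errors.New("simulated failure")
	}
	ok = sendWithRetry(ctx, 5, time.Hour, post)
	return ok, calls
}

func main() {
	ok, calls := runDemo(2, 4)
	fmt.Println("recovers after two failures:", ok, "calls:", calls)
	ok, calls = runDemo(5, 3)
	fmt.Println("gives up when retries run out:", ok, "calls:", calls)
	ok, calls = runCancelMidWait()
	fmt.Println("stops when cancelled mid-wait:", ok, "calls:", calls)
}
```

The sketch uses time.NewTimer with an explicit Stop rather than time.After so the timer is released promptly on cancellation; that is a design choice of this example, not a claim about the original.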
Your anchor
You've wired a webhook-out before. This is that — done with the rigor a financial alerting system needs: adapt the payload, authenticate, retry with backoff, stay cancellable, report success truthfully, and instrument it. The "boring" edge of a system is where reliability is actually won or lost.
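The authenticate-and-time-out part of that list can be sketched with the standard library alone. This is an illustration under assumptions, not the project's doPost: it uses a plain http.Client instead of telemetry.InstrumentedHTTPClient, and the URL, token value, fakeService transport, and postWithToken helper are all hypothetical. The fake transport answers in memory so the example runs without a network.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"time"
)

// doPost sends body as JSON with a bearer token and treats any non-2xx
// response as a failure, so the caller's retry loop can react to it.
func doPost(ctx context.Context, client *http.Client, url, token string, body []byte) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("notification service returned %s", resp.Status)
	}
	return nil
}

// fakeService is an in-memory http.RoundTripper standing in for the remote
// service. It returns 200 for the expected token and 401 otherwise.
type fakeService struct{ wantToken string }

func (f fakeService) RoundTrip(r *http.Request) (*http.Response, error) {
	if r.Body != nil {
		r.Body.Close() // a RoundTripper is expected to close the request body
	}
	code := http.StatusUnauthorized
	if r.Header.Get("Authorization") == "Bearer "+f.wantToken {
		code = http.StatusOK
	}
	return &http.Response{
		StatusCode: code,
		Status:     fmt.Sprintf("%d %s", code, http.StatusText(code)),
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    r,
	}, nil
}

// postWithToken wires doPost to the fake service with a 10s client timeout.
func postWithToken(token string) error {
	client := &http.Client{
		Timeout:   10 * time.Second,
		Transport: fakeService{wantToken: "secret"},
	}
	return doPost(context.Background(), client, "http://notifier.invalid/send", token, []byte(`{"risk":100}`))
}

func main() {
	fmt.Println("good token error:", postWithToken("secret"))
	fmt.Println("bad token error: ", postWithToken("wrong"))
}
```

Returning an error for non-2xx statuses is what lets a surrounding retry loop distinguish "delivered" from "try again".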

Check yourself

1. What does pkg/notification actually contain?
2. The division of responsibility at this boundary is…
3. Send translating AlertEvent → NotificationRequest (severity→risk number, etc.) is an example of…
4. FFRId = sha256(NodeID:AlertType:Timestamp) is which now-familiar pattern?
5. What does the select { case <-time.After(backoff): case <-ctx.Done(): } achieve?
6. Send returning false (all retries exhausted) matters because…
7. The backoff schedule 1<<attempt seconds produces…
8. The big-picture takeaway of this final lesson is…
↳ Ask your teacher
Try: "What is forta-attester / the ChainWorkerMessage schema?" · "How are undelivered (Delivered=false) alerts re-sent?" · "Show me InstrumentedHTTPClient." · "Where does ControllerID come from on an AlertEvent?"

What you can now do

🎓 The deep-understanding tour is complete — 18 lessons
You set out to understand risk-graph-indexer deeply, end to end, and you have. Data plane (L1–L11): pipeline, data model, decode, write, enrich, risk, streaming, failure/recovery, single-writer, bootstrap, observability. Analytics (L6·L13·L16): cells, field math, exposure. Consumer & product side (L12·L14·L15): rules, read surface, alert delivery. Human control plane (L17). And now the system boundary (L18). You can trace any fact from an RPC block to a customer's pager, argue why the system stays correct under failure, recite the formulas, and — the real sign of mastery — recognise every "new" corner as familiar patterns recombined. That is deep understanding.

Grounded in: pkg/notification/client.go (NotificationRequest matches forta-attester ChainWorkerMessage; Send adapter: severity→risk, FFRId=sha256, metadata mapping; exponential-backoff + ctx-cancellable retry; InstrumentedHTTPClient + bearer auth + 10s timeout; bool → L15 Delivered). Verify against source — the code is the truth.