Lesson 18 · The Delivery Boundary · Final Deep-Cut

Where the system ends

The last detail of the alert path — and a lesson about boundaries. ~9 min.

Builds on: L15 · New: system boundary / adapter · New: ctx-cancellable backoff

The alert-processor (L15) calls notifyClient.Send(event) to deliver an alert. Open pkg/notification expecting Slack/email/PagerDuty routing… and you find one small file. That's the lesson. The actual channels live in a separate service; this package is just a resilient HTTP client to it. The final thing to understand about risk-graph-indexer is where it stops.

1 · The boundary

risk-graph-indexer (this system)

decides what to alert: builds + maintains the graph, computes risk, evaluates rules, fires an AlertEvent.

POST →

forta-attester (external service)

decides how to deliver: Slack, email, webhooks, on-call routing, templating.

pkg/notification is the seam. Its entire job is to hand a correctly-shaped message across that wall over HTTP. The code even says so: NotificationRequest "matches the ChainWorkerMessage schema from forta-attester … fields and JSON tags must match exactly for the notification service to accept it."
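To make "fields and JSON tags must match exactly" concrete, here is a minimal sketch of how Go struct tags fix the wire names of a request. The struct is a hypothetical, trimmed-down stand-in: only Risk, FFRId and Failure.Metadata are named in this lesson, and every tag string below is an assumption rather than the real ChainWorkerMessage schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NotificationRequest is a hypothetical stand-in for the real struct in
// pkg/notification/client.go. The JSON tag strings are assumptions made
// for illustration; the real ones must match forta-attester's schema.
type NotificationRequest struct {
	FFRId   string  `json:"ffr_id"`  // assumed tag
	Risk    uint8   `json:"risk"`    // assumed tag
	Failure Failure `json:"failure"` // assumed tag
}

// Failure is likewise hypothetical; the value type of Metadata is assumed.
type Failure struct {
	Metadata map[string]string `json:"metadata"` // assumed tag
}

// encode serialises the request exactly as it would travel over HTTP.
func encode(req NotificationRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	out, err := encode(NotificationRequest{
		FFRId:   "abc",
		Risk:    100,
		Failure: Failure{Metadata: map[string]string{"label": "example"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the sketch is that the tags, not the Go field names, are the contract: renaming a tag changes the bytes on the wire, which is why the real struct's comment warns against touching them.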

The mature idea: know where your system ends

A big system isn't responsible for everything. risk-graph-indexer's bounded context is risk detection; notification delivery is someone else's bounded context. Connecting them with a narrow HTTP contract (instead of cramming Slack SDKs into the risk engine) keeps each side independently changeable. Recognising and respecting boundaries is a senior-level instinct — and it's the right place to end a tour of the system.

2 · It's an adapter (anti-corruption layer)

Send doesn't just forward the event — it translates the system's internal AlertEvent into the external service's schema (client.go):

Internal (our model)	→ External (their schema)
`event.Severity` ("high"/"medium"/"low")	`Risk` uint8 (100 / 50 / 25)
`NodeID, AlertType, Timestamp`	`FFRId` = `sha256(NodeID:AlertType:Timestamp)`
`event.Details`, `Label`, `AlertType`	`Failure.Metadata` (from/addresses/label/detection_module…)

This translation layer is an anti-corruption layer: the external service's wire format never leaks into the risk engine, and a change on either side is absorbed here. The two domains stay decoupled.
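The translation in the table above can be sketched as a pair of small functions. This is an illustration under stated assumptions, not the code in client.go: the 100 / 50 / 25 values come from the table, while the AlertEvent and externalAlert types, the error on an unknown severity, and the pairing of AlertType with the detection_module key are all hypothetical.

```go
package main

import "fmt"

// AlertEvent is a hypothetical, trimmed-down version of the internal model.
type AlertEvent struct {
	Severity  string
	Label     string
	AlertType string
}

// externalAlert is a hypothetical slice of the external schema.
type externalAlert struct {
	Risk     uint8
	Metadata map[string]string
}

// riskFromSeverity maps the internal severity string to the external
// numeric risk. Rejecting unknown values is an assumption of this sketch.
func riskFromSeverity(severity string) (uint8, error) {
	switch severity {
	case "high":
		return 100, nil
	case "medium":
		return 50, nil
	case "low":
		return 25, nil
	default:
		return 0, fmt.Errorf("unknown severity %q", severity)
	}
}

// toExternal is the adapter step: internal model in, external shape out.
func toExternal(ev AlertEvent) (externalAlert, error) {
	risk, err := riskFromSeverity(ev.Severity)
	if err != nil {
		return externalAlert{}, err
	}
	return externalAlert{
		Risk: risk,
		Metadata: map[string]string{
			"label":            ev.Label,
			"detection_module": ev.AlertType, // assumed pairing
		},
	}, nil
}

func main() {
	out, err := toExternal(AlertEvent{
		Severity:  "medium",
		Label:     "example-label",
		AlertType: "example-type",
	})
	fmt.Println(out.Risk, out.Metadata["label"], err)
}
```

Because all knowledge of the external shape sits in toExternal, a schema change on the other side would touch only this function, which is the decoupling the anti-corruption layer is meant to buy.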

🔗 The idempotency key, one final time

Look at FFRId = sha256(NodeID:AlertType:Timestamp): a stable, content-derived ID for the alert. Because the same event always hashes to the same ID, a retried or replayed send carries the same key, which lets the external service deduplicate too. This is the same discipline you've now seen four times: the write path's MERGE (L4), graphwrite's IdemKey (L9), the alert-processor's doc-id = msg-id (L15), and now FFRId across the system boundary. "Make replays safe with a stable key" is woven through the entire codebase, even at its edge.

3 · Resilient by construction

The send is small but production-hardened — a compact tour of the patterns you've met:

// pkg/notification/client.go — Send (the retry loop)
for attempt := range c.maxRetries {
    if ctx.Err() != nil { return false }       // honour cancellation/shutdown
    if err := c.doPost(ctx, body); err == nil {
        return true                          // delivered → L15 marks Delivered=true
    }
    if attempt < c.maxRetries-1 {
        backoff := time.Duration(1<<uint(attempt)) * time.Second  // 1s, 2s, 4s, 8s…
        select {
        case <-time.After(backoff):                // wait…
        case <-ctx.Done(): return false            // …unless cancelled mid-wait
        }
    }
}
return false  // all retries exhausted → L15 marks Delivered=false (recoverable)

Detail	What it gives you (and where you saw it)
Exponential backoff `1<<attempt` s	Don't hammer a struggling service — 1s, 2s, 4s… (the resilience instinct of L5/L7).
Context-cancellable `select{…ctx.Done()}`	A shutdown signal aborts a backoff wait instantly — graceful shutdown done right (the proper Go `ctx` idiom).
Returns `bool`	Feeds L15's `Delivered` flag; `false` is persisted as undelivered, so a failed send stays queryable and re-sendable instead of being silently dropped.
Instrumented client + 10s timeout + bearer auth	`telemetry.InstrumentedHTTPClient` — even the outbound call is traced/metered (L11) and authenticated.

Your anchor

You've wired a webhook-out before. This is that — done with the rigor a financial alerting system needs: adapt the payload, authenticate, retry with backoff, stay cancellable, report success truthfully, and instrument it. The "boring" edge of a system is where reliability is actually won or lost.

Check yourself

1. What does pkg/notification actually contain?

2. The division of responsibility at this boundary is…

3. Send translating AlertEvent → NotificationRequest (severity→risk number, etc.) is an example of…

4. FFRId = sha256(NodeID:AlertType:Timestamp) is which now-familiar pattern?

5. What does the select { case <-time.After(backoff): case <-ctx.Done(): } achieve?

6. Send returning false (all retries exhausted) matters because…

7. The backoff schedule 1<<attempt seconds produces…

8. The big-picture takeaway of this final lesson is…

↳ Ask your teacher

Try: "What is forta-attester / the ChainWorkerMessage schema?" · "How are undelivered (Delivered=false) alerts re-sent?" · "Show me InstrumentedHTTPClient" · "Where does ControllerID come from on an AlertEvent?"

What you can now do

Explain that delivery channels live in an external service; pkg/notification is just the resilient HTTP seam.
Describe the boundary: this system decides what to alert, forta-attester decides how to deliver.
Recognise Send as an adapter / anti-corruption layer, and FFRId as a cross-boundary idempotency key.
Read the exponential-backoff + context-cancellable retry loop and connect Send's bool to L15's Delivered flag.
Articulate why knowing where a system ends is a feature, not a gap.

🎓 The deep-understanding tour is complete — 18 lessons

You set out to understand risk-graph-indexer deeply, end to end, and you have. Data plane (L1–L11): pipeline, data model, decode, write, enrich, risk, streaming, failure/recovery, single-writer, bootstrap, observability. Analytics (L6·L13·L16): cells, field math, exposure. Consumer & product side (L12·L14·L15): rules, read surface, alert delivery. Human control plane (L17). And now the system boundary (L18). You can trace any fact from an RPC block to a customer's pager, argue why the system stays correct under failure, recite the formulas, and — the real sign of mastery — recognise every "new" corner as familiar patterns recombined. That is deep understanding.


Grounded in: pkg/notification/client.go (NotificationRequest matches forta-attester ChainWorkerMessage; Send adapter: severity→risk, FFRId=sha256, metadata mapping; exponential-backoff + ctx-cancellable retry; InstrumentedHTTPClient + bearer auth + 10s timeout; bool → L15 Delivered). Verify against source — the code is the truth.