L29 said durable findings (streak ≥ threshold) "become Linear tickets." That sentence hides the whole problem:
the promoter runs every cycle, and a finding stays promotable for many cycles. Naively, that's a ticket per
cycle — the on-call's nightmare. So the promoter's real job isn't creating issues; it's creating each one exactly
once, across retries, partial failures, and two systems that cannot share a transaction.
Your anchor: the alert that paged you 50 times
Every engineer has been paged repeatedly for one unchanging problem, or seen an actuator double-file because a write
half-succeeded. This is a pure backend problem — idempotent side-effects against an external API — wearing a
quality-harness hat. The graph barely appears; the lesson is exactly-once-ish actuation, which you'll meet in any system
that turns internal state into outside-world actions.
1 · The exactly-once primitive
A finding lives in OpenSearch with a streak (L29). The promoter's contract with that store is three methods —
and the whole design hinges on the last one:
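// Parameter types and the error return on FetchPromotable are assumed; the lesson elides them.
type FindingStore interface {
    // RefreshIndex makes recent writes searchable: read-your-writes (below).
    RefreshIndex(ctx context.Context) error
    // FetchPromotable returns findings with streak >= threshold AND LinearIssueID == "".
    FetchPromotable(ctx context.Context, class string, threshold, limit int) ([]Cand, error)
    // SetLinearIssueID is THE idempotency primitive.
    SetLinearIssueID(ctx context.Context, findingID, linearIssueID string) error
}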
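To make the contract concrete, here is a minimal in-memory sketch of the fetch filter and the write-back. It is not the real store (that one is backed by OpenSearch); Finding, memStore, and the "ISSUE-1" ID are hypothetical names chosen for illustration, and the context and class parameters are dropped for brevity.

```go
package main

import "fmt"

// Finding is a hypothetical, simplified record; the real document lives in OpenSearch.
type Finding struct {
	ID            string
	Streak        int
	LinearIssueID string
}

// memStore is an illustrative in-memory stand-in for the finding store.
type memStore struct {
	findings []*Finding
}

// FetchPromotable returns up to limit findings that are persistent
// (streak >= threshold) and not yet linked (LinearIssueID empty).
func (s *memStore) FetchPromotable(threshold, limit int) []*Finding {
	var out []*Finding
	for _, f := range s.findings {
		if len(out) == limit {
			break
		}
		if f.Streak >= threshold && f.LinearIssueID == "" {
			out = append(out, f)
		}
	}
	return out
}

// SetLinearIssueID stamps the issue ID, which removes the finding from every later fetch.
func (s *memStore) SetLinearIssueID(findingID, issueID string) error {
	for _, f := range s.findings {
		if f.ID == findingID {
			f.LinearIssueID = issueID
			return nil
		}
	}
	return fmt.Errorf("finding %q not found", findingID)
}

func main() {
	s := &memStore{findings: []*Finding{
		{ID: "f1", Streak: 5},
		{ID: "f2", Streak: 1}, // below threshold, never promotable here
	}}
	fmt.Println(len(s.FetchPromotable(3, 10))) // 1: only f1 qualifies
	_ = s.SetLinearIssueID("f1", "ISSUE-1")
	fmt.Println(len(s.FetchPromotable(3, 10))) // 0: f1 is linked and drops out
}
```

The second fetch returning nothing is the whole trick: once the ID is stamped, re-running the promoter is a no-op for that finding.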
FetchPromotable only returns findings that are persistent (streak ≥ threshold) and not yet linked
(LinearIssueID empty). The moment a finding gets its Linear ID stamped, it drops out of every future fetch. So
"exactly once" reduces to: create the issue, then write the ID back. A linked finding is invisible forever after.
2 · Create-then-mark, and the window it owns
Here's the crux. The promoter writes to two systems — Linear (create issue) and OpenSearch (stamp ID) — and you
cannot make those two writes atomic (no distributed transaction). So you must pick an order, and each order has a
failure window. The code picks create-then-mark, deliberately:
1. CreateIssue on Linear. On API error → log, count, do not mark → the finding stays promotable and retries next run. No false link, safe.
2. SetLinearIssueID write-back, before counting success. On success → the finding is linked and never re-promoted. Done, exactly once.
⚠ The owned window: if step 1 succeeds but step 2 fails, the issue exists but is unlinked, so the next run could file a duplicate. This is the ONLY path that can dupe.
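The two steps and the owned window can be sketched as a single function. This is a simplified illustration, not the promoter's actual code: the create and mark callbacks stand in for the Linear client and the finding store, and the Stats field names are assumptions (Orphaned plays the role of the LinearPromotionOrphaned counter).

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative errors used to simulate the two failure paths.
var (
	errLinear = errors.New("linear unavailable")
	errStore  = errors.New("write-back failed")
)

// Stats holds per-run counters; the field names are assumptions.
type Stats struct {
	Promoted     int
	CreateErrors int
	Orphaned     int // issue created, write-back failed
}

// promoteOne applies create-then-mark to a single finding.
func promoteOne(findingID string,
	create func(findingID string) (string, error),
	mark func(findingID, issueID string) error,
	st *Stats) {
	issueID, err := create(findingID)
	if err != nil {
		// Step 1 failed: do not mark, so the finding stays promotable.
		st.CreateErrors++
		return
	}
	if err := mark(findingID, issueID); err != nil {
		// The owned window: the issue exists but the finding is unlinked.
		st.Orphaned++
		return
	}
	// Success is counted only after the write-back lands.
	st.Promoted++
}

func main() {
	st := &Stats{}
	okCreate := func(id string) (string, error) { return "ISSUE-" + id, nil }
	badCreate := func(string) (string, error) { return "", errLinear }
	okMark := func(_, _ string) error { return nil }
	badMark := func(_, _ string) error { return errStore }

	promoteOne("f1", okCreate, okMark, st)  // happy path
	promoteOne("f2", badCreate, okMark, st) // create fails, retried next run
	promoteOne("f3", okCreate, badMark, st) // orphan: the only duplicate risk
	fmt.Printf("%+v\n", *st)
}
```

Each of the three calls exercises one branch, so every counter ends at 1.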
Why create-then-mark, not mark-then-create
Both orders have a failure window; the design picks the one whose worst case is recoverable. Create-then-mark's bad
case is a visible duplicate (two issues) — annoying, but caught by a dedicated LinearPromotionOrphaned
counter and the finding ID embedded in the issue body, so an operator can find and merge it. Mark-then-create's
bad case would be a silently dropped finding (marked done, but no issue ever created) — a real problem hidden
forever. Prefer a loud duplicate over a silent miss.
3 · Three layers of dedup
Belt and braces around that one window, the promoter dedups at three levels:
a. Guard: FetchPromotable excludes already-linked findings (re-checked in the loop). Catches: the normal case, where a filed finding never returns.
b. Guard: a within-run seen set keyed on finding ID. Catches: the same finding appearing twice in one fetch page.
c. Guard: on a Linear API error, never mark; retry naturally next run. Catches: a half-failed create becoming a false "done".
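Guard (b) is the simplest of the three to picture. A minimal sketch of a within-run seen set, assuming finding IDs are plain strings (dedupByID is a hypothetical helper name, not taken from the codebase):

```go
package main

import "fmt"

// dedupByID drops repeated finding IDs within one run, keeping first-seen order.
func dedupByID(ids []string) []string {
	seen := make(map[string]struct{}, len(ids))
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			continue // already handled in this run
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	return out
}

func main() {
	fmt.Println(dedupByID([]string{"f1", "f2", "f1", "f3", "f2"})) // [f1 f2 f3]
}
```

The set lives only for the duration of one run; cross-run dedup is the job of guard (a).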
4 · Two throttles you've seen before
Kill switches, default-off. The promoter is disabled unless all of StreakThreshold > 0, Client != nil (a LINEAR_API_KEY is set), and Store != nil hold. An actuator that creates external side-effects ships off and is explicitly switched on — the same shadow-first caution as the healers (L30/L31). "Disabled" is counted on a metric, so a dashboard confirms the kill switch is engaged rather than guessing.
Per-run cap (MaxPerRun, default 5). Never file more than N issues per class per run; overflow is logged with a count and deferred to next run (the findings stay promotable). After a reseed, hundreds of findings can cross the threshold at once — the cap drips them out instead of flooding Linear and tripping its rate limits. It's L31's write budget, applied to an external API instead of the graph stream.
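Both throttles fit in a few lines. The sketch below is an approximation under stated assumptions: the Config shape, the Enabled and capBatch names, and the rule that a non-positive MaxPerRun falls back to 5 are all illustrative, not confirmed details of the real promoter.

```go
package main

import "fmt"

// defaultMaxPerRun mirrors the default of 5 mentioned in the text.
const defaultMaxPerRun = 5

// Config is a hypothetical, simplified promoter configuration.
type Config struct {
	StreakThreshold int
	Client          interface{} // stands in for the Linear client
	Store           interface{} // stands in for the finding store
	MaxPerRun       int
}

// Enabled reports whether all three kill-switch conditions hold.
func (c Config) Enabled() bool {
	return c.StreakThreshold > 0 && c.Client != nil && c.Store != nil
}

// capBatch returns the candidates to file now and how many are deferred
// to the next run. Deferred findings are simply left promotable.
func capBatch(cands []string, maxPerRun int) (now []string, deferred int) {
	if maxPerRun <= 0 {
		maxPerRun = defaultMaxPerRun // assumed fallback behaviour
	}
	if len(cands) <= maxPerRun {
		return cands, 0
	}
	return cands[:maxPerRun], len(cands) - maxPerRun
}

func main() {
	cfg := Config{StreakThreshold: 3, Client: struct{}{}, Store: struct{}{}}
	fmt.Println(cfg.Enabled())      // true
	fmt.Println(Config{}.Enabled()) // false: the zero value ships off

	now, deferred := capBatch(make([]string, 12), cfg.MaxPerRun)
	fmt.Println(len(now), deferred) // 5 7
}
```

Note that the zero-value Config is disabled, which is what "default-off" means in practice: forgetting to configure the promoter cannot turn it on.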
5 · Two smaller subtleties worth keeping
Read-your-writes, and an honest concurrency caveat
Read-your-writes: the streak upserts are written with WithRefresh(false) (cheap, ~1s async), so a fetch
issued immediately after would read the previous cycle's segments and fire the threshold a cycle late. So
PromoteClass calls RefreshIndex once per class before fetching. The call is best-effort: a refresh miss just
means slightly stale reads, not a correctness break.
Concurrency: the fetch→create→write-back sequence is not lease-guarded. It's safe only because quality-gate
runs serially in one pod per chain. The docstring says so plainly, and notes that a multi-replica deploy would
need a concurrency guard (optimistic concurrency via an OpenSearch if_seq_no check, or a Redis lease).
Documenting the unhandled case is the engineering: single-writer is the stated deployment invariant (L9).
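For intuition about what an if_seq_no guard buys you, here is an in-memory analogue of a conditional write: two writers read the same sequence number, and only the first write-back wins. This is a sketch of the general compare-and-set idea, not the OpenSearch client API (in OpenSearch, if_seq_no is supplied together with if_primary_term); casStore and its method names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict signals that the document changed since it was read.
var ErrConflict = errors.New("seq_no conflict: document changed since it was read")

type doc struct {
	seqNo         int64
	linearIssueID string
}

// casStore is an illustrative store whose writes are conditional on a sequence number.
type casStore struct {
	mu   sync.Mutex
	docs map[string]*doc
}

// Read returns the current sequence number of a finding.
func (s *casStore) Read(findingID string) (seqNo int64, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, found := s.docs[findingID]
	if !found {
		return 0, false
	}
	return d.seqNo, true
}

// SetLinearIssueIDIfSeqNo writes only if nobody else has written since the read.
func (s *casStore) SetLinearIssueIDIfSeqNo(findingID, issueID string, expected int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, found := s.docs[findingID]
	if !found {
		return fmt.Errorf("finding %q not found", findingID)
	}
	if d.seqNo != expected {
		return ErrConflict
	}
	d.linearIssueID = issueID
	d.seqNo++ // every successful write advances the sequence number
	return nil
}

func main() {
	s := &casStore{docs: map[string]*doc{"f1": {}}}
	seqA, _ := s.Read("f1") // replica A reads
	seqB, _ := s.Read("f1") // replica B reads the same version

	fmt.Println(s.SetLinearIssueIDIfSeqNo("f1", "ISSUE-A", seqA))                // <nil>
	fmt.Println(s.SetLinearIssueIDIfSeqNo("f1", "ISSUE-B", seqB) == ErrConflict) // true
}
```

On its own, a guard at write-back time only tells the losing replica that it raced; by then its issue already exists. Preventing the duplicate outright would mean claiming the finding before calling CreateIssue, which is the role a lease would play.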
The shape of a safe actuator
Look at what makes this robust: an idempotency key (LinearIssueID), an ordering whose failure mode is recoverable,
layered dedup, a default-off kill switch, a per-run cap, and a frank note about the concurrency it does not handle.
None of it is about Linear or the graph specifically — it's the universal checklist for "turn internal state into an
external action without making a mess." That's the transferable lesson.
Check yourself
1. The promoter runs every cycle and a finding stays promotable for many cycles. What's its core challenge?
2. What makes SetLinearIssueID "the idempotency primitive"?
3. The promoter creates the Linear issue, then writes the ID back. Why that order rather than the reverse?
4. CreateIssue succeeds but the SetLinearIssueID write-back fails. What's the consequence, and how is it handled?
5. On a Linear API error during CreateIssue, the promoter deliberately does NOT mark the finding. Why?
6. The promoter is disabled unless StreakThreshold > 0, Client != nil, and Store != nil. What design instinct is that?
7. What does the per-run cap (MaxPerRun, default 5) protect against, and how does it relate to L31?
8. The docstring states the fetch→create→write-back sequence is unguarded by a lease, safe only on a single replica. Why document that?
↳ Ask your teacher
Try: "Show me how the streak Upsert + reset works in findings.go." ·
"What does the issue body's finding ID = sha256(class|ref_id|kind) let an operator do?" ·
"How would an if_seq_no guard make this multi-replica-safe?" ·
"Where in quality-gate main.go is PromoteClass called?" ·
"How does this compare to the healer as an actuator on the same findings?"
What you can now do
Explain the exactly-once goal and why SetLinearIssueID + the linked-exclusion in FetchPromotable achieve it.
Explain create-then-mark, the failure window it owns, and why a visible duplicate beats a silent drop.
List the three dedup layers (fetch-exclusion, within-run seen set, no-mark-on-error).
Describe the default-off kill switches and the per-run cap as the actuator's external-side-effect throttles.
Explain the read-your-writes refresh and the honestly-documented single-replica concurrency invariant.
Both branches of the finding fork, now seen
L29's finding either gets auto-healed (L30/L31) or auto-ticketed (here). Both are actuators on the same streak-tracked
findings, and both share a posture: idempotent, throttled, default-cautious, honest about their limits. The quality
subsystem measures drift and then does something about it — safely, on both branches.