The disaster that forced it, and the machinery behind L4's "recorder" and L8's two cursors. ~13 min.
Back in Lesson 4 I told a small lie of omission: "the indexer commits a batch in one transaction." The real architecture is subtler, and it exists because of a production disaster. This lesson is the true machinery — and it retroactively explains two things you've already met: the mysterious "recorder" (L4) and the two cursors (L8).
A 2026-05-16 stage bootstrap was meant to seed hundreds of thousands of pools and tokens into Memgraph. Validation found ~93% of pools and ~82% of tokens simply missing. The v4 pool seeder?
1 / 600,000
pools written. The rest silently vanished.
Cannot resolve conflicting transactions. The many
enrichment seeders (v4_seeder, v2/v3_pool_seeder, lending_seeder, …) all run
in parallel, all calling graphstore.ExecuteWrite directly, all hammering the same graph
partition. Under sustained contention every seeder loses every retry (the driver retries 3× then gives up),
and the rows are silently dropped.OCC aborts require concurrent transactions. So remove the concurrency: route every bulk write through
a single writer. Producers stop calling the database directly. Instead they publish a
graphwrite.Request to a Redis stream graph-writes:{chainID}, and one leased
graph-writer pod per chain applies them serially.
The "one pod" guarantee is a lease — a singleton election. Exactly one graph-writer per chain
holds lease:graph-writer:{chainID} (Redis, with a TTL); a second pod blocks at
AcquireLease until the first releases or its lease expires. No two writers ever run at once, so OCC
conflicts are structurally impossible.
Source: docs/single-writer.md (motivation + topology), pkg/graphwrite/ (Client, Consumer), pkg/statetracker (AcquireLease). The full disaster write-up is in that doc — read it; it's one of the best things in the repo.
The wire format (pkg/graphwrite/envelope.go) is deliberately simple:
type Request struct { Version uint8 // protocol version; consumer rejects newer Source string // producer slug: "price_refresher", "indexer", … ChainID uint64 GraphID string IdemKey string // "{source}:{unique id}" — for idempotent replay BlockNum uint64 Cypher *CypherOp // exactly one of Cypher… Delta *types.GraphDelta // …or Delta — never both, never neither }
A CypherOp can carry many statements, and the consumer applies all of them in a single
ExecuteWrite transaction. That's how multi-statement atomicity is preserved even though the
producer no longer holds the transaction.
rec.flush() / rec.SetCursor() on a "recorder" instead of a live tx? That's
pkg/indexer/tx_recorder.go — a graphstore.Tx that buffers writes into
[]CypherStatement instead of executing them, then ships them as one Request. The consumer
re-assembles them into one transaction. Block determinism survives — the atomic "mutations + cursor" commit
just moved from an indexer-local tx to a consumer-local tx. L4's simplification was honest; this is the full truth.| Invariant | What it guarantees |
|---|---|
| Exactly-one-of | Validate() rejects a Request with both Cypher and Delta, or neither. |
| Idempotent IdemKey | A duplicate Publish (retry, stream-trim replay, recovery) applies idempotently — the Cypher uses MERGE (L4 again). |
| Single-writer lease | Only one graph-writer per chain ever writes. The structural guarantee against OCC. |
| Poison-entry policy | Malformed / wrong-chain Requests are ACKed & dropped (logged). A well-formed Request that fails to apply stays unacked and is retried (drainPending). |
Now Lesson 8's two-cursor model isn't arbitrary — it's a direct consequence of this topology:
| Cursor | Advances when… |
|---|---|
Redis BlockCursor ("queued") | the indexer publishes the Request — the block is queued for apply. |
Memgraph BlockCursor ("applied") | the graph-writer's Consumer.apply actually commits it. |
Between publish and consume there's an async window where Memgraph is at block N but the Redis balance cache is still at N−1. That's exactly the drift L8's reconciler heals on restart — the window just widened from "tx-commit duration" to "tx-commit + publish-to-consume latency." Same recovery path, longer fuse.
A single FIFO writer does ~58 applies/sec. But enrichment is ~98% of write volume. So the rare,
critical source="indexer" cursor-advancing envelopes got stuck behind an ~80k backlog of
enrichment writes. The consequence chain is a greatest-hits of this course:
BlockCursor stalls (L8) → the indexer's cursor-watchdog perma-pauses
(L8) → realtime lag grows. One overloaded FIFO lane stalled the whole pipeline.The fix — lanes (GRAPHWRITE_LANE_SPLIT_ENABLED, FORTA-2920): split the stream into a
priority state lane (cursor-advancing) and a bulk meta lane (the firehose) — but keep
still one pod, one lease, one in-flight Memgraph writer (you must not reintroduce concurrency!). The writer
drains the state lane preferentially so the cursor never starves behind enrichment. Each lane carries its own
DLQKey and HaltKey (pkg/graphwrite/lane_params.go).
What if a Request keeps failing to apply? The writer must not just skip it — skipping would violate
contiguity (blocks applied in order, no gaps) and break the determinism invariant. Instead: a
repeatedly-failing entry goes to a dead-letter queue (DLQMaxLen=10_000), and the writer can
halt the lane (the HaltKey). A halted writer stops advancing Memgraph's cursor — which (full
circle) is exactly what trips L8's watchdog to pause the indexer. The system would rather stop than apply
out of order.
Not everything goes through the writer. Sequential, one-shot writers never collide, so they keep
direct ExecuteWrite access:
pkg/genesis/loader.go — one-shot bootstrap, runs alone before any other writer.cmd/holds-backfill, …) — main() + os.Exit, no loop.The principle: OCC needs concurrency, so only concurrent fan-out producers must be serialized.
forbidigo lint rule in .golangci.yml rejects any new graphstore.ExecuteWrite
caller outside the allowlist, and a probe (scripts/lint-graphwrite-guard-test.sh, wired into
make lint) asserts the rule actually fires. The architecture isn't a convention you might forget — it's
mechanically guarded. To add an exception you must edit the allowlist and document the rationale in the doc.pkg/genesis/loader.go on direct ExecuteWrite?Requests → one leased graph-writer applies serially.Grounded in: docs/single-writer.md (OCC root cause, topology, protocol, invariants, lane split FORTA-2920, ops),
pkg/graphwrite/envelope.go (Request/CypherOp), pkg/graphwrite/lane_params.go (Lane/DLQKey/HaltKey),
pkg/graphwrite/dlq.go (DLQMaxLen), pkg/indexer/tx_recorder.go (recorder), .golangci.yml (forbidigo guard). Verify against source — the code is the truth.