Lesson 10 · Bootstrap & Fresh-Start

Where the graph comes from

From an empty database to a seeded risk graph — by asking the chain, not loading a snapshot. ~12 min.

Builds on: L5 · L9 | Anchor: factory contracts | New: self-seeding | New: resumable task DAG

Every lesson so far assumed the graph already existed. But on day zero, Memgraph is empty. Where do the first hundreds of thousands of nodes come from? The modern answer is striking: the system doesn't load a snapshot — it self-seeds by interrogating the chain, walking factory contracts and scanning event logs to discover the universe of entities around your focus tokens. Then it hands off to the realtime pipeline you already know.

The big architectural choice
The repo's CLAUDE.md is blunt: the system is "self-seeding (factory walks, event-log scans, protocol seeders); does NOT depend on the legacy risk-graph Python batch pipeline or its full_graph.json — both are frozen, treat as nonexistent." The old world rebuilt the graph in nightly Python batches and snapshotted it. The new world seeds once from the chain, then maintains incrementally forever (L1–9). No snapshot to drift; the chain is the source.

1 · Focus tokens are the seed crystal

Bootstrap doesn't try to index all of Ethereum; it follows the same scoping rule as the monitored set (L1). It starts from a small set of focus tokens the operator cares about, supplied as a CSV, with USD prices resolved via DefiLlama:

// internal/freshstart/tokens.go — ParseFocusTokens
// operator CSV → FocusTokens (address [+ optional price])
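// Parameter types are omitted in this excerpt; the ones below are assumptions. Verify against source.
type PriceLookup interface {
	BatchPrice(ctx context.Context, chainID uint64, addrs []string) (map[string]float64, error)
}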

Everything else is seeded around those tokens: what pools contain them, what vaults hold them, what markets accept them as collateral, who admins those contracts. The focus tokens are the seed crystal; the graph grows outward from them.
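
As a rough illustration of the CSV step, here is a minimal sketch that turns operator rows into focus tokens. The two-column layout (address, optional price), the FocusToken type, and the parseFocusTokens name are all assumptions made for this example; the real ParseFocusTokens in internal/freshstart/tokens.go may differ. Rows without a price are flagged so a PriceLookup could fill them in later.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// FocusToken is a hypothetical record for one operator-supplied token.
type FocusToken struct {
	Address  string  // lower-cased 0x-prefixed hex address
	PriceUSD float64 // meaningful only when HasPrice is true
	HasPrice bool    // false: the price still has to come from a lookup
}

// parseFocusTokens reads "address[,price]" rows. This column layout is an
// assumption for illustration, not the repo's actual CSV schema.
func parseFocusTokens(csvText string) ([]FocusToken, error) {
	r := csv.NewReader(strings.NewReader(csvText))
	r.FieldsPerRecord = -1 // allow rows with one or two columns
	var out []FocusToken
	for {
		rec, err := r.Read()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		addr := strings.ToLower(strings.TrimSpace(rec[0]))
		if len(addr) != 42 || !strings.HasPrefix(addr, "0x") {
			return nil, fmt.Errorf("bad address %q", rec[0])
		}
		tok := FocusToken{Address: addr}
		if len(rec) > 1 && strings.TrimSpace(rec[1]) != "" {
			p, err := strconv.ParseFloat(strings.TrimSpace(rec[1]), 64)
			if err != nil {
				return nil, fmt.Errorf("bad price for %s: %w", addr, err)
			}
			tok.PriceUSD, tok.HasPrice = p, true
		}
		out = append(out, tok)
	}
}

func main() {
	tokens, err := parseFocusTokens(
		"0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2,3000.5\n" +
			"0x6B175474E89094C44Da98b954EedeAC495271d0F\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%s priced=%v\n", t.Address, t.HasPrice)
	}
}
```

Normalising addresses to lower case at the edge means every later set lookup can compare strings directly.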

Source: internal/freshstart/tokens.go. Focus tokens also drive risk (L6) and promotion into the monitored set (L1) — one concept, reused everywhere.

2 · Seeders: discovery at rest (your factory anchor)

How do you enumerate every Uniswap V3 pool that holds a focus token? You do exactly what you'd do by hand: walk the factory contract. The seeders ask the chain directly:

// pkg/enrichment/v3_pool_seeder.go
pools, err := s.scanFactory(ctx, f, pricedSet)   // enumerate a factory's pools
// …writes pool nodes:  p.factory = $protocol, p.source = 'v3_pool_seeder'

Seeder | What it walks / scans
v2/v3/v4_pool_seeder | AMM factory contracts → all pools (Uniswap, Sushi, …)
lending_seeder | Lending protocols → markets (Aave reserves, Compound cTokens, Morpho markets)
oft_seeder | LayerZero OFT deployments → cross-chain token adapters
deployer_seeder | Contract deployers (creator addresses)
lp_seeder / holder seeders | LP token holders, focus-token holders

This is "discovery" (L5) run eagerly
In Lesson 5, discovery grew the graph reactively — a related address found during enrichment gets monitored. Seeders do the same thing proactively, in bulk, at boot: instead of waiting to stumble on pools via events, they walk the factory and grab them all up front. Same factory-walk you'd run with cast to list pools — just industrialised. Bootstrap and discovery are the same idea at two speeds.
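
To make the factory walk concrete, here is a minimal sketch of the index-based variant. Uniswap V2's factory exposes allPairsLength() and allPairs(i), so a walk is a loop over indices; V3 factories have no such index, which is where PoolCreated event-log scans come in. The PairFactory interface, the Pool type, the fakeFactory, and walkFactory are hypothetical stand-ins so the example runs without an RPC node; the repo's scanFactory may be structured quite differently.

```go
package main

import "fmt"

// Pool is a hypothetical record for one AMM pool.
type Pool struct {
	Address string
	Token0  string
	Token1  string
}

// PairFactory abstracts an index-enumerable factory (V2-style).
// It is an assumption for this sketch, not an interface from the repo.
type PairFactory interface {
	AllPairsLength() (int, error)
	PairAt(i int) (Pool, error)
}

// fakeFactory serves pools from memory instead of making contract calls.
type fakeFactory struct{ pools []Pool }

func (f fakeFactory) AllPairsLength() (int, error) { return len(f.pools), nil }

func (f fakeFactory) PairAt(i int) (Pool, error) {
	if i < 0 || i >= len(f.pools) {
		return Pool{}, fmt.Errorf("index %d out of range", i)
	}
	return f.pools[i], nil
}

// walkFactory visits every pool and keeps those holding a focus token.
func walkFactory(f PairFactory, focus map[string]bool) ([]Pool, error) {
	n, err := f.AllPairsLength()
	if err != nil {
		return nil, err
	}
	var kept []Pool
	for i := 0; i < n; i++ {
		p, err := f.PairAt(i)
		if err != nil {
			return nil, fmt.Errorf("pair %d: %w", i, err)
		}
		if focus[p.Token0] || focus[p.Token1] {
			kept = append(kept, p)
		}
	}
	return kept, nil
}

func main() {
	f := fakeFactory{pools: []Pool{
		{Address: "0xpool1", Token0: "0xweth", Token1: "0xdai"},
		{Address: "0xpool2", Token0: "0xfoo", Token1: "0xbar"},
	}}
	pools, err := walkFactory(f, map[string]bool{"0xweth": true})
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	fmt.Println("pools touching a focus token:", len(pools))
}
```

Filtering against the focus set during the walk keeps the seeded graph proportional to what the operator cares about, not to the size of the factory.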

3 · The orchestration: a resumable, locked task DAG

Dozens of seeders, some depending on others (you need tokens before holders). They're run as a DAG of tasks (pkg/bootstrap/task.go):

focus tokens (seed) → pool seeders, v2/v3/v4 (parallel) → holder / lp seeders (depend on pools)
lending / oft / deployer (parallel)
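
One way to picture the ordering in the diagram above is to group tasks into waves, where every task in a wave has all its dependencies in earlier waves and can therefore run in parallel with its wave-mates. The sketch below does that with a simple repeated scan. The planWaves function and the task names are hypothetical, and the edges from focus tokens to the lending, oft, and deployer seeders are an assumption of this example; pkg/bootstrap/task.go may schedule differently.

```go
package main

import (
	"fmt"
	"sort"
)

// planWaves groups tasks into waves. deps maps each task to the tasks it
// needs first. Tasks inside one wave are independent of each other.
func planWaves(deps map[string][]string) ([][]string, error) {
	done := map[string]bool{}
	var waves [][]string
	for len(done) < len(deps) {
		var wave []string
		for task, needs := range deps {
			if done[task] {
				continue
			}
			ready := true
			for _, d := range needs {
				if !done[d] {
					ready = false
					break
				}
			}
			if ready {
				wave = append(wave, task)
			}
		}
		if len(wave) == 0 {
			return nil, fmt.Errorf("cycle or unknown dependency among remaining tasks")
		}
		sort.Strings(wave) // map iteration is random; sort for stable output
		for _, t := range wave {
			done[t] = true
		}
		waves = append(waves, wave)
	}
	return waves, nil
}

func main() {
	// Hypothetical task graph modelled on the diagram.
	waves, err := planWaves(map[string][]string{
		"focus_tokens":   nil,
		"v2_pool_seeder": {"focus_tokens"},
		"v3_pool_seeder": {"focus_tokens"},
		"lending_seeder": {"focus_tokens"},
		"lp_seeder":      {"v2_pool_seeder", "v3_pool_seeder"},
	})
	if err != nil {
		fmt.Println("plan failed:", err)
		return
	}
	for i, w := range waves {
		fmt.Printf("wave %d: %v\n", i, w)
	}
}
```

A real runner would more likely start each task as soon as its own dependencies finish instead of waiting for a whole wave, but the waves make the ordering constraint easy to see.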

Three properties make it production-grade, each echoing a theme from this course:

Property | How | Echoes
Concurrent but ordered | Independent seeders run in parallel; each declares its dependencies. | DAG scheduling
Resumable | Each task is gated on a per-task Redis state machine, so a restart resumes where it stopped instead of starting over. | idempotency / determinism (L4·L8)
Retries with backoff | Transient errors are retried with exponential backoff. | resilience (L5·L7)

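
The Resumable and Retries rows can be sketched together. In this minimal example a map stands in for the per-task Redis state, and the state names ("running", "done", "failed"), the StateStore type, and the runTask signature are all assumptions for illustration, not the encoding used in pkg/bootstrap.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// StateStore is a hypothetical stand-in for per-task state kept in Redis.
type StateStore map[string]string

// errTransient simulates a recoverable failure such as an RPC timeout.
var errTransient = errors.New("transient RPC error")

// runTask skips tasks already marked "done" (resume) and retries failures,
// sleeping baseMs, 2*baseMs, 4*baseMs, ... milliseconds between attempts.
func runTask(s StateStore, name string, attempts, baseMs int, fn func() error) error {
	if s[name] == "done" {
		return nil // finished in an earlier run; do not repeat the work
	}
	if attempts < 1 {
		attempts = 1
	}
	s[name] = "running"
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			s[name] = "done"
			return nil
		}
		if i < attempts-1 {
			time.Sleep(time.Duration(baseMs<<i) * time.Millisecond)
		}
	}
	s[name] = "failed"
	return fmt.Errorf("task %s failed after %d attempts: %w", name, attempts, err)
}

func main() {
	store := StateStore{}
	calls := 0
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	fmt.Println("first run error:", runTask(store, "v3_pool_seeder", 5, 10, flaky))
	// Simulated restart: the stored state makes the second call a no-op.
	fmt.Println("second run error:", runTask(store, "v3_pool_seeder", 5, 10, flaky))
	fmt.Println("calls made:", calls)
}
```

Because completion is recorded per task, resuming is only safe if each seeder's writes are idempotent: a task interrupted while "running" will run again from its start.
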
🔗 The lock you've seen before
A cross-pod lock (pkg/bootstrap/lock.go, a Redis SET NX per (chain, graph) with auto-refreshing TTL) ensures one pod owns the bootstrap run; other replicas skip it. This is the same singleton-election pattern as L9's graph-writer lease — "exactly one of these runs at a time" appears again. And the seeders publish their writes through the graph-writer (L9), so even parallel seeders never hit the OCC abort path.
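
The lock semantics can be illustrated without a Redis server. The sketch below mimics the two behaviours the pattern relies on: SET with NX (write only if the key is absent) and key expiry, plus an owner-checked TTL refresh. The lockStore type, the key format in lockKey, and the integer clock are assumptions for this example, not what pkg/bootstrap/lock.go does. Against a real Redis, the ownership check and the TTL extension would need to happen atomically.

```go
package main

import (
	"fmt"
	"sync"
)

// lockStore is an in-memory stand-in for Redis keys with expiry.
type lockStore struct {
	mu      sync.Mutex
	owner   map[string]string
	expires map[string]int64 // expiry time in seconds on the caller's clock
}

func newLockStore() *lockStore {
	return &lockStore{owner: map[string]string{}, expires: map[string]int64{}}
}

// setNX takes the key only if no unexpired owner exists (like SET NX + TTL).
func (s *lockStore) setNX(key, owner string, ttl, now int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if exp, ok := s.expires[key]; ok && now < exp {
		return false
	}
	s.owner[key] = owner
	s.expires[key] = now + ttl
	return true
}

// refresh extends the TTL only while the caller still owns a live key.
func (s *lockStore) refresh(key, owner string, ttl, now int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.expires[key]
	if !ok || now >= exp || s.owner[key] != owner {
		return false
	}
	s.expires[key] = now + ttl
	return true
}

// lockKey builds one key per (chain, graph). The format is hypothetical.
func lockKey(chainID int, graphID string) string {
	return fmt.Sprintf("bootstrap:lock:%d:%s", chainID, graphID)
}

func main() {
	s := newLockStore()
	key := lockKey(1, "risk-graph-rt")
	fmt.Println("pod-a acquires:", s.setNX(key, "pod-a", 30, 0))
	fmt.Println("pod-b acquires:", s.setNX(key, "pod-b", 30, 10))
	fmt.Println("pod-a refreshes:", s.refresh(key, "pod-a", 30, 20))
}
```

The TTL is what keeps a crashed owner from blocking bootstrap indefinitely: if the owning pod stops refreshing, the key expires and another replica can take over.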

4 · Three ways to populate a graph

runBootstrap (cmd/indexer/main.go) — itself a one-shot, direct-ExecuteWrite writer (the L9 exception, because it runs alone) — picks among:

Path | What it does | When
Fresh-start (self-seed) | Walk factories + scan logs to build the graph from the chain. | The modern production path.
Clone | CloneFromSeedGraph copies all nodes/edges from an existing graph_id into a new one. | Spinning up a shadow / parity graph (e.g. test_carlos from risk-graph-rt; recall L2's partitions).
Genesis file | LoadFile(GENESIS_FILE) → load a JSON snapshot. | Frozen / legacy: the full_graph.json path, treated as nonexistent. A deprecated fallback.

Whichever path runs, bootstrap then does three things: it populates the monitored set + focus tokens in Redis (L1); it sets the initial block cursor, taking the first available of START_BLOCK, metadata.snapshot_block, and the RPC's latest block, in that order; and it writes a GraphMetadata node (node/edge counts, timestamps).
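
The cursor precedence reads naturally as a first-match switch. This is a minimal sketch: the resolveStartBlock name, its parameters, and the convention that zero means "not set" are assumptions for the example, not the repo's actual code.

```go
package main

import "fmt"

// resolveStartBlock picks the initial cursor: an explicit START_BLOCK wins,
// then the snapshot block recorded in metadata, then the chain's latest block.
// Zero means "not set" here, which is an assumption of this sketch.
func resolveStartBlock(startBlock, snapshotBlock, rpcLatest uint64) (uint64, string) {
	switch {
	case startBlock != 0:
		return startBlock, "START_BLOCK"
	case snapshotBlock != 0:
		return snapshotBlock, "metadata.snapshot_block"
	default:
		return rpcLatest, "RPC latest"
	}
}

func main() {
	block, source := resolveStartBlock(0, 19000000, 19500000)
	fmt.Printf("cursor starts at %d (from %s)\n", block, source)
}
```

Returning the source alongside the block makes it easy to log which rule fired, which helps when an operator is surprised by where indexing began.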

5 · Preflight: don't seed into a live graph

Seeding is a bulk write. Running it while a graph-writer is live would race the singleton writer — exactly the FORTA-2926 caveat from Lesson 9. So fresh-start has a preflight check that refuses to run if the writer lease is held:

// internal/freshstart/preflight.go
// Apply must NOT run if a graph-writer is alive:
//   the lease:graph-writer:{chain} key MUST be absent.
A safety check you can now fully read
You understand why this check exists: a live graph-writer holds the lease for its whole pod lifetime and writes the cursor mid-apply; a concurrent bulk seed would race it on the singleton BlockCursor and reintroduce the OCC collisions L9 was built to kill. Preflight encodes L9's invariant as an operator guardrail.
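
The guardrail itself is small. Below is a minimal sketch of such a check: it looks up the lease key named above and refuses to proceed if the key exists. The LeaseChecker interface, the fakeStore, the ErrWriterAlive value, and the choice to treat a lookup error as a refusal are assumptions for this example; internal/freshstart/preflight.go may check more than this.

```go
package main

import (
	"errors"
	"fmt"
)

// LeaseChecker is a hypothetical view of the key store used by preflight.
type LeaseChecker interface {
	Exists(key string) (bool, error)
}

// fakeStore answers from memory so the sketch runs without Redis.
type fakeStore map[string]bool

func (f fakeStore) Exists(key string) (bool, error) { return f[key], nil }

// ErrWriterAlive signals that seeding must not start.
var ErrWriterAlive = errors.New("graph-writer lease is held; refusing to seed")

// preflight passes only when the graph-writer lease key is absent.
func preflight(c LeaseChecker, chain string) error {
	key := "lease:graph-writer:" + chain
	held, err := c.Exists(key)
	if err != nil {
		// Fail closed: if the lease cannot be checked, do not seed.
		return fmt.Errorf("preflight: checking %s: %w", key, err)
	}
	if held {
		return ErrWriterAlive
	}
	return nil
}

func main() {
	live := fakeStore{"lease:graph-writer:1": true}
	fmt.Println("writer alive:", preflight(live, "1"))
	fmt.Println("writer absent:", preflight(fakeStore{}, "1"))
}
```

Note that a check like this only guards the moment it runs; it does not by itself stop a graph-writer from starting after preflight has passed.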

6 · Then: hand off to realtime

Bootstrap's job ends the moment the cursor is set. From there, everything you've already learned takes over: the indexer consumes blocks from the cursor forward (L1–4), enrichment classifies and discovers (L5), and the risk engine computes (L6), all maintaining the graph incrementally, forever. There is no second batch run. Bootstrap creates the world; the realtime pipeline keeps it alive.

The lifecycle, whole
seed once from the chain (factory walks, around focus tokens) → set the cursor → run forward forever (decode · filter · write · enrich · discover · score). The frozen Python batch pipeline did the first part nightly and forever; this system does it once and then never stops. That's the whole reason the realtime indexer exists.

Check yourself

1. On day zero, where do the graph's initial nodes come from?
2. What plays the role of "seed crystal" that bootstrap grows the graph around?
3. How does a pool seeder enumerate all of a protocol's pools?
4. A bootstrap pod restarts halfway through seeding. What happens?
5. The cross-pod bootstrap lock (SET NX per chain/graph) is the same pattern as…
6. Why does fresh-start's preflight refuse to run if lease:graph-writer:{chain} is held?
7. When would you use the clone path instead of fresh-start?
8. After bootstrap sets the cursor, what maintains the graph?
↳ Ask your teacher
Try: "Show me scanFactory in v3_pool_seeder.go" · "How does the per-task Redis state machine encode progress?" · "Walk me through CloneFromSeedGraph" · "What exactly is in the GraphMetadata node?"

What you can now do

🌱 The system from birth to steady state
You can now narrate the entire life of the graph: born by self-seeding from the chain around focus tokens (L10) → maintained forever by ingest·decode·filter·write (L1–4·L7) → enriched & grown by discovery (L5) → scored by the risk engine (L6) → all serialized through one writer (L9) and kept correct through every failure (L8). That's the whole system, cradle to steady state.

Grounded in: CLAUDE.md (self-seeding, frozen Python pipeline), internal/freshstart/{plan,preflight,tokens}.go, pkg/bootstrap/{task,runner,lock}.go (resumable DAG + cross-pod lock), pkg/enrichment/v{2,3,4}_pool_seeder.go (factory walks), cmd/indexer/main.go::runBootstrap + pkg/genesis/loader.go (clone / genesis paths, cursor, GraphMetadata). Verify against source — the code is the truth.