Lesson 10 · Bootstrap & Fresh-Start

Where the graph comes from

From an empty database to a seeded risk graph — by asking the chain, not loading a snapshot. ~12 min.

Builds on: L5 · L9 · Anchor: factory contracts · New: self-seeding · New: resumable task DAG

Every lesson so far assumed the graph already existed. But on day zero, Memgraph is empty. Where do the first hundreds of thousands of nodes come from? The modern answer is striking: the system doesn't load a snapshot — it self-seeds by interrogating the chain, walking factory contracts and scanning event logs to discover the universe of entities around your focus tokens. Then it hands off to the realtime pipeline you already know.

The big architectural choice

The repo's CLAUDE.md is blunt: the system is "self-seeding (factory walks, event-log scans, protocol seeders); does NOT depend on the legacy risk-graph Python batch pipeline or its full_graph.json — both are frozen, treat as nonexistent." The old world rebuilt the graph in nightly Python batches and snapshotted it. The new world seeds once from the chain, then maintains incrementally forever (L1–9). No snapshot to drift; the chain is the source.

1 · Focus tokens are the seed crystal

Bootstrap doesn't try to index all of Ethereum — that's the monitored-set lesson (L1). It starts from a small set of focus tokens the operator cares about, supplied as a CSV (with USD prices resolved via DefiLlama):

// internal/freshstart/tokens.go — ParseFocusTokens
// operator CSV → FocusTokens (address [+ optional price])
type PriceLookup interface {
	// parameter types assumed here; tokens.go has the exact signature
	BatchPrice(ctx context.Context, chainID uint64, addrs []string) (map[string]float64, error)
}

Everything else is seeded around those tokens: what pools contain them, what vaults hold them, what markets accept them as collateral, who admins those contracts. The focus tokens are the seed crystal; the graph grows outward from them.

Source: internal/freshstart/tokens.go. Focus tokens also drive risk (L6) and promotion into the monitored set (L1) — one concept, reused everywhere.

2 · Seeders: discovery at rest (your factory anchor)

How do you enumerate every Uniswap V3 pool that holds a focus token? You do exactly what you'd do by hand: walk the factory contract. The seeders ask the chain directly:

// pkg/enrichment/v3_pool_seeder.go
pools, err := s.scanFactory(ctx, f, pricedSet)   // enumerate a factory's pools
// …writes pool nodes:  p.factory = $protocol, p.source = 'v3_pool_seeder'

Seeder	What it walks / scans
`v2/v3/v4_pool_seeder`	AMM factory contracts → all pools (Uniswap, Sushi, …)
`lending_seeder`	Lending protocols → markets (Aave reserves, Compound cTokens, Morpho markets)
`oft_seeder`	LayerZero OFT deployments → cross-chain token adapters
`deployer_seeder`	Contract deployers (creator addresses)
`lp_seeder` / holder seeders	LP token holders, focus-token holders

This is "discovery" (L5) run eagerly

In Lesson 5, discovery grew the graph reactively: a related address found during enrichment gets monitored. Seeders do the same thing proactively, in bulk, at boot. Instead of waiting to stumble on pools via events, they walk the factory and grab them all up front. It's the same factory walk you'd run by hand with Foundry's cast to list pools, just industrialised. Bootstrap and discovery are the same idea at two speeds.

3 · The orchestration: a resumable, locked task DAG

There are dozens of seeders, and some depend on others (you need tokens and pools before you can seed their holders). They run as a DAG of tasks (pkg/bootstrap/task.go):

focus tokens (seed) → pool seeders (v2/v3/v4, parallel) → holder / lp seeders (depend on pools) → lending / oft / deployer (parallel)
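One way to turn declared dependencies into stages like the ones above is Kahn's algorithm, grouping tasks into waves that can each run in parallel. This is an illustration, not task.go: `waves` is a hypothetical helper, and the dependency edges in the example are made up. A real runner may start each task as soon as its own dependencies finish instead of waiting for a whole wave.

```go
package main

import (
	"fmt"
	"sort"
)

// waves groups tasks into dependency levels using Kahn's algorithm: each task
// in wave k depends only on tasks in earlier waves, so a wave can run in
// parallel. deps maps a task to the tasks it needs; tasks that appear only as
// dependencies are included too.
func waves(deps map[string][]string) ([][]string, error) {
	indeg := map[string]int{}
	children := map[string][]string{}
	for task, ds := range deps {
		if _, ok := indeg[task]; !ok {
			indeg[task] = 0
		}
		for _, d := range ds {
			if _, ok := indeg[d]; !ok {
				indeg[d] = 0
			}
			indeg[task]++
			children[d] = append(children[d], task)
		}
	}
	var ready []string
	for t, n := range indeg {
		if n == 0 {
			ready = append(ready, t)
		}
	}
	var out [][]string
	done := 0
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order within a wave
		out = append(out, ready)
		done += len(ready)
		var next []string
		for _, t := range ready {
			for _, c := range children[t] {
				indeg[c]--
				if indeg[c] == 0 {
					next = append(next, c)
				}
			}
		}
		ready = next
	}
	if done != len(indeg) {
		return nil, fmt.Errorf("dependency cycle among %d tasks", len(indeg)-done)
	}
	return out, nil
}
```

For example, if v2_pool_seeder, v3_pool_seeder, and lending_seeder each depend on focus_tokens, and lp_seeder depends on both pool seeders, waves returns [focus_tokens], then [lending_seeder v2_pool_seeder v3_pool_seeder], then [lp_seeder]. A cycle is reported as an error rather than silently dropping tasks.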

Three properties make it production-grade, each echoing a theme from this course:

Property	How	Echoes
Concurrent but ordered	Independent seeders run in parallel; each declares its dependencies.	DAG scheduling
Resumable	Each task is gated on a per-task Redis state machine, so a restarted run skips finished tasks and resumes where it stopped instead of starting over.	idempotency / determinism (L4·L8)
Retries with backoff	Transient errors retried with exponential backoff.	resilience (L5·L7)

🔗 The lock you've seen before

A cross-pod lock (pkg/bootstrap/lock.go, a Redis SET NX per (chain, graph) with auto-refreshing TTL) ensures one pod owns the bootstrap run; other replicas skip it. This is the same singleton-election pattern as L9's graph-writer lease — "exactly one of these runs at a time" appears again. And the seeders publish their writes through the graph-writer (L9), so even parallel seeders never hit the OCC abort path.

4 · Three ways to populate a graph

runBootstrap (cmd/indexer/main.go) — itself a one-shot, direct-ExecuteWrite writer (the L9 exception, because it runs alone) — picks among:

Path	What it does	When
Fresh-start (self-seed)	Walk factories + scan logs to build the graph from the chain.	The modern production path.
Clone	`CloneFromSeedGraph` copies all nodes/edges from an existing `graph_id` into a new one.	Spinning up a shadow / parity graph (e.g. `test_carlos` from `risk-graph-rt` — recall L2's partitions).
Genesis file	`LoadFile(GENESIS_FILE)` → load a JSON snapshot.	Frozen / legacy — the `full_graph.json` path, treated as nonexistent. A deprecated fallback.
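Because graphs are partitioned by graph_id (L2), a clone is conceptually a filtered copy with the tag rewritten. This toy in-memory version shows only those semantics; CloneFromSeedGraph runs against Memgraph, and `Node`, `Edge`, and `cloneGraph` are hypothetical.

```go
package main

import "fmt"

// Node and Edge are a toy model of L2's partitioning: every element carries a graph_id.
type Node struct {
	GraphID, Key string
	Props        map[string]string
}

type Edge struct {
	GraphID, FromKey, ToKey, Type string
}

// cloneGraph returns new slices holding the originals plus a copy of every
// src-tagged node and edge retagged as dst. It refuses a non-empty dst.
func cloneGraph(nodes []Node, edges []Edge, src, dst string) ([]Node, []Edge, error) {
	if src == dst {
		return nil, nil, fmt.Errorf("src and dst are both %q", src)
	}
	outN := append([]Node(nil), nodes...)
	for _, n := range nodes {
		switch n.GraphID {
		case dst:
			return nil, nil, fmt.Errorf("graph %q is not empty", dst)
		case src:
			props := make(map[string]string, len(n.Props))
			for k, v := range n.Props {
				props[k] = v // copy so the two graphs never share a map
			}
			outN = append(outN, Node{GraphID: dst, Key: n.Key, Props: props})
		}
	}
	outE := append([]Edge(nil), edges...)
	for _, e := range edges {
		if e.GraphID == src {
			e.GraphID = dst // e is a copy of the element, so the src edge is unchanged
			outE = append(outE, e)
		}
	}
	return outN, outE, nil
}
```

The source graph is never touched, which is what makes cloning safe for spinning up a shadow graph next to a live one.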

Whichever path runs, bootstrap then populates the monitored set and focus tokens in Redis (L1), sets the initial block cursor (first match wins: START_BLOCK, then metadata.snapshot_block, then the RPC's latest block), and writes a GraphMetadata node (node/edge counts, timestamps).
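The cursor precedence reads naturally as a first-match chain. A sketch, assuming START_BLOCK arrives as a string (for example from os.Getenv) and that `latest` wraps an eth_blockNumber call; `resolveStartBlock` and its source labels are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveStartBlock applies the precedence START_BLOCK > metadata.snapshot_block
// > RPC latest, returning the chosen block and which source supplied it.
// latest is only called when neither earlier source is set.
func resolveStartBlock(startEnv string, snapshotBlock *uint64,
	latest func() (uint64, error)) (uint64, string, error) {
	if startEnv != "" {
		b, err := strconv.ParseUint(startEnv, 10, 64)
		if err != nil {
			return 0, "", fmt.Errorf("START_BLOCK=%q: %w", startEnv, err)
		}
		return b, "START_BLOCK", nil
	}
	if snapshotBlock != nil {
		return *snapshotBlock, "snapshot_block", nil
	}
	b, err := latest()
	if err != nil {
		return 0, "", err
	}
	return b, "rpc_latest", nil
}
```

A malformed START_BLOCK is an error rather than a silent fall-through, so an operator typo can't quietly start the indexer at the chain head.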

5 · Preflight: don't seed into a live graph

Seeding is a bulk write. Running it while a graph-writer is live would race the singleton writer — exactly the FORTA-2926 caveat from Lesson 9. So fresh-start has a preflight check that refuses to run if the writer lease is held:

// internal/freshstart/preflight.go
// Apply must NOT run if a graph-writer is alive:
//   the lease:graph-writer:{chain} key MUST be absent.

A safety check you can now fully read

You understand why this check exists: a live graph-writer holds the lease for its whole pod lifetime and writes the cursor mid-apply; a concurrent bulk seed would race it on the singleton BlockCursor and reintroduce the OCC collisions L9 was built to kill. Preflight encodes L9's invariant as an operator guardrail.

6 · Then: hand off to realtime

Bootstrap's job ends the moment the cursor is set. From there, everything you've already learned takes over: the indexer consumes blocks from that cursor forward (L1–4), enrichment classifies and discovers (L5), and the risk engine computes (L6), all maintaining the graph incrementally, forever. There is no second batch run. Bootstrap creates the world; the realtime pipeline keeps it alive.

The lifecycle, whole

seed once from the chain (factory walks, around focus tokens) → set the cursor → run forward forever (decode · filter · write · enrich · discover · score). The frozen Python batch pipeline rebuilt the graph from scratch every night; this system seeds once and then never stops. That's the whole reason the realtime indexer exists.

Check yourself

1. On day zero, where do the graph's initial nodes come from?

2. What plays the role of "seed crystal" that bootstrap grows the graph around?

3. How does a pool seeder enumerate all of a protocol's pools?

4. A bootstrap pod restarts halfway through seeding. What happens?

5. The cross-pod bootstrap lock (SET NX per chain/graph) is the same pattern as…

6. Why does fresh-start's preflight refuse to run if lease:graph-writer:{chain} is held?

7. When would you use the clone path instead of fresh-start?

8. After bootstrap sets the cursor, what maintains the graph?


What you can now do

Explain self-seeding vs the frozen snapshot model, and why the realtime system seeds once then maintains forever.
Describe how seeders walk factory contracts to discover entities around focus tokens (discovery, run eagerly).
Explain the resumable, locked task-DAG orchestration and how it echoes idempotency + singleton-lease themes.
Distinguish the three population paths (fresh-start / clone / genesis-file) and why preflight guards against a live writer.
Trace the hand-off from bootstrap to the realtime pipeline.

🌱 The system from birth to steady state

You can now narrate the entire life of the graph: born by self-seeding from the chain around focus tokens (L10) → maintained forever by ingest·decode·filter·write (L1–4·L7) → enriched & grown by discovery (L5) → scored by the risk engine (L6) → all serialized through one writer (L9) and kept correct through every failure (L8). That's the whole system, cradle to steady state.

← Previous: Lesson 09 · The Single-Writer Architecture / Next: Lesson 11 · Observability · Phase-1 Capstone →

Grounded in: CLAUDE.md (self-seeding, frozen Python pipeline), internal/freshstart/{plan,preflight,tokens}.go, pkg/bootstrap/{task,runner,lock}.go (resumable DAG + cross-pod lock), pkg/enrichment/v{2,3,4}_pool_seeder.go (factory walks), cmd/indexer/main.go::runBootstrap + pkg/genesis/loader.go (clone / genesis paths, cursor, GraphMetadata). Verify against source — the code is the truth.