From an empty database to a seeded risk graph — by asking the chain, not loading a snapshot. ~12 min.
Every lesson so far assumed the graph already existed. But on day zero, Memgraph is empty. Where do the first hundreds of thousands of nodes come from? The modern answer is striking: the system doesn't load a snapshot — it self-seeds by interrogating the chain, walking factory contracts and scanning event logs to discover the universe of entities around your focus tokens. Then it hands off to the realtime pipeline you already know.
risk-graph Python batch pipeline or its
full_graph.json — both are frozen, treat as nonexistent." The old world rebuilt the graph in nightly
Python batches and snapshotted it. The new world seeds once from the chain, then maintains incrementally
forever (L1–9). No snapshot to drift; the chain is the source.
Bootstrap doesn't try to index all of Ethereum — that's the monitored-set lesson (L1). It starts from a small set of focus tokens the operator cares about, supplied as a CSV (with USD prices resolved via DefiLlama):
// internal/freshstart/tokens.go: ParseFocusTokens
// operator CSV → FocusTokens (address [+ optional price])
type PriceLookup interface {
    BatchPrice(ctx, chainID, addrs) (map[string]float64, error)
}
Everything else is seeded around those tokens: what pools contain them, what vaults hold them, what markets accept them as collateral, who admins those contracts. The focus tokens are the seed crystal; the graph grows outward from them.
Source: internal/freshstart/tokens.go. Focus tokens also drive risk (L6) and promotion into the monitored set (L1) — one concept, reused everywhere.
How do you enumerate every Uniswap V3 pool that holds a focus token? You do exactly what you'd do by hand: walk the factory contract. The seeders ask the chain directly:
// pkg/enrichment/v3_pool_seeder.go
pools, err := s.scanFactory(ctx, f, pricedSet) // enumerate a factory's pools
// …writes pool nodes: p.factory = $protocol, p.source = 'v3_pool_seeder'
| Seeder | What it walks / scans |
|---|---|
| v2/v3/v4_pool_seeder | AMM factory contracts → all pools (Uniswap, Sushi, …) |
| lending_seeder | Lending protocols → markets (Aave reserves, Compound cTokens, Morpho markets) |
| oft_seeder | LayerZero OFT deployments → cross-chain token adapters |
| deployer_seeder | Contract deployers (creator addresses) |
| lp_seeder / holder seeders | LP token holders, focus-token holders |
cast to list pools — just industrialised. Bootstrap and
discovery are the same idea at two speeds.
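To make the factory walk concrete, the sketch below scans a block range in chunks and keeps only pools that contain a focus token. Everything named here (`PoolCreated`, `LogSource`, `scanFactorySketch`, `memLogs`) is hypothetical and stands in for an RPC log query; it is not the real `scanFactory`, whose signature and internals are not shown in this lesson.

```go
package main

// PoolCreated is a simplified stand-in for a decoded factory event.
type PoolCreated struct {
	Block          uint64
	Token0, Token1 string
	Pool           string
}

// LogSource is a hypothetical abstraction over an RPC log query.
// It returns the factory events emitted in blocks [from, to].
type LogSource interface {
	PoolEvents(factory string, from, to uint64) ([]PoolCreated, error)
}

// scanFactorySketch walks [start, end] in fixed-size chunks (providers
// commonly cap the block span of a single log query) and keeps only
// pools where at least one side is a focus token.
func scanFactorySketch(src LogSource, factory string, start, end, chunk uint64, focus map[string]bool) ([]PoolCreated, error) {
	if chunk == 0 {
		chunk = 1
	}
	var pools []PoolCreated
	for from := start; from <= end; from += chunk {
		to := from + chunk - 1
		if to > end {
			to = end
		}
		evs, err := src.PoolEvents(factory, from, to)
		if err != nil {
			return nil, err
		}
		for _, ev := range evs {
			if focus[ev.Token0] || focus[ev.Token1] {
				pools = append(pools, ev)
			}
		}
	}
	return pools, nil
}

// memLogs is an in-memory LogSource used to exercise the sketch.
type memLogs []PoolCreated

func (m memLogs) PoolEvents(_ string, from, to uint64) ([]PoolCreated, error) {
	var out []PoolCreated
	for _, ev := range m {
		if ev.Block >= from && ev.Block <= to {
			out = append(out, ev)
		}
	}
	return out, nil
}
```

Filtering at scan time keeps the seeded graph centred on the focus tokens instead of importing every pool a factory ever created.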
There are dozens of seeders, and some depend on others (you need tokens before holders), so they run as a DAG of tasks (pkg/bootstrap/task.go).
Three properties make it production-grade, each echoing a theme from this course:
| Property | How | Echoes |
|---|---|---|
| Concurrent but ordered | Independent seeders run in parallel; each declares its dependencies. | DAG scheduling |
| Resumable | Each task is gated on a per-task Redis state machine, so after a restart the run resumes where it stopped instead of starting over. | idempotency / determinism (L4·L8) |
| Retries with backoff | Transient errors retried with exponential backoff. | resilience (L5·L7) |
A cross-pod lock (Redis SET NX per
(chain, graph) with auto-refreshing TTL) ensures one pod owns the bootstrap run; other replicas skip it.
This is the same singleton-election pattern as L9's graph-writer lease — "exactly one of these runs at a
time" appears again. And the seeders publish their writes through the graph-writer (L9), so even
parallel seeders never hit the OCC abort path.
runBootstrap (cmd/indexer/main.go) — itself a one-shot, direct-ExecuteWrite
writer (the L9 exception, because it runs alone) — picks among:
| Path | What it does | When |
|---|---|---|
| Fresh-start (self-seed) | Walk factories + scan logs to build the graph from the chain. | The modern production path. |
| Clone | CloneFromSeedGraph copies all nodes/edges from an existing graph_id into a new one. | Spinning up a shadow / parity graph (e.g. test_carlos from risk-graph-rt — recall L2's partitions). |
| Genesis file | LoadFile(GENESIS_FILE) → load a JSON snapshot. | Frozen / legacy — the full_graph.json path, treated as nonexistent. A deprecated fallback. |
Whichever path runs, bootstrap then populates the monitored set and focus tokens in
Redis (L1), sets the initial block cursor (in order of precedence: START_BLOCK, then
metadata.snapshot_block, then the RPC's latest block), and writes a GraphMetadata node (node/edge counts, timestamps).
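Reading the `>` in the cursor rule as precedence, the selection can be sketched as a small function. The name `initialCursor` and the convention that 0 means "unset" are assumptions for this example, not taken from the source.

```go
package main

// initialCursor picks the first set candidate in precedence order:
// an explicit START_BLOCK, then the snapshot block from metadata,
// then the chain head reported by the RPC.
// Treating 0 as "unset" is an assumption of this sketch.
func initialCursor(startBlock, snapshotBlock, rpcLatest uint64) uint64 {
	for _, candidate := range []uint64{startBlock, snapshotBlock} {
		if candidate != 0 {
			return candidate
		}
	}
	return rpcLatest
}
```

With this ordering, an operator override always wins, a loaded snapshot is honoured next, and a fresh chain-seeded graph simply starts from the current head.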
Seeding is a bulk write. Running it while a graph-writer is live would race the singleton writer — exactly the
FORTA-2926 caveat from Lesson 9. So fresh-start has a
preflight check that refuses to run if the writer lease is held:
// internal/freshstart/preflight.go
// Apply must NOT run if a graph-writer is alive:
// the lease:graph-writer:{chain} key MUST be absent.
BlockCursor and
reintroduce the OCC collisions L9 was built to kill. Preflight encodes L9's invariant as an operator guardrail.
Bootstrap's job ends the moment the cursor is set. From there, everything you've already learned takes
over: the indexer consumes blocks from the initial cursor forward (L1–4), enrichment classifies and
discovers (L5), the risk engine computes (L6) — all maintaining the graph incrementally, forever. There is
no second batch run. Bootstrap creates the world; the realtime pipeline keeps it alive.
Grounded in: CLAUDE.md (self-seeding, frozen Python pipeline), internal/freshstart/{plan,preflight,tokens}.go,
pkg/bootstrap/{task,runner,lock}.go (resumable DAG + cross-pod lock), pkg/enrichment/v{2,3,4}_pool_seeder.go (factory walks),
cmd/indexer/main.go::runBootstrap + pkg/genesis/loader.go (clone / genesis paths, cursor, GraphMetadata). Verify against source — the code is the truth.