Lesson 24 · Discovery / Enrichment Internals · New Subsystem

The discovery flywheel

How the graph grows itself — one address discovers the next, without a hardcoded map of DeFi. ~14 min.

Builds on: L2 · L5 · L10 · Anchor: EIP-1967, Safe, ERC-4626 · New: self-expansion loop · New: seeders vs refreshers

DeFi has tens of thousands of contracts and no central registry. You cannot hardcode the map — it changes daily. So the indexer doesn't try. It starts from a few seeds and grows the graph by following the chain itself: every contract it enriches reveals the addresses it depends on, those get queued, and the loop repeats. That self-expanding loop is the discovery flywheel — the subsystem that turns "watch these 5 tokens" into "watch the entire dependency cone around them."

Your anchor: you already know how to "follow" a contract
Given a contract you'd instinctively poke it: is it a proxy? (read the EIP-1967 slot for the impl). Is it a Gnosis Safe? (getOwners()). Is it an ERC-4626 vault? (asset()). Each answer hands you more addresses to investigate — the impl, the signers, the underlying. Discovery is exactly that instinct, automated and run to a fixpoint.

1 · The loop

L2 gave you the node lifecycle (bare → enriched). L5 gave you the 15-stage worker. The piece those left implicit is that stage 1 and stage 15 form a cycle:

bare node (pending_enrichment=true)
→ classify (RPC): proxy? safe? vault?
→ RelatedAddrs: impl, owners, underlying…
→ monitored set (stage 15 propagation)
⤺ each related address re-enters as a new bare node, and the loop closes

The worker polls Memgraph for pending_enrichment=true nodes (a pull loop), runs the pipeline, and at the end (stage 15) adds every address in RelatedAddrs to the monitored set. Those become new bare nodes, get polled in turn, and the cycle continues until it reaches a fixpoint: a pass that yields no new addresses. The graph discovers its own boundary.

2 · Where related addresses come from (the RPC probes)

Stage 1 is almost entirely on-chain interrogation (the deployer lookup, which goes through a block-explorer API, is the exception), and every probe that succeeds appends to RelatedAddrs. Here's the proxy resolver, reading the three canonical EIP-1967 storage slots directly:

```go
// enricher.go: resolveProxy, simplified (slots from rpc_calls.go)
implBytes, err := telemetry.StorageAt(ctx, rpc, addr, implSlot, nil) // 0x360894…382bbc
if err == nil { // on error: skip this related address (logged as a WARN, see §4)
    // The slot holds a 32-byte word; BytesToAddress keeps its low 20 bytes.
    implAddr := common.BytesToAddress(implBytes)
    if implAddr != zeroAddr {
        c.IsProxy = true
        c.Implementation = &implAddr
        c.RelatedAddrs = append(c.RelatedAddrs, implAddr) // ← follow the impl
    }
}
// then adminSlot (0xb531…6103) → ProxyAdmin, beaconSlot (0xa3f0…3d50) → ProxyBeacon, both appended too
```
| Probe | Mechanism | Discovers |
|---|---|---|
| Proxy resolution | eth_getStorageAt on EIP-1967 impl/admin/beacon slots | implementation, proxy admin, beacon |
| Wrapper detection | asset() (ERC-4626), underlying(), stETH() (wstETH) | the underlying token |
| Multisig detection | getOwners() on a Gnosis Safe | each signer key (→ L22's expansion!) |
| Curator/manager | curator() / manager() on vaults | the controlling entity (→ CURATES edge) |
| Deployer | Etherscan/Blockscout contract-creation lookup | the deploying EOA |
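Wrapper detection follows the same pattern with eth_call instead of storage reads: try each getter and treat a successful, non-empty answer as the underlying token. A minimal sketch, assuming a hypothetical caller type that hides ABI encoding and reports reverts as errors; the probing order shown is an assumption, not necessarily the order detectWrapper uses.

```go
package main

import (
	"errors"
	"fmt"
)

// caller is a hypothetical stand-in for an eth_call of a zero-argument
// getter that returns an address; a revert or missing function is an error.
type caller func(method string) (string, error)

var errNoMethod = errors.New("execution reverted")

// detectWrapperSketch tries the wrapper getters in a fixed order (assumed
// here) and treats the first one that answers as the underlying token.
func detectWrapperSketch(call caller) (underlying, via string, ok bool) {
	for _, m := range []string{"asset()", "underlying()", "stETH()"} {
		if addr, err := call(m); err == nil && addr != "" {
			return addr, m, true
		}
	}
	return "", "", false
}

func main() {
	// A toy wstETH-like contract: only stETH() answers.
	wst := func(m string) (string, error) {
		if m == "stETH()" {
			return "0xsteth", nil
		}
		return "", errNoMethod
	}
	u, via, ok := detectWrapperSketch(wst)
	fmt.Println(u, via, ok) // prints 0xsteth stETH() true
}
```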
Notice the threads converging
The asset() probe is how a receipt token finds the asset it wraps: the WRAP_UNWRAP edge from L21's exit liquidity. getOwners() is how a Safe's signers get into the graph: the fan-out targets from L22's multisig expansion. curator() feeds the admin attribution that L19's cells are built on. Discovery is the upstream that makes every downstream subsystem you've studied possible.

3 · Two engines flank the loop: seeders and refreshers

The pull-loop discovers structure, but two other mechanisms run alongside it.

Seeders — push known structure in

Some structure is already known, so rediscovering it by crawling would only burn RPC budget. A seeder bootstraps a protocol's known shape directly. Discovery keeps a static map of knownFactories (Uniswap V2/V3/V4, Curve, Balancer, Aerodrome, PancakeSwap, etc.), so when a pool's factory() resolves to one, the protocol is inferred instantly, and the factory's PairCreated/PoolCreated events can be walked to enumerate every pool it ever made:

```go
var knownFactories = map[string]string{
    "0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f": "uniswap",   // V2
    "0xba12222222228d8ba445958a75a0704d566bf2c8": "balancer",  // V2 Vault
    "0xb9fc157394af804a3578134a6585c0dc9cc990d4": "curve",     // StableSwap Factory
    // …Aerodrome, Velodrome, Camelot, Maverick, Trader Joe
}
```
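The map keys are lowercase hex, while RPC clients and explorers commonly return EIP-55 checksummed (mixed-case) addresses, so any lookup has to normalize case first. A minimal sketch of that lookup; inferProtocol is a hypothetical helper, and only two entries are copied from the excerpt.

```go
package main

import (
	"fmt"
	"strings"
)

// Two entries copied from the knownFactories excerpt; keys are lowercase.
var knownFactories = map[string]string{
	"0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f": "uniswap",
	"0xba12222222228d8ba445958a75a0704d566bf2c8": "balancer",
}

// inferProtocol normalizes the factory() result before the lookup, since
// mixed-case hex would otherwise miss the lowercase map keys.
func inferProtocol(factory string) (string, bool) {
	p, ok := knownFactories[strings.ToLower(factory)]
	return p, ok
}

func main() {
	p, ok := inferProtocol("0x5C69bEe701ef814a2B6a3EDD4B1652CB9cc5aA6f")
	fmt.Println(p, ok) // prints uniswap true
	_, ok = inferProtocol("0x0000000000000000000000000000000000000001")
	fmt.Println(ok) // prints false
}
```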

Protocol-specific seeders (curve_lending_seeder.go, balancer_seeder.go, bridge_seeder.go) go further, encoding each protocol's particular topology.
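Walking a factory's creation events means decoding its logs. For Uniswap V2 the event is PairCreated(address indexed token0, address indexed token1, address pair, uint): the two tokens arrive as indexed topics, and the pair address plus a running pair count arrive in the data field. The sketch below decodes one such log from raw bytes. logEntry, pool and decodePairCreated are hypothetical names, and fetching the logs (eth_getLogs over the factory's history) is left out.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"math/big"
)

// logEntry holds the two parts of an eth_getLogs result this sketch needs.
type logEntry struct {
	Topics [][]byte // Topics[0] = event signature hash, then indexed args
	Data   []byte   // ABI-encoded non-indexed args
}

// pool is one factory-created pair, ready to be pushed in as seeded nodes.
type pool struct {
	Token0, Token1, Pair string
	Index                *big.Int // the uint field: allPairs length after creation
}

// addrFromWord returns the low 20 bytes of a 32-byte ABI word as 0x-hex.
func addrFromWord(w []byte) string { return "0x" + hex.EncodeToString(w[12:32]) }

// decodePairCreated decodes Uniswap V2's
//   PairCreated(address indexed token0, address indexed token1, address pair, uint)
// A real walker would first match Topics[0] against the event signature
// hash; that filter is omitted here.
func decodePairCreated(l logEntry) (pool, error) {
	if len(l.Topics) != 3 || len(l.Data) != 64 {
		return pool{}, errors.New("not a PairCreated log")
	}
	for _, t := range l.Topics {
		if len(t) != 32 {
			return pool{}, errors.New("malformed topic")
		}
	}
	return pool{
		Token0: addrFromWord(l.Topics[1]),
		Token1: addrFromWord(l.Topics[2]),
		Pair:   addrFromWord(l.Data[:32]),
		Index:  new(big.Int).SetBytes(l.Data[32:64]),
	}, nil
}

// word builds a 32-byte ABI word whose last byte is b (example helper).
func word(b byte) []byte {
	w := make([]byte, 32)
	w[31] = b
	return w
}

func main() {
	l := logEntry{
		Topics: [][]byte{word(0), word(0xaa), word(0xbb)},
		Data:   append(word(0xcc), word(1)...),
	}
	p, err := decodePairCreated(l)
	fmt.Println(p.Pair, p.Index, err)
}
```

Other factories use different layouts (Uniswap V3's PoolCreated, for instance, also carries fee and tickSpacing), which is part of why seeders end up protocol-specific.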

Refreshers — keep the numbers true over time

Discovery finds that a vault holds an asset; the dollar value of that holding drifts every block. A refresher periodically re-reads on-chain state to keep edge USD values current — and this is the part that feeds straight into everything you learned about at_risk:

| Refresher (cadence) | Re-reads | Keeps fresh |
|---|---|---|
| LP Reserve (~60 min) | getReserves() + totalSupply() on V2 AMMs | LP-holder HOLDS USD values |
| Receipt Token (~60 min) | protocol-specific total underlying (e.g. ERC-4626 totalAssets()) | receipt-token HOLDS USD values |
| Oracle Bridger (~30 min) | nothing (pure graph); propagates transitive ORACLE_DEP | which markets inherit which oracles (L20!) |
This is the source of L19–L23's dollars
Remember at_stake_usd, exit_v2_total, the receipt-token TVL in Path 3b? Those numbers are kept current by these refreshers. The Oracle Bridger is literally how the per-(oracle, market) eligibility from L20's oracle path comes to exist. Discovery and at_risk aren't separate stories — discovery is at_risk's data supply chain.

4 · Why it doesn't explode

A self-expanding crawl over a densely connected financial graph could, in principle, swallow the whole chain. It doesn't, because expansion is value-gated: it applies the same bounded-traversal discipline you met in L2.
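The lesson doesn't spell out the gate's exact rule. Here is a minimal sketch of a value-gated, depth-bounded expansion, assuming each discovered relation carries a USD value and that expansion is cut off by two hypothetical parameters: minUSD (skip dust edges) and maxDepth (hop budget from a seed). edge and gatedDiscover are illustrative names.

```go
package main

import "fmt"

// edge is a discovered relation to another address, carrying the USD value
// classification attached to it (hypothetical shape).
type edge struct {
	To  string
	USD float64
}

// gatedDiscover expands breadth-first but only follows edges worth at least
// minUSD, and never expands a node more than maxDepth hops from a seed.
// It returns each reached address with its hop distance.
func gatedDiscover(seeds []string, related func(string) []edge, minUSD float64, maxDepth int) map[string]int {
	depth := map[string]int{}
	var queue []string
	for _, s := range seeds {
		if _, ok := depth[s]; !ok {
			depth[s] = 0
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if depth[n] >= maxDepth {
			continue // hop budget spent: keep the node, don't expand it
		}
		for _, e := range related(n) {
			if e.USD < minUSD {
				continue // dust edge: not worth the RPC budget
			}
			if _, seen := depth[e.To]; !seen {
				depth[e.To] = depth[n] + 1
				queue = append(queue, e.To)
			}
		}
	}
	return depth
}

func main() {
	g := map[string][]edge{
		"seed":   {{"bigPool", 5e6}, {"dustToken", 12}},
		"bigPool": {{"lpToken", 5e6}},
		"lpToken": {{"deeper", 5e6}},
	}
	got := gatedDiscover([]string{"seed"}, func(a string) []edge { return g[a] }, 10_000, 2)
	fmt.Println(len(got), got["lpToken"]) // prints 3 2
}
```

Either knob alone bounds the crawl differently: the value gate prunes breadth (dust positions), while the depth limit prunes long chains of high-value edges.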

The trade-off worth naming
Discovery is RPC-bound: every probe is a network call, either to an archive node or to the Etherscan/Blockscout APIs, and those calls are rate-limited. That's why classification batches probes, caches aggressively, and degrades gracefully (a failed StorageAt just skips that related address with a WARN; fail-loop, not fail-stop, exactly like L8/L23). Coverage versus RPC budget is the central tension of this subsystem.

Check yourself

1. Why does the indexer discover the graph instead of using a hardcoded map of DeFi contracts?
2. What closes the discovery loop into a self-expanding flywheel?
3. How does resolveProxy find a proxy's implementation?
4. A contract responds to asset(). Discovery concludes…
5. What's the difference between a seeder and a refresher?
6. The LP Reserve and Receipt Token refreshers exist to…
7. Why doesn't the self-expanding crawl swallow the entire chain?
8. A StorageAt probe fails mid-classification. What happens?
↳ Ask your teacher
Try: "Walk the multisig getOwners() probe and how it becomes OWNS edges." · "How does the curve_lending_seeder encode Curve's topology?" · "What's project inference (stage 10) — the ~50 protocol patterns?" · "Show me how the Oracle Bridger computes transitive ORACLE_DEP." · "How are RPC calls rate-limited / cached against Etherscan?"


Grounded in: docs/enrichment-pipeline.md (15-stage pipeline, stage 1 RPC classification, stage 15 discovery propagation, periodic refreshers), pkg/enrichment/enricher.go (resolveProxy EIP-1967 slots, detectWrapper asset()/underlying()/stETH(), RelatedAddrs accumulation), pkg/enrichment/discovery.go (knownFactories, KnownV2/V3Factories), rpc_calls.go (implSlot/adminSlot/beaconSlot), the protocol seeders + LP/receipt/oracle refreshers. Verify against source — the code is the truth.