Lesson 24 · Discovery / Enrichment Internals · New Subsystem

The discovery flywheel

How the graph grows itself — one address discovers the next, without a hardcoded map of DeFi. ~14 min.

Builds on: L2 · L5 · L10 · Anchor: EIP-1967, Safe, ERC-4626 · New: self-expansion loop, seeders vs refreshers

DeFi has tens of thousands of contracts and no central registry. You cannot hardcode the map — it changes daily. So the indexer doesn't try. It starts from a few seeds and grows the graph by following the chain itself: every contract it enriches reveals the addresses it depends on, those get queued, and the loop repeats. That self-expanding loop is the discovery flywheel — the subsystem that turns "watch these 5 tokens" into "watch the entire dependency cone around them."

Your anchor: you already know how to "follow" a contract

Given a contract, you'd instinctively poke at it. Is it a proxy? (Read the EIP-1967 implementation slot.) Is it a Gnosis Safe? (Call getOwners().) Is it an ERC-4626 vault? (Call asset().) Each answer hands you more addresses to investigate: the impl, the signers, the underlying. Discovery is exactly that instinct, automated and run to a fixpoint.

1 · The loop

L2 gave you the node lifecycle (bare → enriched). L5 gave you the 15-stage worker. The piece those left implicit is that stage 1 and stage 15 form a cycle:

bare node (pending_enrichment=true) → classify via RPC (proxy? safe? vault?) → RelatedAddrs (impl, owners, underlying…) → monitored set (stage 15 propagation)

⤺ each related address re-enters as a new bare node, and the loop closes

The worker polls Memgraph for pending_enrichment=true nodes (a pull loop), runs the pipeline, and at the end (stage 15) adds every discovered RelatedAddrs to the monitored set. Those become new bare nodes, get polled, and the cycle continues until it reaches a fixpoint — no new addresses. The graph discovers its own boundary.
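A minimal in-memory sketch of that loop follows. It makes two substitutions. A FIFO queue stands in for the Memgraph poll on pending_enrichment=true. A seen set stands in for the monitored set. classifyFn and crawl are hypothetical names, and the real worker runs all 15 stages between taking a node and propagating its RelatedAddrs.

```go
package main

import "fmt"

// classifyFn is a hypothetical stand-in for stage 1: it returns the
// RelatedAddrs that the probes found for one address.
type classifyFn func(addr string) []string

// crawl runs the discovery loop until no unseen address remains (a fixpoint).
// The queue models nodes with pending_enrichment=true; seen models the monitored set.
func crawl(seeds []string, classify classifyFn) []string {
	seen := map[string]bool{}
	var queue, order []string
	for _, s := range seeds {
		if !seen[s] {
			seen[s] = true
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		addr := queue[0]
		queue = queue[1:]
		order = append(order, addr) // enriched
		for _, rel := range classify(addr) { // stage 15: propagate RelatedAddrs
			if !seen[rel] { // already-monitored addresses are not re-queued
				seen[rel] = true
				queue = append(queue, rel) // re-enters as a new bare node
			}
		}
	}
	return order
}

func main() {
	// Toy dependency cone around one vault.
	deps := map[string][]string{
		"vault":       {"vaultImpl", "usdc", "curatorSafe"}, // EIP-1967 impl, asset(), curator()
		"usdc":        {"usdcImpl"},                         // USDC is itself a proxy
		"curatorSafe": {"signerA", "signerB"},               // getOwners()
	}
	order := crawl([]string{"vault"}, func(a string) []string { return deps[a] })
	fmt.Println(order) // [vault vaultImpl usdc curatorSafe usdcImpl signerA signerB]
}
```

The loop terminates on its own: once every address a probe returns is already in the seen set, nothing new is queued. This is the fixpoint described above.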

2 · Where related addresses come from (the RPC probes)

Stage 1 is on-chain interrogation. The one exception is the deployer lookup, which goes through Etherscan/Blockscout. Every probe that succeeds appends to RelatedAddrs. Here's the proxy resolver, reading the three canonical EIP-1967 storage slots directly:

```go
// enricher.go: resolveProxy (slots from rpc_calls.go)
implBytes, _ := telemetry.StorageAt(ctx, rpc, addr, implSlot, nil)   // 0x360894…382bbc
implAddr := common.BytesToAddress(implBytes)
if implAddr != zeroAddr {
    c.IsProxy = true
    c.Implementation = &implAddr
    c.RelatedAddrs = append(c.RelatedAddrs, implAddr)        // ← follow the impl
}
// then adminSlot (0xb531…6103) → ProxyAdmin, beaconSlot (0xa3f0…3d50) → ProxyBeacon, both appended too
```
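The snippet relies on common.BytesToAddress to turn the 32-byte slot word into an address. A storage slot holds the address right-aligned: 12 zero bytes of padding, then the 20 address bytes. An unset slot reads as all zeros. That is why the zero-address check doubles as the "is this a proxy?" test. Below is a stdlib-only sketch of that decoding. The Address type and wordToAddress are hypothetical stand-ins for go-ethereum's common.Address and BytesToAddress.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Address is a 20-byte EVM address (a stand-in for go-ethereum's common.Address).
type Address [20]byte

// wordToAddress keeps the low-order 20 bytes of a storage word, mirroring how
// common.BytesToAddress crops longer input from the left and right-aligns shorter input.
func wordToAddress(word []byte) Address {
	var a Address
	if len(word) > len(a) {
		word = word[len(word)-len(a):]
	}
	copy(a[len(a)-len(word):], word)
	return a
}

func main() {
	// Hypothetical implementation-slot value: 12 bytes of zero padding + a 20-byte address.
	word, err := hex.DecodeString("000000000000000000000000" + "1111111111111111111111111111111111111111")
	if err != nil {
		panic(err)
	}
	impl := wordToAddress(word)
	fmt.Printf("impl=0x%x isProxy=%v\n", impl[:], impl != (Address{}))

	unset := wordToAddress(make([]byte, 32)) // an unset slot reads as 32 zero bytes
	fmt.Println("unset slot isProxy:", unset != (Address{}))
}
```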

| Probe | Mechanism | Discovers |
|---|---|---|
| Proxy resolution | `eth_getStorageAt` on the EIP-1967 impl/admin/beacon slots | implementation, proxy admin, beacon |
| Wrapper detection | `asset()` (ERC-4626), `underlying()`, `stETH()` (wstETH) | the underlying token |
| Multisig detection | `getOwners()` on a Gnosis Safe | each signer key (→ L22's expansion!) |
| Curator/manager | `curator()` / `manager()` on vaults | the controlling entity (→ CURATES edge) |
| Deployer | Etherscan/Blockscout contract-creation lookup | the deploying EOA |

Notice the threads converging

asset() resolution (wrapper detection) is how a receipt token finds the asset it wraps: that is the WRAP_UNWRAP edge from L21's exit liquidity. getOwners() is how a Safe's signers get into the graph: those are the fan-out targets from L22's multisig expansion. curator() feeds the admin attribution that L19's cells are built on. Discovery is the upstream that makes every downstream subsystem you've studied possible.

3 · Two engines flank the loop: seeders and refreshers

The pull-loop discovers structure, but two other mechanisms run alongside it.

Seeders — push known structure in

Some structure is already well known, so rediscovering it by crawling would only burn RPC budget. A seeder bootstraps a protocol's known shape directly. Discovery keeps a static map of knownFactories (Uniswap V2/V3/V4, Curve, Balancer, Aerodrome, PancakeSwap, and others). When a pool's factory() resolves to one of them, the protocol is inferred instantly. The factory's PairCreated/PoolCreated events can then be walked to enumerate every pool it has created:

```go
var knownFactories = map[string]string{
    "0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f": "uniswap",   // V2
    "0xba12222222228d8ba445958a75a0704d566bf2c8": "balancer",  // V2 Vault
    "0xb9fc157394af804a3578134a6585c0dc9cc990d4": "curve",     // StableSwap Factory
    // …Aerodrome, Velodrome, Camelot, Maverick, Trader Joe
}
```
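The map implies one practical detail. Its keys are lowercase hex, but addresses often arrive in EIP-55 mixed-case checksum form, so a lookup should normalize first. The sketch below assumes factory() has already been called and its result formatted as a hex string. inferProtocol is a hypothetical name, and only three entries from the lesson's map are copied in.

```go
package main

import (
	"fmt"
	"strings"
)

// Subset of the knownFactories map from the lesson (keys are lowercase hex).
var knownFactories = map[string]string{
	"0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f": "uniswap",  // V2
	"0xba12222222228d8ba445958a75a0704d566bf2c8": "balancer", // V2 Vault
	"0xb9fc157394af804a3578134a6585c0dc9cc990d4": "curve",    // StableSwap Factory
}

// inferProtocol maps a pool's factory() result to a protocol name.
// Addresses often arrive EIP-55 checksummed (mixed case), so normalize first.
func inferProtocol(factory string) (string, bool) {
	p, ok := knownFactories[strings.ToLower(factory)]
	return p, ok
}

func main() {
	// factory() of a Uniswap V2 pair, in checksummed form.
	if p, ok := inferProtocol("0x5C69bEe701ef814a2B6a3EDD4B1652CB9cc5aA6f"); ok {
		fmt.Println("protocol:", p)
	}
	_, ok := inferProtocol("0x0000000000000000000000000000000000000001")
	fmt.Println("unknown factory recognized:", ok)
}
```

A miss is not an error. It only means this pool gets no instant protocol inference from the static map.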

Protocol-specific seeders (curve_lending_seeder.go, balancer_seeder.go, bridge_seeder.go) go further, encoding each protocol's particular topology.

Refreshers — keep the numbers true over time

Discovery finds that a vault holds an asset; the dollar value of that holding drifts every block. A refresher periodically re-reads on-chain state to keep edge USD values current — and this is the part that feeds straight into everything you learned about at_risk:

| Refresher (interval) | Re-reads | Keeps fresh |
|---|---|---|
| LP Reserve (~60 min) | `getReserves()` + `totalSupply()` on V2 AMMs | LP-holder `HOLDS` USD values |
| Receipt Token (~60 min) | protocol-specific total underlying (e.g. ERC-4626 `totalAssets()`) | receipt-token `HOLDS` USD values |
| Oracle Bridger (~30 min) | nothing (pure graph); propagates transitive `ORACLE_DEP` | which markets inherit which oracles (L20!) |
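Both HOLDS refreshers compute a pro-rata share.

- For a V2 LP position: value = (lpBalance / totalSupply) × (reserve0 × price0 + reserve1 × price1), with each reserve scaled by its token's decimals.
- For an ERC-4626 receipt token: value = shares × totalAssets / totalSupply × assetPrice.

The sketch below uses math/big for the raw integers. Prices are plain inputs, because this lesson doesn't say where the refreshers source USD prices. It covers only the ERC-4626 case of "protocol-specific total underlying". The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// scaled returns n * 10^decimals: a raw on-chain integer amount.
func scaled(n int64, decimals int) *big.Int {
	pow := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(decimals)), nil)
	return new(big.Int).Mul(big.NewInt(n), pow)
}

// ratio returns a/b as the nearest float64 (b must be non-zero).
func ratio(a, b *big.Int) float64 {
	f, _ := new(big.Rat).SetFrac(a, b).Float64()
	return f
}

// lpHoldingUSD: the holder's pro-rata share of both V2 reserves, i.e. what a
// getReserves() + totalSupply() refresh recomputes.
func lpHoldingUSD(lpBal, lpSupply, r0, r1 *big.Int, dec0, dec1 int, px0, px1 float64) float64 {
	poolUSD := ratio(r0, scaled(1, dec0))*px0 + ratio(r1, scaled(1, dec1))*px1
	return ratio(lpBal, lpSupply) * poolUSD
}

// receiptHoldingUSD: shares * totalAssets / totalSupply, the ERC-4626
// convertToAssets ratio (ignoring rounding and fees), priced in USD.
func receiptHoldingUSD(shares, totalShares, totalAssets *big.Int, assetDec int, assetPx float64) float64 {
	num := new(big.Int).Mul(shares, totalAssets)
	den := new(big.Int).Mul(totalShares, scaled(1, assetDec))
	return ratio(num, den) * assetPx
}

func main() {
	// Hypothetical pool: 1,000 USDC (6 dec) + 0.5 WETH (18 dec) at $2,000; holder owns 10% of LP supply.
	lp := lpHoldingUSD(scaled(1, 17), scaled(1, 18), scaled(1000, 6), scaled(5, 17), 6, 18, 1, 2000)
	// Hypothetical vault: 1,100 USDC backing 1,000 shares; holder owns 100 shares.
	rt := receiptHoldingUSD(scaled(100, 18), scaled(1000, 18), scaled(1100, 6), 6, 1)
	fmt.Printf("LP holding: $%.2f, receipt holding: $%.2f\n", lp, rt)
}
```

Reserves and totalAssets drift every block while the holder's balance stays put. That is why the stored USD value goes stale and has to be re-read on a schedule.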

This is the source of L19–L23's dollars

Remember at_stake_usd, exit_v2_total, the receipt-token TVL in Path 3b? Those numbers are kept current by these refreshers. The Oracle Bridger is literally how the per-(oracle, market) eligibility from L20's oracle path comes to exist. Discovery and at_risk aren't separate stories — discovery is at_risk's data supply chain.
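Here is a sketch of what "transitive ORACLE_DEP" means. It assumes ORACLE_DEP edges point from a consumer (a market, or a composite oracle) to each feed it reads. The Oracle Bridger's actual edge model and output format are not shown in this lesson, and deps and inherited are hypothetical names. Each oracle reachable from a market yields one (oracle, market) pair.

```go
package main

import (
	"fmt"
	"sort"
)

// deps maps a node to the oracles it reads directly (ORACLE_DEP edges).
// Markets and composite oracles both appear as keys. Hypothetical shape.
type deps map[string][]string

// inherited returns every oracle reachable from market through ORACLE_DEP
// edges: the transitive closure for one market.
func inherited(g deps, market string) []string {
	seen := map[string]bool{}
	var visit func(n string)
	visit = func(n string) {
		for _, o := range g[n] {
			if !seen[o] { // the seen check also makes cycles harmless
				seen[o] = true
				visit(o)
			}
		}
	}
	visit(market)
	out := make([]string, 0, len(seen))
	for o := range seen {
		out = append(out, o)
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	g := deps{
		"marketWstETH": {"wstethOracle"},               // the market reads a composite oracle
		"wstethOracle": {"stethEthFeed", "ethUsdFeed"}, // which itself reads two feeds
	}
	for _, o := range inherited(g, "marketWstETH") {
		fmt.Printf("(%s, %s)\n", o, "marketWstETH")
	}
}
```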

4 · Why it doesn't explode

A self-expanding crawl over a densely connected financial graph could, in principle, swallow the whole chain. It doesn't, because expansion is gated by structure and by value. This is the same bounded-traversal discipline you met in L2:

Discovery follows structural dependencies (impl, owner, underlying, asset), not arbitrary transfers — it grows along the dependency cone, not the social graph.
New tokens only enter the monitored set via the ≥ $1M focus-token transfer promotion (L2) — a value threshold, so the flywheel grows toward where real money is, not into dust.
It runs to a fixpoint: once no probe yields an unseen address, the loop quiesces for that cone.
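The three rules can be read as a single admission check on each candidate address. This sketch rests on loud assumptions, none of which are the indexer's real schema:

- the edge-kind names are invented;
- the Candidate type is invented;
- the L2 promotion is modeled as a transfer value compared against $1M.

```go
package main

import "fmt"

// Hypothetical edge kinds: only structural dependencies are followed.
var structural = map[string]bool{
	"implementation": true, "proxy_admin": true, "beacon": true,
	"owner": true, "underlying": true, "asset": true, "curator": true,
}

// promotionUSD models the ≥ $1M focus-token transfer threshold from L2.
const promotionUSD = 1_000_000

// Candidate is one address that a probe or a transfer surfaced.
type Candidate struct {
	Addr     string
	Via      string  // edge kind that surfaced it, or "transfer"
	ValueUSD float64 // only meaningful when Via == "transfer"
}

// admit decides whether a candidate joins the monitored set.
func admit(c Candidate, monitored map[string]bool) bool {
	if monitored[c.Addr] {
		return false // already monitored: this is what lets the loop reach a fixpoint
	}
	switch {
	case structural[c.Via]:
		return true
	case c.Via == "transfer":
		return c.ValueUSD >= promotionUSD
	default:
		return false // arbitrary interactions never expand the graph
	}
}

func main() {
	monitored := map[string]bool{"vault": true}
	cands := []Candidate{
		{"vaultImpl", "implementation", 0},
		{"vault", "implementation", 0},
		{"tokenA", "transfer", 2_500_000},
		{"tokenB", "transfer", 40},
		{"randomCounterparty", "interaction", 0},
	}
	for _, c := range cands {
		fmt.Printf("%-20s admitted=%v\n", c.Addr, admit(c, monitored))
	}
}
```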

The trade-off worth naming

Discovery is RPC-bound. Every probe is a network call, either to an archive node or to the Etherscan/Blockscout APIs, and both are rate-limited. That's why classification batches probes, caches aggressively, and degrades gracefully. A failed StorageAt just skips that related address with a WARN: fail-loop, not fail-stop, exactly like L8/L23. Coverage vs RPC budget is the central tension of this subsystem.

Check yourself

1. Why does the indexer discover the graph instead of using a hardcoded map of DeFi contracts?

2. What closes the discovery loop into a self-expanding flywheel?

3. How does resolveProxy find a proxy's implementation?

4. A contract responds to asset(). Discovery concludes…

5. What's the difference between a seeder and a refresher?

6. The LP Reserve and Receipt Token refreshers exist to…

7. Why doesn't the self-expanding crawl swallow the entire chain?

8. A StorageAt probe fails mid-classification. What happens?


What you can now do

Explain the discovery flywheel: bare node → RPC classify → RelatedAddrs → monitored set → repeat to a fixpoint.
Name the RPC probes (EIP-1967 slots, asset()/underlying(), getOwners(), curator) and what each discovers.
Distinguish the three engines: seeders (push known structure in), refreshers (keep USD values fresh), and the pull worker (polls pending_enrichment).
Connect discovery to the subsystems it feeds: WRAP_UNWRAP (L21), multisig signers (L22), ORACLE_DEP eligibility (L20), and the dollar values across L19–L23.
Explain why expansion is bounded (structural deps, ≥$1M promotion, fixpoint) and the coverage-vs-RPC-budget trade-off.


Grounded in: docs/enrichment-pipeline.md (15-stage pipeline, stage 1 RPC classification, stage 15 discovery propagation, periodic refreshers), pkg/enrichment/enricher.go (resolveProxy EIP-1967 slots, detectWrapper asset()/underlying()/stETH(), RelatedAddrs accumulation), pkg/enrichment/discovery.go (knownFactories, KnownV2/V3Factories), rpc_calls.go (implSlot/adminSlot/beaconSlot), the protocol seeders + LP/receipt/oracle refreshers. Verify against source — the code is the truth.