The async classifier — and the discovery loop that makes the graph grow itself. ~11 min.
The indexer creates nodes bare: just an id, a graph_id, and pending_enrichment=true (you saw this born in Lesson 4's ON CREATE SET). The enrichment-worker is the third binary. It runs asynchronously, off the hot path, and its whole job is to answer two questions: what is this address, and who is it connected to? It then feeds its discoveries back into the monitored set, so the graph expands itself.
Enrichment is slow: classifying a node can take eth_getCode, EIP-1967 storage reads, getOwners(), plus Etherscan/Blockscout HTTP calls. If the indexer did this inline, it would fall behind the chain. So the work is pushed here, off the hot path, where it can take its time. Indexer = fast & ordered; enrichment = slow & thorough.
The worker is a long-running loop. In Go, Run is launched as a goroutine (a lightweight concurrent function managed by the Go runtime, started with the go keyword) and loops until its context is cancelled, asking Memgraph for the next batch of bare nodes:
```go
// pkg/enrichment/worker.go: the worker loop (shape, details elided)
func (w *Worker) Run(ctx context.Context) error {
	for { // loop until ctx is cancelled
		n, err := w.processBatch(ctx) // claim + enrich a batch of pending nodes
		if ctx.Err() != nil {
			return ctx.Err()
		}
		_ = err // error handling/logging elided in this shape
		if n == 0 {
			// nothing to do: back off, then loop again (backoff elided)
		}
	}
}
```
The worker finds work with an index-gated query, anchored on graph_id and never a full scan (Lesson 2's rule):

```cypher
MATCH (n:Entity {graph_id: $g}) WHERE n.pending_enrichment = true ...
```
You can run several enrichment workers for throughput. That raises a classic concurrency problem: two workers must not enrich the same node at once. The fix is a claim: before working a node, a worker stamps it as "mine, until claimTTL expires". The node moves through a small lifecycle:
| State | Meaning |
|---|---|
| pending | pending_enrichment=true, no claim marker: up for grabs. |
| claimed | A worker holds it with a TTL lease (default ~120s). Others skip it. |
| completed | All stages done → pending_enrichment=false. |
The TTL matters: if a worker crashes mid-node, the claim expires and another worker can re-claim it — no node gets stuck forever.
Source: pkg/enrichment/worker.go (Worker.Run, processBatch, claimTTL), pkg/enrichment/claim_lifecycle.go + claim_gate.go.
For each claimed node the worker runs a 15-stage pipeline (documented in docs/enrichment-pipeline.md). You already know most of the on-chain probes — they're the exact calls you'd make by hand:
| Stage (grouped) | How it probes — your EVM knowledge |
|---|---|
| RPC classification | eth_getCode → EOA vs contract; EIP-1967 slot read → proxy + impl; getOwners() → multisig; asset()/underlying() → vault/wrapper. |
| Token metadata | symbol(), name(), decimals(), totalSupply() (fails silently for non-tokens). |
| External APIs | Etherscan (contract name, ABI, verification) + Blockscout (nametags, labels, deployer, scam flags). |
| ABI parsing | Function signatures → label tags (vault / oracle / pool / lending / admin). |
| Classification | Combine all signals → a class_subtype. |
The final classification is plain Go — a readable switch you could extend in a PR
(pkg/enrichment/classify.go):
```go
// pkg/enrichment/classify.go (excerpt; most cases elided)
func ClassSubtype(res *Result) string {
	if res.IsProxyAdmin {
		return "proxy_admin"
	}
	if res.IsGovernanceAdmin {
		return "governance_admin"
	}
	switch res.NodeType {
	case types.NodeMultisig:
		return "safe_smart_account"
	// ... further cases return "debt_token", "receipt_token", "curator",
	// "bridge", "oracle", ...
	}
	return ""
}
```
Source: pkg/enrichment/classify.go (ClassSubtype, LabelSource, …) + the 15 stages in docs/enrichment-pipeline.md.
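Because ClassSubtype returns early, the order of the checks is itself a rule: role flags win over the node type. A self-contained toy version makes that visible. Result, NodeType and NodeMultisig here are simplified, hypothetical stand-ins for the repo's types, modeling only the fields the excerpt uses.

```go
package main

import "fmt"

// Hypothetical stand-ins for the repo's types.NodeType and Result.
type NodeType int

const (
	NodeUnknown NodeType = iota
	NodeMultisig
)

type Result struct {
	IsProxyAdmin      bool
	IsGovernanceAdmin bool
	NodeType          NodeType
}

// classSubtype mirrors the excerpt's ordering: role flags are checked first,
// so they override whatever the node type says.
func classSubtype(res *Result) string {
	if res.IsProxyAdmin {
		return "proxy_admin"
	}
	if res.IsGovernanceAdmin {
		return "governance_admin"
	}
	switch res.NodeType {
	case NodeMultisig:
		return "safe_smart_account"
	}
	return "" // no confident subtype
}

func main() {
	fmt.Println(classSubtype(&Result{NodeType: NodeMultisig}))
	fmt.Println(classSubtype(&Result{NodeType: NodeMultisig, IsProxyAdmin: true}))
}
```

So when you add a subtype in a PR, where you place the check decides which label an address matching several rules ends up with.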
Classification produces edges the indexer couldn't derive from a single event: the CONTROLLED_BY / OPERATED_BY families from Lesson 2 (ADMIN_CTRL, OWNS, CURATES, VAULT_ASSET, BELONGS_TO), plus the transitively derived OWNS_ADMIN / RESERVE_BACKING. They're written in one Memgraph transaction, using the same atomic, MERGE-based pattern you learned in Lesson 4.
The indexer writes the fast, event-derived edges (…, HOLDS, APPROVES). Enrichment writes the slow, classification-sourced structural edges (who controls, owns, curates, or belongs to what). Same graph, two writers, different cadences.
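One way to generate such structural edges idempotently is sketched below. It is an illustration under assumptions, not the repo's writer: it matches nodes by the id and graph_id properties seen earlier in this lesson, and Edge, Stmt and mergeEdgeStmts are hypothetical. One detail holds regardless: a plain Cypher $parameter cannot stand in for a relationship type, so the type has to be spliced into the query text and should be checked against an allowlist first.

```go
package main

import "fmt"

// structuralEdges is an allowlist of the edge types this lesson lists for
// enrichment. The type is spliced into query text, so it must be validated.
var structuralEdges = map[string]bool{
	"ADMIN_CTRL": true, "OWNS": true, "CURATES": true, "VAULT_ASSET": true,
	"BELONGS_TO": true, "OWNS_ADMIN": true, "RESERVE_BACKING": true,
}

// Edge and Stmt are hypothetical types for this sketch.
type Edge struct{ Type, From, To string }

type Stmt struct {
	Query  string
	Params map[string]any
}

// mergeEdgeStmts turns edges into MERGE statements: re-running them does not
// duplicate edges. Executing them together in one transaction (not shown)
// gives the all-or-nothing write described above.
func mergeEdgeStmts(graphID string, edges []Edge) ([]Stmt, error) {
	out := make([]Stmt, 0, len(edges))
	for _, e := range edges {
		if !structuralEdges[e.Type] {
			return nil, fmt.Errorf("edge type %q not allowed", e.Type)
		}
		q := fmt.Sprintf("MATCH (a:Entity {graph_id: $g, id: $from}), (b:Entity {graph_id: $g, id: $to}) "+
			"MERGE (a)-[:%s]->(b)", e.Type)
		out = append(out, Stmt{q, map[string]any{"g": graphID, "from": e.From, "to": e.To}})
	}
	return out, nil
}

func main() {
	stmts, err := mergeEdgeStmts("g1", []Edge{{Type: "OWNS", From: "0xsafe", To: "0xvault"}})
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s.Query, s.Params)
	}
}
```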
This is the punchline of the whole system. While classifying, the worker discovers related addresses: a proxy's implementation, a multisig's owners, a vault's curator, a contract's deployer. Stage 15 takes all of those RelatedAddrs and adds them to the monitored set (monitored:{chain}), so the graph keeps growing from its own discoveries.

Because enrichment leans on external APIs, it must survive them being slow or down. The package has rate-limited clients and circuit breakers (backoff.go, blockscout_breaker) so a flaky Etherscan can't stall the whole worker. As a contributor, expect any external-API stage to be wrapped in retry/backoff; don't add a raw HTTP call.
Also running alongside, as independent goroutines: the oracle-bridger, LP/receipt refreshers, and a parity monitor (see docs/enrichment-pipeline.md § Periodic Maintenance). These are the start of the risk engine, covered in a future lesson.
Putting it together: the indexer writes bare nodes, HOLDS/etc. edges, and its cursor → the enrichment-worker classifies the bare nodes (getCode, EIP-1967, getOwners, asset), writes structural edges, and feeds discovered addresses back into the monitored set → the graph grows.
The risk engine then runs analytics (DebtRank, exposure, AT_RISK) on top. You now understand all three binaries.
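If you freeze the set of relations, the feedback loop in this recap behaves like a breadth-first worklist that stops at a fixed point: once a round of enrichment turns up no address outside the monitored set, nothing new is added. The toy model below makes that concrete. The related map is a hypothetical stand-in for the RelatedAddrs enrichment discovers (implementations, owners, curators, deployers), the addresses are placeholders, and each round only enriches addresses first seen in the previous round.

```go
package main

import (
	"fmt"
	"strings"
)

// discover grows a monitored set from seeds. related (keyed by lowercase
// address) stands in for the RelatedAddrs that enrichment would find.
// It returns the final set and how many enrichment rounds ran.
func discover(seeds []string, related map[string][]string) (map[string]bool, int) {
	monitored := map[string]bool{}
	var frontier []string
	for _, s := range seeds {
		a := strings.ToLower(s) // normalize so 0xAbC and 0xabc dedupe
		if !monitored[a] {
			monitored[a] = true
			frontier = append(frontier, a)
		}
	}
	rounds := 0
	for len(frontier) > 0 {
		rounds++
		var next []string
		for _, addr := range frontier {
			for _, r := range related[addr] {
				r = strings.ToLower(r)
				if !monitored[r] { // the monitored set doubles as the visited set
					monitored[r] = true
					next = append(next, r)
				}
			}
		}
		frontier = next
	}
	return monitored, rounds
}

func main() {
	related := map[string][]string{
		"0xvault":   {"0xcurator", "0xImpl"},
		"0xcurator": {"0xowner1", "0xowner2"},
		"0xowner1":  {"0xvault"}, // a cycle back to an already-monitored address
	}
	m, rounds := discover([]string{"0xVault"}, related)
	fmt.Println(len(m), "addresses monitored after", rounds, "rounds")
}
```

Cycles (an owner pointing back at its vault) are harmless because an address already in the monitored set is never queued twice.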
Grounded in: docs/enrichment-pipeline.md (the 15 stages + periodic tasks),
pkg/enrichment/worker.go (Run, processBatch, claimTTL), classify.go (ClassSubtype),
claim_lifecycle.go/claim_gate.go, backoff.go. Verify against source — the code is the truth.