Stage 10: how a bare contract becomes "belongs to Aave" — and why it's deliberately a heuristic. ~12 min.
The graph knows a contract is a vault, has an admin, holds a token. One thing it can't read off the chain: which
protocol does this contract belong to? There's no protocol() getter. Yet grouping contracts by project
("these 40 nodes are all Aave") drives the admin panel, rule scoping, and cost attribution. Project inference is the
small heuristic that derives it, and studying it is a lesson in knowing when a dumb string match is the right tool.
AToken or VariableDebtToken is
Aave; MetaMorpho is Morpho; crvUSD is Curve; wstETH is Lido. Block explorers encode the same
intuition in their nametags ("Aave: Pool V3"). Project inference just makes that human pattern-matching explicit and
deterministic so every node gets the same answer the batch pipeline would give.
InferProject (pkg/enrichment/project.go) consults three sources, in descending order of
trust, and returns on the first hit:
1. knownNametagPatterns. Checked first because explorer nametags are human-curated and the most reliable signal available. "Aave: Pool V3" → aave
2. knownProjects. The fallback when no nametag exists, drawn from the verified source name. "VariableDebtToken" → aave
3. labels_slug (InferProjectFromSlug), used when the explorer offers a slug but no usable nametag or contract name. "curve|amm|stableswap" → curve-finance
All three are plain strings.Contains matches: no RPC, no graph traversal, no ABI parsing. It's the cheapest
stage in the whole pipeline, which is precisely why it runs as a quick label pass rather than an on-chain probe.
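The cascade above can be sketched in a few lines of Go. This is a minimal illustration, not the code in pkg/enrichment/project.go: the pattern type, the tiny pattern lists, inferProjectSketch, fromSlugSketch, and the "labels_slug" source value are all hypothetical names, and the slug path is assumed to look up the first "|"-separated token.

```go
package main

import (
	"fmt"
	"strings"
)

// pattern pairs a lowercase substring with the slug it implies.
// Hypothetical stand-in for the lesson's projectPattern type.
type pattern struct{ substr, slug string }

// Tiny illustrative lists; the real ones are described as ~50 entries.
var nametagPatterns = []pattern{{"aave", "aave"}, {"morpho", "morpho"}}
var contractPatterns = []pattern{{"variabledebt", "aave"}, {"metamorpho", "morpho"}}

// Assumption: the slug path maps the first token of the slug to a project.
var slugToProject = map[string]string{"curve": "curve-finance"}

// firstMatch returns the slug of the first pattern contained in s.
func firstMatch(s string, patterns []pattern) (string, bool) {
	s = strings.ToLower(s)
	for _, p := range patterns {
		if strings.Contains(s, p.substr) {
			return p.slug, true
		}
	}
	return "", false
}

// fromSlugSketch looks up the first "|"-separated token of labelsSlug.
func fromSlugSketch(labelsSlug string) (string, bool) {
	first, _, _ := strings.Cut(labelsSlug, "|")
	slug, ok := slugToProject[strings.ToLower(strings.TrimSpace(first))]
	return slug, ok
}

// inferProjectSketch tries nametag, then contract name, then slug,
// returning on the first hit along with the source that produced it.
func inferProjectSketch(nametag, contractName, labelsSlug string) (project, source string) {
	if slug, ok := firstMatch(nametag, nametagPatterns); ok {
		return slug, "nametag"
	}
	if slug, ok := firstMatch(contractName, contractPatterns); ok {
		return slug, "contract_name"
	}
	if slug, ok := fromSlugSketch(labelsSlug); ok {
		return slug, "labels_slug"
	}
	return "", "" // no attribution
}

func main() {
	p, s := inferProjectSketch("Aave: Pool V3", "", "")
	fmt.Println(p, s) // aave nametag
	p, s = inferProjectSketch("", "VariableDebtToken", "")
	fmt.Println(p, s) // aave contract_name
	p, s = inferProjectSketch("", "", "curve|amm|stableswap")
	fmt.Println(p, s) // curve-finance labels_slug
}
```

Each source is consulted only if the more trusted one before it produced nothing, so a curated nametag always outranks a contract name.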
Substring matching has an obvious trap: "curve" is a substring of a "crvusd" contract's nametag, and
both are real but different canonical projects. The fix is purely structural — the pattern lists are ordered
most-specific-first, and the first match wins:
var knownProjects = []projectPattern{
    {"metamorpho", "morpho"},    // before the generic "morpho"
    {"morpho", "morpho"},
    {"atoken", "aave"},          // before the generic "aave"
    {"variabledebt", "aave"},
    {"crvusd", "curve-finance"}, // before "curve"
    {"curve", "curve-finance"},
    // …~50 patterns total
}
This is the same "order is load-bearing" discipline you saw in L19's cap pipeline and L20's cell dedup — here it's what keeps a specific token from being swallowed by its protocol's generic prefix.
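To see why the ordering is load-bearing, here is a small sketch that runs the same first-match loop over two orderings of the same patterns. The slugs "crvusd-example" and "curve-example" are made up so that the two orderings give visibly different answers; the pattern type and firstMatch helper are likewise hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// pattern pairs a lowercase substring with the slug it implies (hypothetical type).
type pattern struct{ substr, slug string }

// firstMatch returns the slug of the first pattern contained in name, or "".
func firstMatch(name string, patterns []pattern) string {
	name = strings.ToLower(name)
	for _, p := range patterns {
		if strings.Contains(name, p.substr) {
			return p.slug
		}
	}
	return ""
}

// Same two patterns, two orderings. The slugs are invented for illustration.
var specificFirst = []pattern{
	{"crvusd", "crvusd-example"},
	{"curve", "curve-example"},
}
var genericFirst = []pattern{
	{"curve", "curve-example"},
	{"crvusd", "crvusd-example"},
}

func main() {
	nametag := "Curve: crvUSD Controller" // contains both substrings
	fmt.Println(firstMatch(nametag, specificFirst)) // crvusd-example
	fmt.Println(firstMatch(nametag, genericFirst))  // curve-example
}
```

With the generic pattern listed first, the specific entry can never fire for any nametag that contains both substrings.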
Every inferred value is piped through NormalizeLabelAPIProject before it's returned. Why does that matter? Because
the realtime indexer and the Python batch pipeline both write a project field, and they must produce the
byte-identical slug or the parity harness (L23) flags a mismatch:
// the patterns already encode canonical slugs; normalization is defence-in-depth
{"sushi": "sushiswap"},
{"curve": "curve-finance"},
{"maker": "sky"},
{"eigencloud": "eigenlayer"}
Rebrands: "maker" → "sky" and "dai" → "sky". MakerDAO became Sky, and the canonical slug
carries that so old and new names converge on one project node. Vendor drift: "eigencloud" → "eigenlayer",
"sushi" → "sushiswap". Different explorers name the same protocol differently, and all of those names normalize
to one slug. The goal isn't "the objectively right name"; it's the exact slug the batch pipeline writes, so RT and
batch agree per node.
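A normalization step of this kind can be sketched as a single map lookup. The map contents below come from the examples in this lesson; the names canonicalSlugs and normalizeSketch, and the lowercasing and trimming, are assumptions rather than the behaviour of NormalizeLabelAPIProject.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalSlugs maps known aliases to the slug both pipelines should write.
// Hypothetical name; entries are taken from the lesson's examples.
var canonicalSlugs = map[string]string{
	"sushi":      "sushiswap",
	"curve":      "curve-finance",
	"maker":      "sky",
	"dai":        "sky",
	"eigencloud": "eigenlayer",
}

// normalizeSketch returns the canonical slug for raw, or raw itself
// (lowercased and trimmed) when no alias is registered.
func normalizeSketch(raw string) string {
	key := strings.ToLower(strings.TrimSpace(raw))
	if slug, ok := canonicalSlugs[key]; ok {
		return slug
	}
	return key
}

func main() {
	fmt.Println(normalizeSketch("maker"))                  // sky
	fmt.Println(normalizeSketch("sky"))                    // sky
	fmt.Println(normalizeSketch(normalizeSketch("sushi"))) // sushiswap
}
```

Because no canonical slug in the map is itself an alias key, applying the function twice gives the same result as applying it once, which is the property a canonical-idempotence test would check.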
Some labels look like attribution but aren't a protocol. "stablecoin", "dex", "lending",
"erc20-token", "safe", "mev-bot" — all describe a kind, not a project. Inference returns
empty for these rather than stamping a meaningless BELONGS_TO:
var projectReservedSortedBy = map[string]bool{
    "token-contract": true,
    "stablecoin":     true,
    "dex":            true,
    "lending":        true,
    "oracle":         true,
    "bridge":         true,
    // …treated as no-match
}
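Applying such a filter amounts to one set lookup before the value is returned. The sketch below assumes a hypothetical helper, dropGeneric, and a hypothetical set name; it only illustrates the "return empty rather than a category" rule described above.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedGeneric lists labels that describe a kind of contract, not a project.
// Hypothetical name; entries are taken from the lesson's examples.
var reservedGeneric = map[string]bool{
	"token-contract": true,
	"stablecoin":     true,
	"dex":            true,
	"lending":        true,
	"oracle":         true,
	"bridge":         true,
}

// dropGeneric returns "" when candidate is a reserved generic label,
// otherwise it returns candidate unchanged.
func dropGeneric(candidate string) string {
	if reservedGeneric[strings.ToLower(candidate)] {
		return ""
	}
	return candidate
}

func main() {
	fmt.Printf("%q\n", dropGeneric("stablecoin")) // ""
	fmt.Printf("%q\n", dropGeneric("aave"))       // "aave"
}
```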
"DAIProxyHelper" would match "dai" → sky even if it's unrelated to Sky. The team accepts that imprecision
because the alternative (structural inference from graph topology, or per-contract curation) costs far more for a field
that's organizational, not safety-critical. The reserved-generic filter is the floor: better no attribution than a
confidently-wrong one. Correctness here is defined as "matches batch", not "objectively perfect".
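The false positive is easy to reproduce with a bare substring check. The helper name matchesDai and the sample contract names other than "DAIProxyHelper" are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesDai reports whether a case-insensitive substring match on "dai" fires.
func matchesDai(contractName string) bool {
	return strings.Contains(strings.ToLower(contractName), "dai")
}

func main() {
	for _, name := range []string{"DAIProxyHelper", "DailyRewardsDistributor", "AToken"} {
		fmt.Println(name, matchesDai(name))
	}
	// DAIProxyHelper true
	// DailyRewardsDistributor true
	// AToken false
}
```

Any name that happens to contain the three letters matches, which is the imprecision the team accepts in exchange for a stage that needs no RPC calls or curation.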
The inferred slug becomes the project field, a BELONGS_TO edge to a protocol node, plus
project_source (nametag / contract_name) and project_category (defi / infra / uncategorized) — stage 10 of
the L24 pipeline. Downstream it's the grouping key for: the admin panel (L17, "show me everything in Aave"),
rule scoping (L12, rules that target a protocol), and the cost-allocation model (attributing on-chain
signal to the customer who controls a protocol). It's the connective tissue that turns a flat node set into protocols.
Review questions:
1. InferProject checks nametag, then contract name, then labels_slug. Why that order?
2. The pattern lists put "crvusd" before "curve" and "metamorpho" before "morpho". The reason is…
3. Every inferred value passes through NormalizeLabelAPIProject. What's the point?
4. Normalization maps "maker" and "dai" to "sky". This encodes…
5. A label is just "stablecoin". What does inference return?
6. "DAIProxyHelper" gets attributed to sky despite being unrelated. How does the team view this?
7. The project field this stage produces is used downstream primarily as…

Grounded in: pkg/enrichment/project.go (InferProject nametag→contract-name cascade, knownNametagPatterns + knownProjects specificity-ordered ~50-pattern lists, NormalizeLabelAPIProject + labelAPIProjectNorm canonical slugs incl. maker→sky / sushi→sushiswap, labelAPIGenericProjects + projectReservedSortedBy reserved-generic filter, InferProjectFromSlug first-token slug path; TestInferProject_CanonicalIdempotent parity test). Stage 10 of the enrichment pipeline (L24). Verify against source: the code is the truth.