Lesson 02 · The Data Model

Nodes, edges, and the shape of the risk graph

The graph's vocabulary — and the three properties that will bite you if you ignore them. ~10 minutes.

Builds on: Lesson 1 · New: property graph nodes · New: edge categories · New: reading Cypher

In Lesson 1 you learned the graph is the output: events in, edges out. Now we open the box. By the end you'll know what a node actually is, the eight families the edges fall into, and the three node properties every contributor must respect — get one wrong and your query silently returns garbage or scans the whole DB.

This lesson has a companion
Everything here is distilled into a printable cheat-sheet: 📋 the Glossary. Keep it open in a tab while you work — it's the canonical vocabulary we'll use in every future lesson.

1 · A node is an :Entity

Here's the first surprising thing. Almost every node in the graph carries the same primary label: :Entity. The kind of thing it is (token, vault, multisig…) lives in a property, and is also mirrored as a secondary label.

(:Entity:Token {
  id: "0xa0b8...eb48",  // the address — the node's identity
  graph_id: "risk-graph-rt",  // which graph partition
  category: "token",  subcategory: "stablecoin",
  symbol: "USDC",  usd_price: 1.0,
  pending_enrichment: false
})

The node's identity is its address (id). There are 19 node types; the full list is in pkg/types/schema.go as NodeType constants and is reproduced in your glossary. A few you'll meet constantly:

Node type      | What it is                                     | Secondary label
token          | An ERC-20 / focus token                        | :Token
eoa            | An externally-owned account (a wallet)         | :EOA
contract       | A generic smart contract                       | :Contract
multisig       | A Gnosis-Safe-style multisig                   | :Multisig
vault          | An ERC-4626 / yield vault                      | :Vault
lending_market | A money market (Aave aToken, Compound cToken…) | :LendingMarket
oracle         | A price feed (Chainlink etc.)                  | :Oracle

Source: pkg/types/schema.go, NodeType constants + NodeTypeToLabel map.
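
To see the property-plus-label mirroring on a single node, a lookup along these lines may help. It is a minimal sketch, not a query taken from the repo: $id is a hypothetical parameter name, and $g stands for the partition you are working in.

```cypher
// Anchor on one known address inside one partition, then compare
// the kind stored in properties with the labels the node carries.
MATCH (n:Entity {id: $id, graph_id: $g})
RETURN labels(n)     AS labels,
       n.category    AS category,
       n.subcategory AS subcategory
LIMIT 1
```

For the USDC node shown in section 1, you would expect the labels to include both Entity and Token, with category "token" and subcategory "stablecoin".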

2 · The three properties that will bite you

graph_id — the partition key

One Memgraph instance holds multiple independent graphs side by side, separated only by the graph_id property on every node and every edge. The live one might be risk-graph-rt; a shadow/test one might be test_carlos. They share the database but must never mix in a query.

Contributor rule #1
Every query filters by graph_id. Forget it and you'll union two graphs together and get nonsense — or accidentally read another partition's data. It's a property, not a label, so the DB won't stop you. This is the #1 way new contributors get silently-wrong results.
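
As a rough illustration of rule #1, here is the same read written without and with the partition filter. The parameter names follow the ones used elsewhere in this lesson; the extra check on the edge is a cautious assumption based on graph_id being present on every edge as well as every node.

```cypher
// Risky: no graph_id anywhere, so matches can come from any partition.
// MATCH (w:Entity {id: $wallet})-[:HOLDS]->(t) RETURN t

// Safer sketch: pin both endpoints and the edge to one partition.
MATCH (w:Entity {id: $wallet, graph_id: $g})-[r:HOLDS]->(t:Entity {graph_id: $g})
WHERE r.graph_id = $g
RETURN t.id, t.symbol
LIMIT 100
```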

category / subcategory — with a legacy trap

Current code stores a node's kind in category + subcategory. But older nodes carry type + subtype instead. The repo's own CLAUDE.md mandates the defensive read:

// Reading a node's kind safely across old + new nodes (from CLAUDE.md):
coalesce(n.type, n.subcategory, n.category)

Source: risk-graph-indexer/CLAUDE.md § Sanity checks. coalesce returns its first non-null argument, so legacy nodes resolve through type and current nodes through subcategory or, failing that, category.
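
Dropped into a full query, the defensive read might look like the sketch below. The coalesce expression is the one quoted above; $id and the kind alias are hypothetical names chosen for illustration.

```cypher
// Read a node's kind without caring whether it is a legacy
// (type) or current (category/subcategory) node.
MATCH (n:Entity {id: $id, graph_id: $g})
RETURN n.id AS id,
       coalesce(n.type, n.subcategory, n.category) AS kind
LIMIT 1
```

One consequence of the argument order is worth keeping in mind: for a current node that has both fields set, such as the USDC example in section 1, this expression yields the subcategory ("stablecoin") rather than the category ("token").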

pending_enrichment — the lifecycle flag

From Lesson 1: the indexer creates bare nodes with pending_enrichment=true, and the enrichment-worker flips it to false once it's classified. So this boolean tells you whether a node's metadata (symbol, decimals, labels) can be trusted yet.
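
A read that only trusts enriched nodes could be sketched like this. It reuses the HOLDS shape from section 4 of this lesson; the LIMIT value is arbitrary.

```cypher
// Only return tokens whose metadata has already been enriched.
MATCH (w:Entity {id: $wallet, graph_id: $g})-[r:HOLDS]->(t:Entity {graph_id: $g})
WHERE t.pending_enrichment = false
RETURN t.id, t.symbol, r.quantity_raw
LIMIT 100
```

Because a comparison against a missing property evaluates to null, nodes that carry no pending_enrichment property at all are filtered out too, which errs on the side of not trusting them.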

3 · Edges roll up into eight categories

There are 19-plus edge types (the table below lists 24), but you don't memorise them flat: they group into eight parent categories (the EdgeCategory map in schema.go). Learn the categories and the types slot in underneath:

Category          | Meaning                                     | Edge types in it
CONTAINS          | X holds/contains a balance of Y             | HOLDS, POOL_ASSET, VAULT_ASSET, RESERVE_BACKING
CONTROLLED_BY     | X is controlled/approved/owned by Y         | ADMIN_CTRL, ADMIN_OF, OWNS, OWNS_ADMIN, CURATES, APPROVES, CUSTODY_VIA
COLLATERALISED_BY | X is backed by collateral Y                 | LENDING_COLLATERAL, VAULT_ALLOCATION
DEPENDS_ON        | X structurally depends on Y                 | ORACLE_DEP, BRIDGE_BACKED_BY, DVN_VERIFIES, SUBORDINATE_TO
DERIVED_FROM      | X is a derivative/receipt of Y              | RECEIPT_FOR, DEBT_FOR, WRAP_UNWRAP
OPERATED_BY       | X is operated/deployed by Y                 | DEPLOYED_BY, SERVICE_FOR
AUDIT_TRAIL       | A historical/governance record              | AFFECTS (governance actions)
ANALYTICAL        | Derived risk projection, not real topology  | AT_RISK

Source: pkg/types/schema.go — the EdgeCategory map. The full edge list with directions + properties lives in your glossary.
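
Since this lesson does not say whether the category is stored on the edge itself, one cautious way to query a whole category is to list its member types explicitly. The sketch below does that for CONTROLLED_BY using the types from the table; $id is a hypothetical parameter, and the pattern is left undirected because edge direction is not assumed.

```cypher
// Every control-style relationship touching one node, whichever
// concrete edge type and direction the seeder happened to write.
MATCH (n:Entity {id: $id, graph_id: $g})
      -[r:ADMIN_CTRL|ADMIN_OF|OWNS|OWNS_ADMIN|CURATES|APPROVES|CUSTODY_VIA]-
      (other:Entity {graph_id: $g})
RETURN type(r) AS edge_type, other.id AS counterparty
LIMIT 100
```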

Architectural vs Analytical — a real distinction
The first seven categories are architectural: they describe the real on-chain topology (who holds, controls, backs what). ANALYTICAL edges like AT_RISK are computed by the risk engine and layered on top of the true graph: "if this admin key is compromised, these tokens are at risk." They're projections, not structure, and the graph-viz architecture view deliberately hides them (FORTA-2986).

Your EVM anchor, extended
You already read these relationships on-chain every day — the graph just makes them queryable. An Aave aToken's underlying reserve is a LENDING_COLLATERAL edge. A proxy's admin is an ADMIN_CTRL edge. A Safe's signers are OWNS edges. Same facts you'd pull from contract storage — now they're traversable.
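
For instance, the "Safe's signers are OWNS edges" fact could be read back with something like the following. This is a sketch under assumptions: $safe is a hypothetical parameter, the Safe is assumed to carry the :Multisig secondary label, and the match is undirected because this lesson does not state which way OWNS is written.

```cypher
// List the accounts linked to a Safe by OWNS edges, and report
// the stored direction of each edge instead of assuming it.
MATCH (s:Entity:Multisig {id: $safe, graph_id: $g})-[r:OWNS]-(signer:Entity {graph_id: $g})
RETURN signer.id       AS signer,
       startNode(r).id AS edge_from,
       endNode(r).id   AS edge_to
LIMIT 50
```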

4 · Reading your first Cypher

Cypher is "ASCII-art SQL for graphs." Nodes are (parens), edges are -[brackets]-> with a direction. Here's a real-shaped read — "what tokens does this wallet hold?":

MATCH (w:Entity {id: $wallet, graph_id: $g})-[r:HOLDS]->(t:Entity {graph_id: $g})
RETURN t.symbol, r.quantity_raw, t.usd_price

Read it left to right: start at the wallet node w (anchored by its id), follow an outgoing HOLDS edge r, to a token node t. Return the token's symbol, the raw balance on the edge, and its price.

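Notice three things, all of which are rules, not style: both nodes filter on graph_id (rule #1), the match is anchored on the wallet's id rather than scanning, and the edge has an explicit direction. The last two are rules in their own right: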

Contributor rules #2 and #3 (straight from CLAUDE.md)
#2 — No full-graph scans. Never write MATCH (n) with no anchor. The graph is huge; an unanchored scan = OOM + timeout. Always start from a known id or a label+index, bound depth, and LIMIT.

#3 — Query both directions. Different seeders may write the "same" logical edge in opposite directions. Never assume which way an edge points — check, or match both (-[r]- without an arrow) when you're unsure.

Source: risk-graph-indexer/CLAUDE.md § "Graph DB queries — no brute-force" and § "Before writing code". Real query shapes verified against pkg/ (e.g. the :HOLDS reads in the graphwrite/risk packages).
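
Put together, a traversal that respects all three rules might be shaped like this. It is an illustrative sketch rather than a query from the repo: $id, the two-hop bound, and the LIMIT value are arbitrary choices.

```cypher
// Anchored start, bounded depth, no assumed direction, one partition, LIMIT.
MATCH path = (start:Entity {id: $id, graph_id: $g})-[*1..2]-(other:Entity {graph_id: $g})
WHERE all(r IN relationships(path) WHERE r.graph_id = $g)
RETURN other.id AS reached, size(relationships(path)) AS hops
LIMIT 100
```

The all(...) predicate keeps every edge on the path inside the same partition, which matters once a pattern spans more than one hop.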

Check yourself

These target the exact things that trip up new contributors.

1. You write MATCH (t:Entity)-[:HOLDS]->(x) RETURN x and get results from two unrelated graphs mixed together. What did you forget?
2. An old node has type: "token" but no category. A newer node has category: "token" but no type. How do you read the kind safely for both?
3. The AT_RISK edge is in the ANALYTICAL category. What does that tell you?
4. A Gnosis Safe's signers, a proxy's admin, and a token approval all share which edge category?
5. Why is MATCH (n) RETURN n a fireable offense in this codebase?
↳ Ask your teacher
Try these in chat: "Open the real HOLDS write query in the code," · "What's the difference between ADMIN_CTRL and ADMIN_OF?" · "Walk me through a query that finds every vault exposed to USDC," · "Why is everything an :Entity instead of separate labels?"

What you can now do

Grounded in: pkg/types/schema.go (NodeType, EdgeType, EdgeCategory, NodeTypeToLabel), CLAUDE.md (graph-id partition, coalesce rule, no-scan / both-directions rules), docs/architecture.md. Verify against source — the code is the truth.