The one mental model that makes the whole codebase make sense. ~8 minutes.
You already know: EVM eventsNew: 3-binary pipelineNew: the monitored set
Strip away the 30 binaries and 35 packages and this system does exactly
one thing: it watches the blockchain and turns on-chain events into edges in a graph.
That's it. Everything else is detail hanging off that spine.
You already know the input side cold. An ERC-20 Transfer(from, to, value) log,
a contract creation in the block traces, an Aave ReserveInitialized — these are
your home turf. This lesson connects that knowledge to the output side: a risk graph
in Memgraph, made of nodes (addresses) and typed edges (relationships).
Your anchor for this whole lesson
An ERC-20 Transfer event becomes a HOLDS edge
(wallet → token, carrying a balance). Hold onto that one sentence. Every other
event-to-edge mapping is a variation on it.
1 · The graph: what "out" even means
Before the pipeline, picture the destination. The graph is a property graph
(like Neo4j — Memgraph speaks the same Bolt protocol and Cypher query language). Two ingredients:
Edges — typed, directional relationships between them. There are 19 edge types.
The edge types are literally constants in the code. A sample (from
pkg/types/schema.go):
// pkg/types/schema.go — the vocabulary of the graph
EdgeHolds EdgeType = "HOLDS" // wallet holds a token balance
EdgeApproves EdgeType = "APPROVES" // owner approved a spender
EdgeAdminCtrl EdgeType = "ADMIN_CTRL" // admin controls a contract
EdgeDeployedBy EdgeType = "DEPLOYED_BY" // contract → its deployer
EdgeLendingCollateral EdgeType = "LENDING_COLLATERAL" // collateral backing a loan
EdgeOracleDep EdgeType = "ORACLE_DEP" // thing depends on a price oracle// …19 in total. This file is the ground-truth list.
Source: risk-graph-indexer/pkg/types/schema.go (the EdgeType constants).
This is the file to open when you need the canonical edge/node names.
Why a graph at all?
Risk is about relationships: who holds what, who controls whom, what backs what.
"If token X depegs, which vaults are exposed, and who's the admin that could rug them?"
is a graph traversal, not a SQL join. That's why the output is a graph and not a table.
2 · The three binaries
The journey from RPC to graph is split across three stateless services,
connected by Redis. This split is the first thing to internalize — it tells you where
any given piece of logic lives.
cmd/block-ingest
① block-ingest
The producer · one instance per chain
Fetches each block from the Ethereum RPC — block + receipts + traces,
in one batched JSON-RPC round-trip. Caches every RPC response in Redis (7-day TTL, so
replays cost zero RPC calls). Publishes the complete block payload onto a
Redis Stream named blocks:{chainID}.
▼ Redis Streamblocks:{chainID} — a durable queue of blocks
cmd/indexer · pkg/decoder · pkg/indexer
② indexer
The decoder + graph writer · the heart of the system
Consumes blocks from the stream. Decodes logs into typed
events (20 decoders). Filters them against the monitored set
(more below). Batches the survivors into a handful of Cypher writes and commits them to
Memgraph — atomically with the block cursor so it can never lose its place.
▼ Memgraph — bare nodes tagged pending_enrichment=true
cmd/enrichment-worker · pkg/enrichment · pkg/risk
③ enrichment-worker
The classifier + risk engine · runs asynchronously
Polls Memgraph for the bare nodes the indexer just created. Figures out
what each address is (EOA vs contract, proxy, vault, multisig…) via RPC + Etherscan
+ Blockscout, writes metadata and extra edges, and discovers related addresses to
watch. Separately runs the periodic risk math (exposure, DebtRank, concentration).
Why split into three? (a real onboarding insight)
Each stage has a different bottleneck. block-ingest is RPC-bound. The indexer must
be fast and ordered — it can't block on slow external APIs or it falls behind the chain.
Enrichment is slow and call-heavy (Etherscan rate limits!), so it's pushed off the hot path
to run lazily. The Redis Stream between ①→② is a shock absorber: ingest can race ahead and
buffer blocks while the indexer catches up. This is the streaming-pipeline pattern you flagged
as new — and that's all it is: decouple producers from consumers with a durable queue.
3 · The monitored set — the 95% filter
Here's the trick that makes real-time indexing of all of Ethereum tractable. The indexer
does not care about most events. It keeps a Redis Set called
monitored:{chainID} containing every address the system cares about. For each
decoded event it asks one O(1) question:
// Is either party an address we track? (Redis SISMEMBER — O(1))
if !monitored(event.from) && !monitored(event.to) {
drop(event) // ~95% of all chain activity exits here
}
If neither side is monitored, the event is dropped. This is why the system can keep up with
mainnet: it's doing a set-membership check, not processing every Transfer on the chain.
Three ways an address gets into the set:
Source
When
Bootstrap
One-time seed of known addresses at startup.
Promotion
An unknown address receives a focus-token transfer ≥ $1M USD → it's now interesting, add it.
Discovery
Enrichment finds a related address (a proxy's implementation, a multisig's owner, a deployer) → add it.
Source: risk-graph-indexer/docs/architecture.md § "Monitored Set". The promotion
threshold and discovery chain are the system's growth mechanism — the graph expands itself as it runs.
4 · Follow one Transfer all the way through
Let's make it concrete with your anchor event. A monitored wallet receives 1,000 USDC:
RPC → block-ingest: the block (with this Transfer log) is fetched and published to blocks:{chainID}.
Stream → indexer: the indexer pulls the block off the stream.
Decode: the ERC-20 decoder recognizes the Transfer(address,address,uint256) topic and produces a typed transfer event.
Filter:to (the wallet) is in the monitored set → keep it. (If the wallet were unknown but the amount ≥ $1M, promotion would add it here.)
Mutate: the indexer adjusts the wallet's balance and upserts a HOLDS edge (wallet → USDC). If USDC or the wallet were brand-new nodes, they're created bare with pending_enrichment=true.
Commit: that edge + the new block cursor are written to Memgraph in one atomic transaction.
Later, async: enrichment-worker sees the bare node, classifies it, fills in symbol/decimals/labels, and may discover related addresses to monitor.
Translate your EVM instinct
You'd normally read a Transfer as "balance moved." Here it's that plus
"this is now an edge in a risk graph, and the receiver might be worth watching." Same event —
the system just cares about a different downstream meaning.
Check yourself
Immediate feedback — pick an answer and it'll tell you right away. No grade is recorded; this is just to catch misconceptions while they're cheap.
1. A Transfer event arrives where neither the sender nor receiver is in the monitored set, and the value is small. What happens?
2. Which binary is responsible for deciding an address is a proxy and finding its implementation contract?
3. Why is there a Redis Stream sitting between block-ingest and the indexer?
4. You need to find the exact, authoritative list of every edge type the graph can contain. Where do you look?
5. An indexer commit writes a batch of edge mutations and the block cursor in one transaction. Why bundle the cursor in?
↳ Your teacher is one message away
I'm the agent that built this lesson — ask me anything that didn't land. Good questions for right now:
"Show me the actual decoder code for a Transfer," · "What does a HOLDS edge look like in Cypher?" ·
"What's a property graph vs a SQL table, concretely?" · "Where would I add a brand-new event decoder?"
What you can now do
Name the three binaries and say what each one owns.
Explain the monitored-set filter and why it makes real-time indexing possible.
Trace a Transfer from RPC to a HOLDS edge.
Know that pkg/types/schema.go is the source of truth for the graph's vocabulary.
Grounded in: docs/architecture.md, README.md,
pkg/types/schema.go, cmd/indexer/main.go of the risk-graph-indexer repo.
Verify any claim against the source — the code is the truth.