Lesson 01 · The Pipeline

Events in, graph edges out

The one mental model that makes the whole codebase make sense. ~8 minutes.

You already know: EVM events New: 3-binary pipeline New: the monitored set

Strip away the 30 binaries and 35 packages and this system does exactly one thing: it watches the blockchain and turns on-chain events into edges in a graph. That's it. Everything else is detail hanging off that spine.

You already know the input side cold. An ERC-20 Transfer(from, to, value) log, a contract creation in the block traces, an Aave ReserveInitialized — these are your home turf. This lesson connects that knowledge to the output side: a risk graph in Memgraph, made of nodes (addresses) and typed edges (relationships).

Your anchor for this whole lesson

An ERC-20 Transfer event becomes a HOLDS edge (wallet → token, carrying a balance). Hold onto that one sentence. Every other event-to-edge mapping is a variation on it.

1 · The graph: what "out" even means

Before the pipeline, picture the destination. The graph is a property graph (like Neo4j — Memgraph speaks the same Bolt protocol and Cypher query language). Two ingredients:

Nodes — mostly Ethereum addresses: EOAs, tokens, contracts, pools, vaults, multisigs.
Edges — typed, directional relationships between them. There are 19 edge types.

The edge types are literally constants in the code. A sample (from pkg/types/schema.go):

// pkg/types/schema.go — the vocabulary of the graph
EdgeHolds             EdgeType = "HOLDS"              // wallet holds a token balance
EdgeApproves          EdgeType = "APPROVES"           // owner approved a spender
EdgeAdminCtrl         EdgeType = "ADMIN_CTRL"         // admin controls a contract
EdgeDeployedBy        EdgeType = "DEPLOYED_BY"        // contract → its deployer
EdgeLendingCollateral EdgeType = "LENDING_COLLATERAL" // collateral backing a loan
EdgeOracleDep         EdgeType = "ORACLE_DEP"         // thing depends on a price oracle
// …19 in total. This file is the ground-truth list.

Source: risk-graph-indexer/pkg/types/schema.go (the EdgeType constants). This is the file to open when you need the canonical edge/node names.

Why a graph at all?

Risk is about relationships: who holds what, who controls whom, what backs what. "If token X depegs, which vaults are exposed, and who's the admin that could rug them?" is a graph traversal, not a SQL join. That's why the output is a graph and not a table.

2 · The three binaries

The journey from RPC to graph is split across three stateless services, connected by Redis. This split is the first thing to internalize — it tells you where any given piece of logic lives.

cmd/block-ingest

① block-ingest

The producer · one instance per chain

Fetches each block from the Ethereum RPC — block + receipts + traces, in one batched JSON-RPC round-trip. Caches every RPC response in Redis (7-day TTL, so replays cost zero RPC calls). Publishes the complete block payload onto a Redis Stream named blocks:{chainID}.

▼ Redis Stream blocks:{chainID} — a durable queue of blocks

cmd/indexer · pkg/decoder · pkg/indexer

② indexer

The decoder + graph writer · the heart of the system

Consumes blocks from the stream. Decodes logs into typed events (20 decoders). Filters them against the monitored set (more below). Batches the survivors into a handful of Cypher writes and commits them to Memgraph — atomically with the block cursor so it can never lose its place.

▼ Memgraph — bare nodes tagged pending_enrichment=true

cmd/enrichment-worker · pkg/enrichment · pkg/risk

③ enrichment-worker

The classifier + risk engine · runs asynchronously

Polls Memgraph for the bare nodes the indexer just created. Figures out what each address is (EOA vs contract, proxy, vault, multisig…) via RPC + Etherscan + Blockscout, writes metadata and extra edges, and discovers related addresses to watch. Separately runs the periodic risk math (exposure, DebtRank, concentration).

Why split into three? (a real onboarding insight)

Each stage has a different bottleneck. block-ingest is RPC-bound. The indexer must be fast and ordered — it can't block on slow external APIs or it falls behind the chain. Enrichment is slow and call-heavy (Etherscan rate limits!), so it's pushed off the hot path to run lazily. The Redis Stream between ①→② is a shock absorber: ingest can race ahead and buffer blocks while the indexer catches up. This is the streaming-pipeline pattern you flagged as new — and that's all it is: decouple producers from consumers with a durable queue.

3 · The monitored set — the 95% filter

Here's the trick that makes real-time indexing of all of Ethereum tractable. The indexer does not care about most events. It keeps a Redis Set called monitored:{chainID} containing every address the system cares about. For each decoded event it asks one O(1) question:

// Is either party an address we track? (Redis SISMEMBER — O(1))
if !monitored(event.from) && !monitored(event.to) {
    drop(event)   // ~95% of all chain activity exits here
}

If neither side is monitored, the event is dropped. This is why the system can keep up with mainnet: it's doing a set-membership check, not processing every Transfer on the chain.

Three ways an address gets into the set:

Source	When
Bootstrap	One-time seed of known addresses at startup.
Promotion	An unknown address receives a focus-token transfer ≥ $1M USD → it's now interesting, add it.
Discovery	Enrichment finds a related address (a proxy's implementation, a multisig's owner, a deployer) → add it.

Source: risk-graph-indexer/docs/architecture.md § "Monitored Set". The promotion threshold and discovery chain are the system's growth mechanism — the graph expands itself as it runs.

4 · Follow one Transfer all the way through

Let's make it concrete with your anchor event. A monitored wallet receives 1,000 USDC:

RPC → block-ingest: the block (with this Transfer log) is fetched and published to blocks:{chainID}.
Stream → indexer: the indexer pulls the block off the stream.
Decode: the ERC-20 decoder recognizes the Transfer(address,address,uint256) topic and produces a typed transfer event.
Filter: to (the wallet) is in the monitored set → keep it. (If the wallet were unknown but the amount ≥ $1M, promotion would add it here.)
Mutate: the indexer adjusts the wallet's balance and upserts a HOLDS edge (wallet → USDC). If USDC or the wallet were brand-new nodes, they're created bare with pending_enrichment=true.
Commit: that edge + the new block cursor are written to Memgraph in one atomic transaction.
Later, async: enrichment-worker sees the bare node, classifies it, fills in symbol/decimals/labels, and may discover related addresses to monitor.

Translate your EVM instinct

You'd normally read a Transfer as "balance moved." Here it's that plus "this is now an edge in a risk graph, and the receiver might be worth watching." Same event — the system just cares about a different downstream meaning.

Check yourself

Immediate feedback — pick an answer and it'll tell you right away. No grade is recorded; this is just to catch misconceptions while they're cheap.

1. A Transfer event arrives where neither the sender nor receiver is in the monitored set, and the value is small. What happens?

2. Which binary is responsible for deciding an address is a proxy and finding its implementation contract?

3. Why is there a Redis Stream sitting between block-ingest and the indexer?

4. You need to find the exact, authoritative list of every edge type the graph can contain. Where do you look?

5. An indexer commit writes a batch of edge mutations and the block cursor in one transaction. Why bundle the cursor in?

↳ Your teacher is one message away

I'm the agent that built this lesson — ask me anything that didn't land. Good questions for right now: "Show me the actual decoder code for a Transfer," · "What does a HOLDS edge look like in Cypher?" · "What's a property graph vs a SQL table, concretely?" · "Where would I add a brand-new event decoder?"

What you can now do

Name the three binaries and say what each one owns.
Explain the monitored-set filter and why it makes real-time indexing possible.
Trace a Transfer from RPC to a HOLDS edge.
Know that pkg/types/schema.go is the source of truth for the graph's vocabulary.

Next →Lesson 02 · The Data Model

Grounded in: docs/architecture.md, README.md, pkg/types/schema.go, cmd/indexer/main.go of the risk-graph-indexer repo. Verify any claim against the source — the code is the truth.