"Periodic, not per-block." "Sample, don't sweep." "Multicall3: 50 calls → 2." "Cache the balance." "Pure
graph, no RPC." You've met these as local choices in a dozen lessons. They're not local — they're all downstream of one
force. The system has a single binding constraint, and an enormous fraction of its design is the economics of spending
that resource carefully. Name the resource and the architecture stops looking arbitrary.
Your anchor: you've felt this constraint
Anyone who's built on-chain knows it: a self-hosted archive node is finite and precious, and Etherscan / Blockscout are
rate-limited and metered. An eth_call or eth_getLogs costs orders of magnitude more (in latency, rate-limit headroom,
and infrastructure) than a Memgraph read. So the whole system tilts one way: read the graph, not the chain, and when you
must touch the chain, touch it as little and as cleverly as possible. RPC is the budget everything is drawn against.
1 · The seven moves for spending less RPC
Almost every efficiency mechanism you've seen is one of these seven strategies:
| Strategy | How | Seen in |
| --- | --- | --- |
| Filter early | the monitored-set SISMEMBER drops ~95% of events before any enrichment; discovery is value-gated (≥$1M, structural deps only) so the crawl can't swallow the chain | L1/L2, L24 |
| Cache & reuse | the balance hot cache, NAV/price stamps recomputed via price-dirty not re-read, block prefetch + 7-day cache | L36, L26, L1 |
| Batch | Multicall3 collapses ~50 eth_calls into 2 aggregate3s; the RPC pool fans out reads concurrently | L25, L26 |
| Sample, don't sweep | chainref verifies a subset per cycle, so coverage trends statistically rather than re-reading 1.5M nodes from chain | L29/L34 |
| Work a slice | per-token partial graphs (3.07 GB → 150 MB) load only the spine neighborhood, not the whole graph | L23, L38 |
| Pace it (cadence) | nothing chain-reading runs per-block: at_risk 30 min, refreshers 30–60 min, the slow validator tier hourly vs the fast graph tier every 10 min | L23, L24, L34 |
| Spread & degrade | round-robin across RPC endpoints; every read is best-effort (a failed probe skips, never crashes); OOM → backpressure protects the shared box | L26, L24/L32, L37 |
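To make the first row concrete, here is a minimal sketch of "filter early." It is not the codebase's implementation: the `Event` shape, the function names, and the sample addresses are hypothetical, a Python set stands in for the Redis set behind SISMEMBER, and the "structural deps only" half of the discovery gate is left out. How the two gates compose in the real pipeline isn't stated in this lesson, so they are shown independently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Mirrors the ">= $1M" discovery gate from the table; the constant name is hypothetical.
MIN_DISCOVERY_USD = 1_000_000


@dataclass(frozen=True)
class Event:
    address: str      # contract the event touches (hypothetical field)
    usd_value: float  # value attached to the event (hypothetical field)


def filter_monitored(events: Iterable[Event], monitored: Set[str]) -> List[Event]:
    """Drop events for unmonitored addresses before any enrichment runs.

    The membership test stands in for SISMEMBER: it spends no RPC,
    so it is the cheapest possible place to discard work.
    """
    return [e for e in events if e.address in monitored]


def passes_value_gate(event: Event) -> bool:
    """Value gate: only large enough events may widen the crawl."""
    return event.usd_value >= MIN_DISCOVERY_USD


monitored = {"0xaaa", "0xbbb"}
events = [
    Event("0xaaa", 2_500_000.0),
    Event("0xccc", 9_000_000.0),  # unmonitored: never reaches enrichment
    Event("0xbbb", 40_000.0),
    Event("0xddd", 10.0),         # unmonitored: never reaches enrichment
]

kept = filter_monitored(events, monitored)
discovery_candidates = [e for e in events if passes_value_gate(e)]
```

Both checks run on data you already hold, which is the point: every event rejected here is an enrichment (and its RPC reads) that never happens.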
The cadence IS a budget decision
Notice how many mechanisms are "run less often." A 30-minute at_risk cycle, a 60-minute LP refresher, an hourly slow
tier — these aren't arbitrary intervals. Each is the answer to "how stale can this data be before correctness suffers?",
set as infrequently as correctness allows, because every cycle costs reads. When you see a periodic loop in this
codebase, read its interval as a price tag.
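You can put a rough number on that price tag. The sketch below is plain arithmetic, not a measurement: the 100 reads per cycle is a made-up figure, and the 12-second block time assumes Ethereum mainnet, which this lesson doesn't specify.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reads_per_day(interval_seconds: int, reads_per_cycle: int) -> int:
    """Chain reads spent per day by a loop that fires every interval_seconds."""
    cycles_per_day = SECONDS_PER_DAY // interval_seconds
    return cycles_per_day * reads_per_cycle


READS_PER_CYCLE = 100  # hypothetical cost of one cycle

every_30_min = reads_per_day(30 * 60, READS_PER_CYCLE)  # 48 cycles a day
hourly = reads_per_day(60 * 60, READS_PER_CYCLE)        # 24 cycles a day
per_block = reads_per_day(12, READS_PER_CYCLE)          # 7200 cycles a day (assumed 12 s blocks)

per_block_vs_30_min = per_block // every_30_min
```

Under these assumptions the same loop run per-block spends 150 times the reads of the 30-minute cadence, which is why the interval is worth treating as a design decision rather than a default.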
2 · The big idea: the graph is a cache of the chain
Step back and the whole design is one move: the Memgraph graph is an elaborate, queryable cache of on-chain
state, built so the hot paths read it (cheap) instead of the chain (dear).
Hot paths read the graph. Risk computation, rule evaluation, the read surface, even whole subsystems — the oracle bridger is pure graph, zero RPC (L27) — operate on cached state.
Periodic re-readers are the budgeted re-sync. Refreshers (L24), chainref verifiers (L29), conservation (L32) are exactly the controlled, paced, sampled touches of the source of truth that keep the cache honest — the only things that routinely spend RPC, and all of them rationed.
Ingest fills the cache push-style. Block events flow in once and update the graph; the system then serves from the graph until a refresher decides a value has drifted enough to re-read.
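The three roles above (hot-path reads, push-style ingest, budgeted re-sync) can be sketched with a toy in-memory cache. Everything here is illustrative: `GraphCache` and its methods are hypothetical names, a dict stands in for Memgraph, and "stale" is modelled as a simple age threshold, which is an assumption rather than the refreshers' actual drift rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    value: int
    stamped_at: float  # when the value was last confirmed against the chain


class GraphCache:
    """Toy stand-in for the graph: reads are free, only refresh() spends RPC."""

    def __init__(self, read_chain: Callable[[str], int], max_age_s: float) -> None:
        self._read_chain = read_chain  # the expensive call (hypothetical hook)
        self._max_age_s = max_age_s
        self._entries: Dict[str, Entry] = {}
        self.rpc_reads = 0

    def ingest(self, key: str, value: int, now: float) -> None:
        """Push-style fill from a block event: no RPC spent."""
        self._entries[key] = Entry(value, now)

    def read(self, key: str) -> int:
        """Hot path: always served from stored state, never from the chain."""
        return self._entries[key].value

    def refresh(self, now: float) -> int:
        """Budgeted re-sync: re-read only entries older than max_age_s."""
        refreshed = 0
        for key, entry in self._entries.items():
            if now - entry.stamped_at > self._max_age_s:
                entry.value = self._read_chain(key)
                entry.stamped_at = now
                self.rpc_reads += 1
                refreshed += 1
        return refreshed


chain = {"vault": 100}  # fake source of truth
cache = GraphCache(read_chain=chain.__getitem__, max_age_s=1800.0)
cache.ingest("vault", 100, now=0.0)

chain["vault"] = 120                       # the chain moves on
served_before = cache.read("vault")        # still the cached value
refreshed_early = cache.refresh(now=600.0)   # too fresh to be worth a read
refreshed_late = cache.refresh(now=2000.0)   # past the threshold: one RPC read
served_after = cache.read("vault")
```

Note what the hot path never does: `read` has no way to reach the chain. Staleness is bounded by the refresher's threshold, and the RPC spend is bounded by how often `refresh` runs.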
"Read the graph, not the chain" is the reflex
Whenever a computation can be answered from stored graph state, it is. The oracle bridger could have RPC-probed each
market's oracle; instead it derives the dependency from existing edges in pure Cypher. That instinct — prefer the cached
derivation over a fresh read — is the single most repeated efficiency decision in the codebase.
3 · The constraint is also the bill
Here's the satisfying loop back to L35. The cost-allocation model bills customers on four signals — and one of them,
PROTO edges, is explicitly a proxy for "refresh-worker multicall RPC across the structural edge types," while HOLDS
proxies balance lookups. In other words, the scarce resource the architecture is organized around is the very thing the
business charges for. RPC budget isn't just an engineering constraint — it's the unit of value the platform sells.
Spend it well and you both run cheaper and bill more fairly.
Why naming the resource demystifies the design
Once "RPC is the binding constraint" is in your head, the architecture reads as a series of obvious answers. Why
periodic? Because reads cost money. Why sample? Because a full chain sweep is unaffordable. Why a balance cache? So
nothing re-reads what hasn't changed. Why Multicall3? To amortize the round-trip. Why pure-graph where possible?
Because the graph is the cache. Nothing is arbitrary: it's all one economy.
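"Amortize the round-trip" is easy to see in miniature. The sketch below only plans batches; it makes no network call. The `Call3` fields mirror the struct that Multicall3's aggregate3 accepts (target, allowFailure, callData), but `plan_batches` and the batch size of 25 are assumptions chosen to reproduce the "~50 → 2" figure, not the refresh worker's actual grouping.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Call3:
    target: str          # contract to call
    allow_failure: bool  # if True, one failed sub-call doesn't revert the batch
    call_data: bytes     # ABI-encoded call (left empty in this sketch)


def plan_batches(calls: List[Call3], batch_size: int) -> List[List[Call3]]:
    """Split calls into chunks; each chunk would become one aggregate3 request."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [calls[i:i + batch_size] for i in range(0, len(calls), batch_size)]


# Fifty placeholder reads against made-up addresses.
calls = [Call3("0x" + f"{i:040x}", True, b"") for i in range(50)]
batches = plan_batches(calls, batch_size=25)

unbatched_round_trips = len(calls)   # one eth_call each
batched_round_trips = len(batches)   # one aggregate3 per chunk
```

Setting `allow_failure` on every sub-call also fits the best-effort rule: a single bad read comes back as a failed result instead of sinking the whole batch.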
4 · The reflex for new work
This synthesis turns into a habit. Faced with any new feature, the first design question becomes: "what's its RPC
cost, and how do I bound it?" — answered with the seven moves. Need fresh on-chain data? Can you read it from the graph
instead (cache)? If you must call, can you batch (Multicall3), sample, or pace it? Can you bound the set (partial / filter)?
Is the read best-effort so a failure degrades rather than crashes? A feature that ignores the RPC budget is the one that
takes down the archive node — and a reviewer who's internalized this catches it on sight.
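Here is what answering those questions can look like for one small job. This is a sketch under stated assumptions, not how chainref or the RPC pool is written: `verify_sample`, the endpoint names, and the fake `flaky_probe` are all hypothetical. It samples instead of sweeping, rotates endpoints round-robin, and treats every probe as best-effort.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple


def verify_sample(
    nodes: Sequence[int],
    endpoints: Sequence[str],
    probe: Callable[[str, int], None],
    sample_size: int,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Probe a random subset of nodes, spreading reads across endpoints.

    A failed probe is recorded and skipped; it never aborts the cycle.
    """
    rotation = itertools.cycle(endpoints)  # round-robin over endpoints
    verified: List[int] = []
    skipped: List[int] = []
    for node in rng.sample(nodes, min(sample_size, len(nodes))):
        endpoint = next(rotation)
        try:
            probe(endpoint, node)
        except Exception:
            # Best-effort: degrade to "not verified this cycle".
            skipped.append(node)
            continue
        verified.append(node)
    return verified, skipped


def flaky_probe(endpoint: str, node: int) -> None:
    """Fake probe: one of the two endpoints is down."""
    if endpoint == "rpc-b":
        raise ConnectionError("endpoint unavailable")


nodes = list(range(1000))
verified, skipped = verify_sample(
    nodes, ["rpc-a", "rpc-b"], flaky_probe, sample_size=10, rng=random.Random(0)
)
```

The cycle costs at most `sample_size` reads no matter how large `nodes` grows, and a dead endpoint only lowers this cycle's coverage. That bound is the thing a reviewer would want to see stated in a new feature's design.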
Check yourself
1. What is the single binding constraint that most of the architecture is organized around?
2. The monitored-set SISMEMBER filter drops ~95% of events. Which RPC-economy strategy is that?
3. Multicall3 turns ~50 eth_calls per contract into 2 aggregate3s (L25). Why does that matter so much?
4. chainref samples a subset of nodes per cycle rather than re-reading all 1.5M from chain. What's the trade-off it accepts?
5. The lesson frames the Memgraph graph as "a cache of the chain." What does that reframe explain?
6. The oracle bridger derives market→oracle dependencies in pure Cypher rather than RPC-probing each market (L27). Which reflex is that?
7. How does this constraint connect to the cost-allocation model (L35)?
8. You're adding a feature that needs a contract's current owner. What's the RPC-economy-minded first move?
What you can now do
Name RPC (archive node + rate-limited explorers) as the binding constraint, and explain why it dwarfs graph-read cost.
Classify any efficiency mechanism into the seven moves: filter / cache / batch / sample / slice / pace / spread-degrade.
Explain "the graph is a cache of the chain," and why hot paths read the graph while paced re-readers touch the chain.
Read a periodic loop's interval as a budget decision, and explain why pure-graph computation is preferred.
Connect the constraint to billing (L35), and apply the "what's its RPC cost, how do I bound it?" reflex to new work.
Four syntheses — the system as a set of disciplines
Combinators (L46, what to combine) · float-determinism (L47, how reproducibly) · idempotency (L48, how to write
safely) · RPC economics (here, what it all costs). Across these four lenses, the codebase's thousands of local
decisions resolve into a handful of coherent principles — which is what "understanding it deeply, end to end" actually
means.