Cross-Cutting Synthesis · Deeper Track

One scarce resource, shaping everything

Why the architecture is the way it is: the economics of RPC. ~13 min.

Synthesizes: L24 · L25 · L29 · L35
Anchor: archive node + explorer rate limits
New: cost shapes architecture

"Periodic, not per-block." "Sample, don't sweep." "Multicall3: 50 calls → 2." "Cache the balance." "Pure graph, no RPC." You've met these as local choices in a dozen lessons. They're not local — they're all downstream of one force. The system has a single binding constraint, and an enormous fraction of its design is the economics of spending that resource carefully. Name the resource and the architecture stops looking arbitrary.

Your anchor: you've felt this constraint
Anyone who's built on-chain knows it — a self-hosted archive node is finite and precious, and Etherscan / Blockscout are rate-limited and metered. An eth_call or getLogs is orders of magnitude more expensive (latency, rate limit, infra) than a Memgraph read. So the whole system tilts one way: read the graph, not the chain — and when you must touch the chain, touch it as little and as cleverly as possible. RPC is the budget everything is drawn against.

1 · The seven moves for spending less RPC

Almost every efficiency mechanism you've seen is one of these seven strategies:

Strategy | How | Seen in
Filter early | the monitored-set SISMEMBER drops ~95% of events before any enrichment; discovery is value-gated (≥$1M, structural deps only) so the crawl can't swallow the chain | L1/L2, L24
Cache & reuse | the balance hot cache, NAV/price stamps recomputed via price-dirty not re-read, block prefetch + 7-day cache | L36, L26, L1
Batch | Multicall3 collapses ~50 eth_calls into 2 aggregate3s; the RPC pool fans out reads concurrently | L25, L26
Sample, don't sweep | chainref verifies a subset per cycle; coverage trends statistically rather than re-reading 1.5M nodes from chain | L29/L34
Work a slice | per-token partial graphs (3.07 GB → 150 MB) load only the spine neighborhood, not the whole graph | L23, L38
Pace it (cadence) | nothing chain-reading runs per-block: at_risk 30 min, refreshers 30–60 min, the slow validator tier hourly vs the fast graph tier every 10 min | L23, L24, L34
Spread & degrade | round-robin across RPC endpoints; every read is best-effort (a failed probe skips, never crashes); OOM → backpressure protects the shared box | L26, L24/L32, L37
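
As a rough illustration of two rows from the table, filter early and batch, the sketch below uses an in-memory Python set as a stand-in for the Redis SISMEMBER check and plain list chunking as a stand-in for grouping calls into Multicall3 aggregate3 batches. The function names, the event shape, and the batch size of 25 are assumptions made for illustration, chosen only so the arithmetic reproduces the "50 calls → 2" figure; the real code may size its batches differently.

```python
from typing import Dict, List, Sequence, Set


def filter_monitored(
    events: Sequence[Dict[str, str]], monitored: Set[str]
) -> List[Dict[str, str]]:
    """Filter early: keep only events emitted by a monitored address.

    The set lookup stands in for a Redis SISMEMBER check (assumption).
    """
    return [event for event in events if event["address"] in monitored]


def chunk_calls(calls: Sequence[str], batch_size: int) -> List[List[str]]:
    """Batch: split individual calls into groups, one group per aggregate request."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(calls[i:i + batch_size]) for i in range(0, len(calls), batch_size)]


def round_trips(n_calls: int, batch_size: int) -> int:
    """Round-trips needed once calls are batched (ceiling division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return -(-n_calls // batch_size)


# Hypothetical data: three events, only one from a monitored address.
events = [{"address": "0xaaa"}, {"address": "0xbbb"}, {"address": "0xccc"}]
survivors = filter_monitored(events, monitored={"0xbbb"})

# Hypothetical probe list: 50 calls for one contract, batch size 25 (assumed).
calls = [f"call_{i}" for i in range(50)]
batches = chunk_calls(calls, batch_size=25)
```

Filtering runs before any enrichment, so the batching step only ever sees work for addresses that survived the filter; the two savings multiply rather than add.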
The cadence IS a budget decision
Notice how many mechanisms are "run less often." A 30-minute at_risk cycle, a 60-minute LP refresher, an hourly slow tier: none of these intervals is arbitrary. Each is the answer to "how stale can this data be before correctness suffers?", with the loop run as infrequently as correctness allows, because every cycle costs reads. When you see a periodic loop in this codebase, read its interval as a price tag.
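
To make the "interval as a price tag" reading concrete, here is a small back-of-the-envelope sketch. It compares a loop that runs every block with one that runs every 30 minutes. The 12-second block time (Ethereum mainnet's slot time) and the figure of 100 reads per cycle are assumptions for illustration only; the lesson does not state per-cycle read counts.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BLOCK_TIME_SECONDS = 12  # assumption: Ethereum mainnet slot time


def cycles_per_day(interval_seconds: int) -> int:
    """How many times a loop with this interval fires in one day."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY // interval_seconds


def daily_reads(interval_seconds: int, reads_per_cycle: int) -> int:
    """Total chain reads per day for a loop with a fixed per-cycle cost."""
    return cycles_per_day(interval_seconds) * reads_per_cycle


READS_PER_CYCLE = 100  # hypothetical per-cycle cost, not from the source

per_block = daily_reads(BLOCK_TIME_SECONDS, READS_PER_CYCLE)
every_30_min = daily_reads(30 * 60, READS_PER_CYCLE)
savings_factor = per_block // every_30_min
```

Under these assumptions the per-block loop fires 7,200 times a day against 48 for the 30-minute loop, a factor of 150 in reads for the same per-cycle work.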

2 · The big idea: the graph is a cache of the chain

Step back and the whole design is one move: the Memgraph graph is an elaborate, queryable cache of on-chain state, built so the hot paths read it (cheap) instead of the chain (dear).

"Read the graph, not the chain" is the reflex
Whenever a computation can be answered from stored graph state, it is. The oracle bridger could have RPC-probed each market's oracle; instead it derives the dependency from existing edges in pure Cypher. That instinct — prefer the cached derivation over a fresh read — is the single most repeated efficiency decision in the codebase.
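
The "graph first, chain only on a miss" reflex can be sketched as a read-through lookup. This is a minimal illustration, not the codebase's implementation: a plain dict stands in for Memgraph state, `chain_read` stands in for an RPC call, and the class name `GraphFirstReader` is hypothetical.

```python
from typing import Any, Callable, Dict, Hashable


class GraphFirstReader:
    """Answer from stored state when possible; touch the chain only on a miss."""

    def __init__(
        self,
        graph_state: Dict[Hashable, Any],
        chain_read: Callable[[Hashable], Any],
    ) -> None:
        self.graph_state = graph_state  # stand-in for the graph (assumption)
        self.chain_read = chain_read    # stand-in for an RPC read (assumption)
        self.rpc_calls = 0              # counts how much budget was spent

    def get(self, key: Hashable) -> Any:
        # Cheap path: the value is already stored, so no RPC is spent.
        if key in self.graph_state:
            return self.graph_state[key]
        # Expensive path: one chain read, stored so the next lookup is free.
        self.rpc_calls += 1
        value = self.chain_read(key)
        self.graph_state[key] = value
        return value


# Hypothetical usage: one key already stored, one that must be fetched.
reader = GraphFirstReader({"known": 1}, chain_read=lambda key: 99)
first = reader.get("known")     # served from stored state
second = reader.get("missing")  # costs one chain read
third = reader.get("missing")   # now served from stored state
```

The counter makes the cost visible: three lookups, one chain read. A real system would also need a staleness policy for stored values, which this sketch leaves out.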

3 · The constraint is also the bill

Here's the satisfying loop back to L35. The cost-allocation model bills customers on four signals — and one of them, PROTO edges, is explicitly a proxy for "refresh-worker multicall RPC across the structural edge types," while HOLDS proxies balance lookups. In other words, the scarce resource the architecture is organized around is the very thing the business charges for. RPC budget isn't just an engineering constraint — it's the unit of value the platform sells. Spend it well and you both run cheaper and bill more fairly.

Why naming the resource demystifies the design
Once "RPC is the binding constraint" is in your head, the architecture reads as a series of obvious answers. Why periodic? Because reads cost money. Why sample? Because a full chain sweep is unaffordable. Why a balance cache? So you don't re-read what hasn't changed. Why Multicall3? To amortize the round-trip. Why pure-graph where possible? Because the graph is the cache. Nothing is arbitrary; it's all one economy.

4 · The reflex for new work

This synthesis turns into a habit. Faced with any new feature, the first design question becomes: "what's its RPC cost, and how do I bound it?" — answered with the seven moves. Need fresh on-chain data? Can you read it from the graph instead (cache)? If you must call, can you batch (Multicall3), sample, or pace it? Can you bound the set (partial / filter)? Is the read best-effort so a failure degrades rather than crashes? A feature that ignores the RPC budget is the one that takes down the archive node — and a reviewer who's internalized this catches it on sight.
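
Two of the checklist questions above, "can you sample?" and "is the read best-effort?", can be sketched together. The function names, the sample size, and the failing probe are hypothetical, and the random sample is only one possible selection policy; the source does not say how chainref chooses its subset.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def sample_for_cycle(
    node_ids: Sequence[int], sample_size: int, rng: random.Random
) -> List[int]:
    """Sample, don't sweep: pick a bounded subset of nodes for this cycle."""
    k = min(sample_size, len(node_ids))
    return rng.sample(list(node_ids), k)


def best_effort_probe(
    node_ids: Sequence[int], probe: Callable[[int], int]
) -> Tuple[Dict[int, int], List[int]]:
    """Best-effort reads: a failed probe is skipped, never fatal."""
    results: Dict[int, int] = {}
    skipped: List[int] = []
    for node_id in node_ids:
        try:
            results[node_id] = probe(node_id)
        except Exception:
            # Degrade rather than crash. Real code would likely catch a
            # narrower exception type and log the failure.
            skipped.append(node_id)
    return results, skipped


def flaky_probe(node_id: int) -> int:
    """Hypothetical probe that fails for odd ids, to exercise the skip path."""
    if node_id % 2:
        raise TimeoutError(f"probe timed out for node {node_id}")
    return node_id * 2


rng = random.Random(0)  # seeded only to make the sketch repeatable
picked = sample_for_cycle(list(range(1000)), sample_size=10, rng=rng)
results, skipped = best_effort_probe(picked, flaky_probe)
```

The cycle's RPC cost is bounded by `sample_size` regardless of how large the candidate set grows, and a failing endpoint costs coverage for that cycle rather than availability.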

Check yourself

1. What is the single binding constraint that most of the architecture is organized around?
2. The monitored-set SISMEMBER filter drops ~95% of events. Which RPC-economy strategy is that?
3. Multicall3 turns ~50 eth_calls per contract into 2 aggregate3s (L25). Why does that matter so much?
4. chainref samples a subset of nodes per cycle rather than re-reading all 1.5M from chain. What's the trade-off it accepts?
5. The lesson frames the Memgraph graph as "a cache of the chain." What does that reframe explain?
6. The oracle bridger derives market→oracle dependencies in pure Cypher rather than RPC-probing each market (L27). Which reflex is that?
7. How does this constraint connect to the cost-allocation model (L35)?
8. You're adding a feature that needs a contract's current owner. What's the RPC-economy-minded first move?
↳ Ask your teacher
Try: "Roughly what's the RPC cost of one enrichment of a fresh contract?" · "Where would adding a naive per-block RPC read hurt the most?" · "How does the rpcPool decide which endpoint a read goes to?" · "Which refresher cadence is the most expensive, and why is it set where it is?" · "Is there anywhere the system over-reads and could be tightened?"

What you can now do

Four syntheses — the system as a set of disciplines
Combinators (L46, what to combine) · float-determinism (L47, how reproducibly) · idempotency (L48, how to write safely) · RPC economics (here, what it all costs). Across these four lenses, the codebase's thousands of local decisions resolve into a handful of coherent principles — which is what "understanding it deeply, end to end" actually means.

Synthesizes code already cited in: monitored-set filter (L1/L2), value-gated discovery (L24), balance/price caches (L36/L26), Multicall3 admin probes (L25), readAllFeeds RPC-pool fan-out (L26), chainref sampling (L29/L34), per-token partial graphs (L23/L38), scheduler/refresher/validator cadences (L23/L24/L34), best-effort reads + OOM→backpressure (L24/L32/L37), pure-graph oracle bridger (L27); the PROTO/HOLDS billing signals (L35). Verify against source — the code is the truth.