Every blockchain company has the same problem: data grows forever, nobody deletes anything, and you still need to query it.
Ethereum alone has produced 2.5 billion transactions. Stored as standard JSON exports, that's roughly 4 terabytes. Add Solana's 200 billion transactions and the other top 20 chains, and you're looking at 192 TB of archival transaction data — growing every 12 seconds.
Most teams deal with this by throwing money at storage. Or they prune history and lose the ability to answer questions about it.
We built something different.
The Result
Tested on 428 MB of real Ethereum mainnet data (275,568 transactions across 1,000 consecutive blocks, downloaded live from a public node):
| Approach | Size | % of Original |
|---|---|---|
| Raw transaction export (JSONL) | 428 MB | 100% |
| Best general-purpose compressor (xz -9) | 57 MB | 13.3% |
| Our system | 21 MB | 4.9% |
95.1% compression. 63% smaller than xz -9. Fully queryable without decompressing.
This is not a theoretical projection. This is a real file on disk, containing real Ethereum transactions from blocks 24,708,171 through 24,709,170, queryable right now.
What "Queryable" Means
The compressed archive is not a .gz file you need to decompress before using. It's a structured store that supports:
- Block lookup: "Give me all transactions in block 24,709,000" — returns 271 transactions, instantly
- Address search: "Does this address appear anywhere in the dataset?" — answered in under a millisecond, without touching the transaction data
- Transaction counting: "How many transactions did the USDT contract receive?" — scans the data, returns 2,862 in a few milliseconds
You can integrate this into any application that speaks SQL, and virtually every programming language has a SQL client.
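The article doesn't show the archive's actual interface, so here is an illustrative sketch of the three query shapes above. It uses an in-memory SQLite table as a stand-in; the table name `transactions` and its columns are hypothetical, and the sample rows are invented:

```python
import sqlite3

# Hypothetical schema standing in for the compressed archive's SQL surface.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE transactions (
    block_number INTEGER, tx_hash TEXT, from_addr TEXT, to_addr TEXT)""")
con.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(24_709_000, "0xaaa", "0x111", "0x222"),
     (24_709_000, "0xbbb", "0x333", "0xdac17f958d2ee523a2206206994597c13d831ec7"),
     (24_709_001, "0xccc", "0x111", "0xdac17f958d2ee523a2206206994597c13d831ec7")])

# Block lookup: all transactions in one block.
rows = con.execute(
    "SELECT tx_hash FROM transactions WHERE block_number = ?",
    (24_709_000,)).fetchall()

# Address search: does an address appear anywhere in the dataset?
seen = con.execute(
    "SELECT EXISTS(SELECT 1 FROM transactions "
    "WHERE from_addr = ? OR to_addr = ?)",
    ("0x111", "0x111")).fetchone()[0]

# Transaction count for a contract (USDT's mainnet address as the example).
usdt = "0xdac17f958d2ee523a2206206994597c13d831ec7"
n = con.execute(
    "SELECT COUNT(*) FROM transactions WHERE to_addr = ?",
    (usdt,)).fetchone()[0]

print(len(rows), seen, n)  # 2 1 2
```

The point is only the shape of the queries; against the real archive, the same statements would run over the compressed store rather than a plain table.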
At Ethereum Scale
As datasets grow, the compression ratio improves.
| Dataset Size | Transactions | Raw | Compressed | Savings |
|---|---|---|---|---|
| 100 blocks (20 min) | 28,362 | 40 MB | 2.7 MB | 93.2% |
| 1,000 blocks (3.3 hrs) | 275,568 | 428 MB | 21 MB | 95.1% |
| 1 day (projected) | 2M | 2.9 GB | 161 MB | 94.3% |
| 1 year (projected) | 734M | 1.0 TB | 53 GB | 94.8% |
| Full chain (projected) | 2.5B | 3.9 TB | 192 GB | 95.1% |
The 100-block and 1,000-block rows are measured. The rest are projected using the measured per-transaction cost of 76.6 compressed bytes.
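The projection method described above is simple enough to check by hand: scale the measured per-transaction compressed cost to larger transaction counts. This sketch uses only the article's own figures (76.6 bytes per transaction, 2.5 billion transactions for the full chain):

```python
# Measured cost from the 1,000-block run: 21 MB / 275,568 transactions.
BYTES_PER_TX = 76.6

def projected_compressed_gb(transactions: int) -> float:
    """Estimated compressed archive size in GB (decimal) for a tx count."""
    return transactions * BYTES_PER_TX / 1e9

full_chain = projected_compressed_gb(2_500_000_000)
print(f"Full chain: {full_chain:.1f} GB")  # ~191.5 GB, i.e. the table's ~192 GB
```

The intermediate rows won't land exactly on this line, since the table's figures are rounded independently, but the full-chain projection reproduces the ~192 GB estimate.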
Across All Major Chains
The same approach works on any blockchain that produces structured transaction data. Here's what it looks like across the top 20 chains:
| Chain | Historical Txs | Raw Archive | Compressed | Savings |
|---|---|---|---|---|
| Ethereum | 2.5B | 3.9 TB | 192 GB | 95.1% |
| Solana | 200B | 160 TB | 15.3 TB | 90.5% |
| TRON | 8B | 8.0 TB | 666 GB | 91.7% |
| BNB Chain | 5B | 6.0 TB | 426 GB | 92.9% |
| Polygon | 4B | 4.4 TB | 339 GB | 92.3% |
| Arbitrum | 1.5B | 2.1 TB | 123 GB | 94.1% |
| Bitcoin | 1B | 1.8 TB | 89 GB | 95.1% |
| Base | 1B | 1.2 TB | 85 GB | 92.9% |
| Optimism | 0.8B | 1.0 TB | 67 GB | 93.5% |
| Other 11 chains | — | 5.8 TB | 468 GB | ~92% |
| Total | — | 192 TB | 17.5 TB | 90.9% |
A company indexing all 20 chains goes from 192 TB to 17.5 TB — while keeping every transaction queryable.
Who This Is For
Blockchain Analytics Companies
Nansen, Dune, Chainalysis, and similar companies index dozens of chains and store years of historical data for customer queries. A 5-chain analytics platform archiving Ethereum, BNB, Polygon, Arbitrum, and Base would reduce storage from 17.6 TB to 1.7 TB — saving roughly $4,000/year on S3 alone. At 20 chains with 3 years of growth factored in, that's $72,000/year.
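A back-of-envelope check of the ~$4,000/year figure, assuming S3 Standard pricing of roughly $0.023 per GB-month (an assumption; actual pricing varies by region and storage class):

```python
# Assumed S3 Standard rate; check current pricing for your region.
S3_PER_GB_MONTH = 0.023

saved_tb = 17.6 - 1.7           # storage no longer needed, from the text
saved_gb = saved_tb * 1000      # decimal TB, matching the article's units
annual = saved_gb * S3_PER_GB_MONTH * 12
print(f"${annual:,.0f}/year")   # ~ $4,388/year
```

That lands a little above the quoted $4,000/year, which is consistent with the article rounding down.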
Node Operators and RPC Providers
Running an Ethereum archive node requires storing the full transaction history. This system could serve as a compressed archival tier behind the live node — keeping historical data queryable at a fraction of the storage cost. A single ETH node saves ~$1,000/year; a multi-chain RPC provider saves considerably more.
Exchanges and Custodians
Regulatory requirements mandate keeping complete transaction records for 5-7 years. This system keeps those records compliant (queryable, auditable, lossless) while cutting storage costs by 90-95%.
L2 and Rollup Teams
Every Layer 2 needs to store its own transaction history plus references to L1. The EVM-compatible L2s (Arbitrum, Optimism, Base, zkSync, Scroll) show 93-94% compression — among the highest ratios in the table, because their transaction formats are closest to Ethereum's.
Compared to What Exists
| Approach | Compression | Queryable | Random Access |
|---|---|---|---|
| Raw JSONL | 0% | Scan only | No |
| gzip / zstd (generic) | 70-75% | No | No |
| xz -9 (best generic) | 87% | No | No |
| Parquet + zstd | 70-80% | Via Spark/DuckDB | Column-level |
| This system | 95.1% | Yes (SQL compatible) | Block-level |
Generic compressors don't allow you to query the data while it's compressed, and our output is 63% smaller than xz -9's.
The Numbers Are Real
Everything reported here was measured on actual Ethereum mainnet data downloaded from a public RPC endpoint during live operation. No synthetic data, no cherry-picked blocks, no theoretical estimates presented as measurements.
The 1,000-block test dataset is available for independent verification.
Built with Smallest.zip — Lossless. Queryable. 95% smaller.