96% Compression on 1.5GB HDFS Logs — 31% Smaller Than xz

We benchmarked Smallest.zip on a 1.5GB HDFS log file with 11 million lines. The result: a 63MB output, 30.6% smaller than xz -9.

The Test

We use xz -9 as our baseline. It is widely regarded as one of the strongest general-purpose compressors in common use and is a standard reference point for maximum compression.

We took a 1.5GB HDFS log file (11 million lines of Hadoop Distributed File System output) and ran it through both Smallest.zip and xz -9. This is the largest log file we've benchmarked so far.
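To reproduce the xz side of a comparison like this on your own logs, a minimal sketch using Python's standard `lzma` module is shown below. It assumes that preset 9 in `lzma` corresponds to the `xz -9` setting used here; the function name `xz9_compressed_size` and the chunk size are illustrative choices, not part of the original benchmark. The file is streamed in chunks so a multi-gigabyte log does not need to fit in memory.

```python
import lzma


def xz9_compressed_size(path, chunk_size=1 << 20):
    """Return (original_bytes, compressed_bytes) for an xz preset-9 stream.

    Only the sizes are tracked; the compressed data is discarded.
    """
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=9)
    original = 0
    compressed = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            original += len(chunk)
            # compress() may return b"" while data is buffered internally
            compressed += len(compressor.compress(chunk))
    # flush() emits whatever is still buffered plus the stream footer
    compressed += len(compressor.flush())
    return original, compressed
```

Calling `xz9_compressed_size("hdfs.log")` (a hypothetical filename) returns the two byte counts needed to compute a baseline ratio.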

Results

Compressor   | Size    | vs xz -9
xz -9        | 91.0 MB | baseline
Smallest.zip | 63.2 MB | -30.6%

Smallest.zip compresses the HDFS log to 63.2MB. That is about 96% smaller than the original 1.5GB file and 30.6% smaller than the xz -9 output.
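The percentages follow directly from the reported sizes. The short sketch below recomputes them, assuming 1.5GB means 1500 MB and using the rounded sizes from the results table. With those rounded inputs the reduction versus xz comes out at about 30.5%, so the published 30.6% presumably reflects the unrounded byte counts. The helper name `percent_smaller` is illustrative.

```python
def percent_smaller(size, reference):
    """Percentage by which `size` is smaller than `reference`."""
    return (1 - size / reference) * 100


original_mb = 1500.0  # assumption: 1.5 GB taken as 1500 MB
xz_mb = 91.0          # xz -9 output, from the results table
smallest_mb = 63.2    # Smallest.zip output, from the results table

print(round(percent_smaller(smallest_mb, original_mb), 1))  # 95.8
print(round(percent_smaller(xz_mb, original_mb), 1))        # 93.9
print(round(percent_smaller(smallest_mb, xz_mb), 1))        # 30.5
print(round(xz_mb - smallest_mb, 1))                        # 27.8 (MB saved vs xz)
```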

Key Takeaways

At 1.5GB, this is a real stress test for any compressor. HDFS logs are highly repetitive — block reports, replication events, and DataNode status messages — but the sheer volume means even small per-line improvements compound into significant savings.

xz at maximum compression already does well on this data, bringing it down to 91MB. Smallest.zip removes nearly 28MB more, a further 30.6% reduction relative to the xz output.

For organizations running Hadoop clusters, log storage adds up fast. Compressing with Smallest.zip means storing 63MB instead of 91MB per 1.5GB of logs — a meaningful reduction at scale.
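To get a rough sense of what this means at scale, the sketch below extrapolates the measured sizes to a larger log volume. It assumes the ratios from this single 1.5GB file carry over unchanged to other HDFS logs, which is not something this benchmark establishes. The 1 TB volume and the function name `projected_storage_gb` are hypothetical.

```python
def projected_storage_gb(raw_gb, compressed_mb, sample_mb=1500.0):
    """Scale a measured compressed size to a larger raw log volume.

    Assumes the compression ratio seen on the sample holds for all the data.
    """
    return raw_gb * (compressed_mb / sample_mb)


raw_gb = 1000.0  # hypothetical: 1 TB of raw HDFS logs
xz_gb = projected_storage_gb(raw_gb, 91.0)
smallest_gb = projected_storage_gb(raw_gb, 63.2)

print(f"xz -9:        {xz_gb:.1f} GB")
print(f"Smallest.zip: {smallest_gb:.1f} GB")
print(f"Difference:   {xz_gb - smallest_gb:.1f} GB")
```

Under these assumptions, 1 TB of raw logs would take roughly 61 GB with xz -9 and roughly 42 GB with Smallest.zip.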

Try It Yourself

Upload your own files at smallest.zip and see the difference. Every account starts with free credits.