The Test
We use xz -9 as our baseline: it's one of the strongest widely available general-purpose compressors and a standard benchmark for maximum compression.
We took a 1.5GB HDFS log file — 11 million lines of Hadoop Distributed File System output — and ran it through Smallest.zip alongside xz at maximum compression. This is the largest log file we've benchmarked so far.
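The xz side of this test can be reproduced without the xz binary, since Python's built-in `lzma` module wraps the same liblzma library and its preset 9 corresponds to `xz -9`. A minimal sketch (the file path is a placeholder; this shows only the baseline, not how Smallest.zip itself works):

```python
import lzma

def xz9_size(path: str) -> int:
    """Compress a file with LZMA preset 9 (equivalent to xz -9)
    and return the compressed size in bytes."""
    comp = lzma.LZMACompressor(preset=9)  # add lzma.PRESET_EXTREME for xz -9e
    total = 0
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so a 1.5GB log doesn't have to fit in memory.
        while chunk := f.read(1 << 20):
            total += len(comp.compress(chunk))
    total += len(comp.flush())
    return total

# Hypothetical usage:
# print(xz9_size("hdfs.log") / 1e6, "MB")
```

On highly repetitive log data like HDFS output, this streaming approach gives the same ratio as running `xz -9` on the whole file.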
Results
| Compressor | Size | vs xz -9 |
|---|---|---|
| xz -9 | 91.0 MB | baseline |
| Smallest.zip | 63.2 MB | -30.6% |
Smallest.zip compresses the HDFS log to 63.2 MB: roughly 96% smaller than the original 1.5GB file, and 30.6% smaller than xz at maximum compression.
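The percentages above follow directly from the table. A quick sanity check (the reported sizes are rounded, so derived figures can wobble in the last decimal place):

```python
# Sizes as reported in the results table (MB); inputs are rounded,
# so the computed percentages may differ slightly from the quoted ones.
original_mb = 1.5 * 1024   # 1.5 GB, assuming binary (GiB-style) units
xz_mb = 91.0
smallest_mb = 63.2

vs_original = 100 * (1 - smallest_mb / original_mb)
vs_xz = 100 * (1 - smallest_mb / xz_mb)

print(f"vs original: {vs_original:.1f}% smaller")
print(f"vs xz -9:    {vs_xz:.1f}% smaller")
```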
Key Takeaways
At 1.5GB, this is a real stress test for any compressor. HDFS logs are highly repetitive — block reports, replication events, and DataNode status messages — but the sheer volume means even small per-line improvements compound into significant savings.
xz at maximum compression already does well on this data, bringing it down to 91MB. Smallest.zip pushes nearly 28MB further, saving an additional 30% on top of what's already an excellent compression ratio.
For organizations running Hadoop clusters, log storage adds up fast. Compressing with Smallest.zip means storing 63.2 MB instead of 91.0 MB per 1.5GB of logs, a meaningful reduction at scale.
Try It Yourself
Upload your own files at smallest.zip and see the difference. Every account starts with free credits.