Blog
Research, benchmarks, and updates from the Smallest.zip team.
Beyond FASTQ: 4BIN Compression for VCF, BAM, POD5, and Spatial Genomics
We expanded our genomics compression beyond FASTQ to four new file types. VCF achieves 385x reduction, BAM saves 84.7%, POD5 hits 2.69 bps, and spatial transcriptomics compresses to 4.73% of original size.
4BIN Now Compresses scRNA, RNA-seq, cfDNA, and WES — With Sub-12s Decompression
4BIN compression now extends beyond amplicon FASTQ to four new sequencing types. scRNA reaches 5.87% of original size with 3.1s decompression, and every type decompresses in under 12 seconds.
4BIN vs PetaGene: Beating the Industry Standard in FASTQ Compression
Our 4BIN encoder compresses FASTQ files to 4.5% of original size — 1.1–1.2x better than PetaGene — while preserving 4-level quality scores. All amplicon files tested beat PetaGene.
Compressing Neural Network Weights: 40% Smaller Safetensors
We achieved 40% compression on BF16 safetensors model weights, cutting egress costs for terabyte-scale models by thousands of dollars per month.
Introducing LIS: Backend Large Image Storage That Cuts Cloud Costs by 5,000x
Our new Large Image Storage system compresses a 1 PB image corpus to ~190 GB, reducing S3 costs from $23,000/mo to $4.37/mo while returning images in under 200ms.
Genozip vs 4BIN: FASTQ Compression Benchmark 2026
Head-to-head comparison of Genozip and 4BIN for FASTQ compression. 4BIN achieves 4.5% of original size vs Genozip's ~7%. Benchmarked on real DDBJ sequencing data.
28GB Windows Logs Down to 9.8MB — 94.4% Smaller Than xz
We compressed a 28GB Windows log file with 114.6 million lines down to just 9.8MB — 94.4% smaller than xz -9 at maximum compression.
70% Smaller JPEGs — Still Standard JPEG, Works Everywhere
Our JPEG compressor reduces images by up to 70% while outputting standard JPEG files that work in every browser, phone, and app. Fast mode runs in 29ms per image.
97.6% Smaller Than xz on 11.4GB Windows Security Event Logs
We compressed an 11.4GB Windows security event log (JSONL) down to just 10MB — 97.6% smaller than xz -9. Structured JSON logs are where Smallest.zip truly shines.
99.4% Compression on ZooKeeper Logs — 82% Smaller Than xz
We compressed a 10.4MB ZooKeeper log file down to just 63KB — 81.6% smaller than xz -9 at maximum compression.
HDFS Logs: From 31% to 64% Smaller Than xz — Our V4 Breakthrough
Our V4 token detection system doubled the compression advantage on 1.5GB HDFS logs — now 63.6% smaller than xz -9, and 6x faster.
96% Compression on 1.5GB HDFS Logs — 31% Smaller Than xz
We benchmarked Smallest.zip on a massive 1.5GB HDFS log file with 11 million lines. Result — 63MB output, 30.6% smaller than xz -9.
93% Smaller Than xz on 26GB Windows CBS Logs
We compressed a 26.1GB Windows Component-Based Servicing log file down to 26MB — 93.2% smaller than xz -9 at maximum compression.
Fast Mode: Same Compression Ratios, Massively Faster
Our new fast-mode optimizer compresses log files in under a second — while maintaining or exceeding all previous compression ratios. Here's the full benchmark.
98.3% Compression on Linux Kernel Logs — 65% Smaller Than xz
We tested Smallest.zip against gzip, bzip2, zstd, and xz on a 2.3MB Linux kernel/syslog file. Our encoder compressed it to just 40KB — 65.4% smaller than xz -9.
99.1% Compression on Apache Logs — 75% Smaller Than xz
We tested Smallest.zip on a 4.9MB Apache log file with 56K lines of mixed error, notice, and access logs. Result — 46KB output, 74.6% smaller than xz -9.
Reduce AWS Genomics Storage Costs by 95% with FASTQ Compression
How to cut your AWS S3 genomics storage bill by 95%. 4BIN compresses FASTQ files to 4.5% of original size, versus 25% for gzip. Real cost calculations included.
Crushing Log Files: 98.4% Compression on 70MB SSH Logs
We benchmarked our Smallest.zip encoder against gzip, xz, bzip2, and zstd on a real-world 70MB SSH syslog file. The output was 67% smaller than xz -9.
PetaGene Alternative: 4BIN Compresses FASTQ 1.15x Better
Looking for a PetaGene alternative? 4BIN achieves 4.5% FASTQ compression vs PetaGene's 5.3% — 1.15x better on real sequencing data. Cloud API, no local install required.
How to Compress FASTQ Files for S3 Archival
Step-by-step guide to compressing FASTQ files for long-term S3 storage. Compare gzip, Genozip, PetaGene, and 4BIN compression ratios and costs.
FASTQ vs BAM: Which Is Cheaper to Store?
Comparing FASTQ and BAM file sizes, compression ratios, and cloud storage costs. How to minimize genomics storage spend across both formats.
Geospatial Compression: GeoJSON, Shapefile, LiDAR and GeoTIFF Below 10%
We hit our geospatial compression targets — GeoJSON at 9.85%, Shapefile at 9.82%, LiDAR at 6.99%, and GeoTIFF DEM at 23% — all lossless with bbox query support.