March 04, 2026 · 3 min read · By Smallest.zip Team

4BIN Now Compresses scRNA, RNA-seq, cfDNA, and WES — With Sub-12s Decompression

4BIN compression now extends beyond amplicon FASTQ to four new sequencing types. scRNA compresses to 5.87% of its original size and decompresses in 3.1s. Every type decompresses in under 12 seconds.


What's New

When we launched 4BIN, we benchmarked against PetaGene on amplicon FASTQ files from the DDBJ Sequence Read Archive and beat PetaGene's compression on all three files, by 1.11x to 1.19x. But amplicon is just one sequencing type. Real-world genomics labs produce a mix of data: single-cell RNA, bulk RNA-seq, cell-free DNA, whole-exome sequencing, and more.

Today we're sharing compression and decompression benchmarks across four additional FASTQ types: scRNA, RNA-seq, cfDNA, and WES.

Results: All FASTQ Types

Type	Raw Size	Compressed Size	Ratio (% of original)	Decompress Time
scRNA	490 MB	30 MB	5.87%	3.1s
RNA-seq	1.04 GB	76 MB	7.29%	8.2s
cfDNA	1.19 GB	113 MB	9.47%	11.7s
WES	1.34 GB	138 MB	10.27%	11.6s

Every file type compresses to under 11% of its original size. Every file decompresses in under 12 seconds.

Decompression Speed

Fast decompression matters. If your bioinformatics pipeline stalls waiting for data to decompress, you've traded storage savings for compute delays.

4BIN decompresses at 100–160 MB/s, measured as original (uncompressed) data restored per second:

scRNA (490 MB): 3.1 seconds → 158 MB/s
RNA-seq (1.04 GB): 8.2 seconds → 130 MB/s
cfDNA (1.19 GB): 11.7 seconds → 104 MB/s
WES (1.34 GB): 11.6 seconds → 118 MB/s
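
The throughput figures above are simply raw size divided by decompression time. A minimal sketch of that arithmetic follows. It assumes 1 GB is counted as 1024 MB, which is the conversion that reproduces the listed numbers; the names `benchmarks` and `throughput_mb_per_s` are illustrative and not part of 4BIN.

```python
# Sizes and times are taken from the results table above.
# Assumption: 1 GB is treated as 1024 MB, which reproduces the listed figures.
MB_PER_GB = 1024

# name -> (raw size in MB, decompression time in seconds)
benchmarks = {
    "scRNA": (490.0, 3.1),
    "RNA-seq": (1.04 * MB_PER_GB, 8.2),
    "cfDNA": (1.19 * MB_PER_GB, 11.7),
    "WES": (1.34 * MB_PER_GB, 11.6),
}


def throughput_mb_per_s(raw_mb: float, seconds: float) -> float:
    """Original (uncompressed) megabytes restored per second."""
    return raw_mb / seconds


throughputs = {
    name: throughput_mb_per_s(raw_mb, seconds)
    for name, (raw_mb, seconds) in benchmarks.items()
}

for name, rate in throughputs.items():
    print(f"{name}: {rate:.0f} MB/s")
```

The same function can be used to compare against the read rate of a downstream tool, to check whether decompression or the tool itself would be the bottleneck.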

At these speeds, decompression adds little overhead to alignment and variant calling workflows. Combined with FQLink, our transparent decompression wrapper, your existing tools never see the compressed format.

Compression Ratios by Sequencing Type

Not all FASTQ data compresses equally. The ratio depends on read length, quality score distribution, and sequence complexity:

scRNA (5.87%): Short reads compress exceptionally well because of their structural regularity.
RNA-seq (7.29%): Longer reads with broader sequence diversity, yet the output is still under 8% of the original.
cfDNA (9.47%): Cell-free DNA fragments are short but have high quality-score variance. 4BIN still keeps the ratio under 10%.
WES (10.27%): Whole-exome captures are the most diverse in this set, with targeted enrichment across thousands of exons, yet 4BIN still achieves a nearly 10x reduction (about 9.7x).
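
A ratio expressed as a percentage of the original size converts to an "N times smaller" factor by dividing 100 by the percentage. The sketch below applies that conversion to the four ratios listed above; the function name `reduction_factor` is illustrative.

```python
# Compressed size as a percentage of the original, from the list above.
ratios_percent = {"scRNA": 5.87, "RNA-seq": 7.29, "cfDNA": 9.47, "WES": 10.27}


def reduction_factor(ratio_percent: float) -> float:
    """How many times smaller the compressed file is than the original."""
    return 100.0 / ratio_percent


factors = {name: reduction_factor(r) for name, r in ratios_percent.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x smaller")
```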

Combined With Previous Amplicon Results

Adding our earlier amplicon benchmarks for the complete picture:

Type	Ratio	vs PetaGene
Amplicon (DRR000798)	4.56%	1.16x better
Amplicon (DRR000801)	4.77%	1.11x better
Amplicon (DRR000802)	4.47%	1.19x better
scRNA	5.87%	—
RNA-seq	7.29%	—
cfDNA	9.47%	—
WES	10.27%	—

Amplicon data remains the best-compressing type, and all three amplicon files beat PetaGene. The new types show that 4BIN extends to a broader range of sequencing workflows.

What This Means for Your Storage Bill

A lab generating 10 TB of mixed FASTQ data per month pays the following to keep each month's batch in S3 for a year:

Gzip (~25%): 2.5 TB stored → $690/year in S3 costs
4BIN (~7% average): 700 GB stored → $193/year in S3 costs
Annual savings: ~$500/year for each 10 TB batch, and that's just storage. Egress and transfer savings add to this.

At petabyte scale (1 PB of raw FASTQ), the same arithmetic gives savings of roughly $50,000 per year.
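
The storage figures in this section can be reproduced with a short script. This is a sketch under stated assumptions: a flat S3 price of $0.023 per GB-month and 1 TB = 1000 GB. That rate is inferred from the numbers in this section, not quoted from a price list, and real S3 pricing varies by region, tier, and volume. The function name `annual_storage_cost` is illustrative.

```python
# Assumption: flat price of $0.023 per GB-month, 1 TB = 1000 GB.
# The rate is inferred from the figures in this section, not an official quote.
PRICE_PER_GB_MONTH = 0.023
GB_PER_TB = 1000


def annual_storage_cost(raw_tb: float, ratio: float) -> float:
    """Yearly cost of holding raw_tb of FASTQ compressed to `ratio` of its size."""
    stored_gb = raw_tb * GB_PER_TB * ratio
    return stored_gb * PRICE_PER_GB_MONTH * 12


gzip_cost = annual_storage_cost(10, 0.25)     # ~25% of original size
fourbin_cost = annual_storage_cost(10, 0.07)  # ~7% of original size
savings = gzip_cost - fourbin_cost

# Same comparison for 1 PB (1000 TB) of raw data.
savings_1pb = annual_storage_cost(1000, 0.25) - annual_storage_cost(1000, 0.07)

print(f"gzip: ${gzip_cost:.0f}/year")
print(f"4BIN: ${fourbin_cost:.0f}/year")
print(f"savings per 10 TB: ${savings:.0f}/year")
print(f"savings per 1 PB: ${savings_1pb:,.0f}/year")
```

Swapping in your own price per GB-month and your own measured ratios gives an estimate for your data mix.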

Try It

If you're generating any of these data types — scRNA, RNA-seq, cfDNA, WES, or amplicon — reach out and we'll run 4BIN on your actual data. No commitment, no risk to your files. Just smaller FASTQ.