4BIN Now Compresses scRNA, RNA-seq, cfDNA, and WES — With Sub-12s Decompression

4BIN compression now covers four sequencing types beyond amplicon FASTQ. scRNA compresses to 5.87% of its original size and decompresses in 3.1 seconds. Every type decompresses in under 12 seconds.

What's New

When we launched 4BIN, we benchmarked against PetaGene on amplicon FASTQ files from the DDBJ Sequence Read Archive — and beat them across the board. But amplicon is just one sequencing type. Real-world genomics labs produce a mix of data: single-cell RNA, bulk RNA-seq, cell-free DNA, whole-exome sequencing, and more.

Today we're sharing compression and decompression benchmarks across four additional FASTQ types: scRNA, RNA-seq, cfDNA, and WES.

Results: All FASTQ Types

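| Type | Raw Size | Compressed Size | Ratio (compressed / raw) | Decompress Time |
| --- | --- | --- | --- | --- |
| scRNA | 490 MB | 30 MB | 5.87% | 3.1 s |
| RNA-seq | 1.04 GB | 76 MB | 7.29% | 8.2 s |
| cfDNA | 1.19 GB | 113 MB | 9.47% | 11.7 s |
| WES | 1.34 GB | 138 MB | 10.27% | 11.6 s |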

Every file type compresses to under 11% of its original size. Every file decompresses in under 12 seconds.
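The ratio column can also be read as a reduction factor (how many times smaller the compressed file is). The short sketch below converts the published percentages; the function names are illustrative and not part of 4BIN.

```python
# Published compressed-size ratios, as a percentage of the original file size.
RATIOS_PCT = {
    "scRNA": 5.87,
    "RNA-seq": 7.29,
    "cfDNA": 9.47,
    "WES": 10.27,
}


def reduction_factor(ratio_pct: float) -> float:
    """How many times smaller the compressed file is than the original."""
    return 100.0 / ratio_pct


def space_saved_pct(ratio_pct: float) -> float:
    """Percentage of the original size that no longer needs to be stored."""
    return 100.0 - ratio_pct


if __name__ == "__main__":
    for name, pct in RATIOS_PCT.items():
        print(f"{name}: {reduction_factor(pct):.1f}x smaller, "
              f"{space_saved_pct(pct):.2f}% saved")
```

Under this reading, the reduction factors range from about 9.7x for WES to about 17.0x for scRNA.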

Decompression Speed

Fast decompression matters. If your bioinformatics pipeline stalls waiting for data to decompress, you've traded storage savings for compute delays.

4BIN decompresses at roughly 100 to 160 MB/s, measured as original (uncompressed) data produced per second:

  • scRNA (490 MB): 3.1 seconds → 158 MB/s
  • RNA-seq (1.04 GB): 8.2 seconds → 130 MB/s
  • cfDNA (1.19 GB): 11.7 seconds → 104 MB/s
  • WES (1.34 GB): 11.6 seconds → 118 MB/s
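The throughput figures are raw size divided by decompression time. The sketch below reproduces them from the table values; it assumes 1 GB = 1024 MB, since that convention matches the published numbers (decimal units would give results a few MB/s lower).

```python
MB_PER_GB = 1024  # assumption: binary units, which reproduce the published figures

# (type, raw size in MB, decompression time in seconds), taken from the results table
BENCHMARKS = [
    ("scRNA", 490.0, 3.1),
    ("RNA-seq", 1.04 * MB_PER_GB, 8.2),
    ("cfDNA", 1.19 * MB_PER_GB, 11.7),
    ("WES", 1.34 * MB_PER_GB, 11.6),
]


def throughput_mb_s(raw_mb: float, seconds: float) -> float:
    """Original-data throughput: uncompressed megabytes produced per second."""
    return raw_mb / seconds


if __name__ == "__main__":
    for name, raw_mb, seconds in BENCHMARKS:
        print(f"{name}: {throughput_mb_s(raw_mb, seconds):.0f} MB/s")
```

Rounded to whole numbers, this gives 158, 130, 104, and 118 MB/s.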

At these speeds, decompression should add little overhead to alignment and variant calling workflows. With FQLink, our transparent decompression wrapper, your existing tools never see the compressed format.
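To illustrate what "transparent" means here, the sketch below shows the general pattern: a reader that detects a compressed file and hands downstream code ordinary FASTQ text either way. FQLink's actual interface is not described in this post, so the example uses gzip from the Python standard library purely as a stand-in codec, and `open_fastq` and `read_records` are hypothetical names.

```python
import gzip
from typing import Iterator, TextIO, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_fastq(path: str) -> TextIO:
    """Return a text stream of FASTQ lines, decompressing on the fly if needed.

    gzip stands in for the real codec; callers cannot tell which branch ran.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "rt", encoding="ascii")


def read_records(stream: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = stream.readline().rstrip("\n")
        if not header:
            return
        sequence = stream.readline().rstrip("\n")
        stream.readline()  # skip the '+' separator line
        quality = stream.readline().rstrip("\n")
        yield header, sequence, quality
```

Downstream code calls `read_records(open_fastq(path))` and receives identical records whether or not the file on disk is compressed.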

Compression Ratios by Sequencing Type

Not all FASTQ data compresses equally. The ratio depends on read length, quality score distribution, and sequence complexity:

  • scRNA (5.87%): Short reads with highly repetitive barcodes and UMIs compress exceptionally well. The barcode structure gives 4BIN strong patterns to exploit.
  • RNA-seq (7.29%): Longer reads with broader sequence diversity. Compression remains strong thanks to quality score binning.
  • cfDNA (9.47%): Cell-free DNA fragments are short but show high quality score variance. The 4-level binning strategy keeps the ratio under 10%.
  • WES (10.27%): Whole-exome captures are the most diverse in this set, with targeted enrichment across thousands of exons, yet 4BIN still achieves a reduction of about 9.7x.
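The bullets above refer to 4-level quality binning without giving its details. As a rough illustration of the general technique, the sketch below maps each Phred quality score to one of four representative values, which shrinks the quality alphabet and makes the quality line far more repetitive. The bin edges and representative scores are hypothetical and chosen only for the example; they are not 4BIN's actual parameters.

```python
PHRED_OFFSET = 33  # standard Sanger / Illumina 1.8+ ASCII offset

# (lowest score, highest score, representative score) for each bin.
# Hypothetical values for illustration only.
BINS = [
    (0, 9, 6),
    (10, 19, 15),
    (20, 29, 25),
    (30, 93, 37),
]


def bin_quality(quality: str) -> str:
    """Replace every quality character with its bin's representative character."""
    binned = []
    for char in quality:
        score = ord(char) - PHRED_OFFSET
        for low, high, representative in BINS:
            if low <= score <= high:
                binned.append(chr(representative + PHRED_OFFSET))
                break
        else:
            raise ValueError(f"quality character out of range: {char!r}")
    return "".join(binned)


if __name__ == "__main__":
    original = "IIIHGF5#+F"
    print(original, "->", bin_quality(original))
```

Whatever the input, the output uses at most four distinct quality characters, which is the property a downstream entropy coder can exploit.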

Combined With Previous Amplicon Results

Adding our earlier amplicon benchmarks for the complete picture:

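| Type | Ratio | vs PetaGene |
| --- | --- | --- |
| Amplicon (DRR000798) | 4.56% | 1.16x better |
| Amplicon (DRR000801) | 4.77% | 1.11x better |
| Amplicon (DRR000802) | 4.47% | 1.19x better |
| scRNA | 5.87% | not reported |
| RNA-seq | 7.29% | not reported |
| cfDNA | 9.47% | not reported |
| WES | 10.27% | not reported |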

Amplicon data remains the best-compressing type: all three files beat PetaGene with 4-level quality preserved. The new types show that 4BIN stays under 11% of original size across five common sequencing workflows.

What This Means for Your Storage Bill

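Take a lab generating 10 TB of mixed FASTQ data per month. Keeping one month's output in S3 for a year costs (the figures correspond to about $0.023 per GB-month):

  • Gzip (~25%): 2.5 TB stored → $690/year in S3 costs
  • 4BIN (~7% average): 700 GB stored → $193/year in S3 costs
  • Savings: about $500 per year for each 10 TB of raw data kept in storage, and that covers storage alone. Smaller files also reduce egress and transfer costs.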

At petabyte scale, 100 times this volume, the same arithmetic puts the savings at roughly $50,000 per year.
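The cost arithmetic can be reproduced with the sketch below. It assumes a flat rate of $0.023 per GB-month (the rate implied by the $690 and $193 figures), decimal terabytes, and data held for a full twelve months; real S3 bills vary by region, storage class, and tier. The ~7% average appears to match the unweighted mean of the seven published ratios, which the code also checks.

```python
GB_PER_TB = 1000
PRICE_PER_GB_MONTH = 0.023  # assumption: rate implied by the published dollar figures
MONTHS_PER_YEAR = 12

# Three amplicon files plus scRNA, RNA-seq, cfDNA, and WES (percent of original size).
ALL_RATIOS_PCT = [4.56, 4.77, 4.47, 5.87, 7.29, 9.47, 10.27]


def mean_ratio_pct(ratios_pct: list) -> float:
    """Unweighted mean of the published compression ratios."""
    return sum(ratios_pct) / len(ratios_pct)


def annual_storage_cost(raw_tb: float, ratio: float) -> float:
    """Cost in dollars of holding the compressed data for twelve months."""
    stored_gb = raw_tb * GB_PER_TB * ratio
    return stored_gb * PRICE_PER_GB_MONTH * MONTHS_PER_YEAR


def annual_savings(raw_tb: float, baseline: float = 0.25, improved: float = 0.07) -> float:
    """Difference between the gzip-like baseline and the 4BIN-like ratio."""
    return annual_storage_cost(raw_tb, baseline) - annual_storage_cost(raw_tb, improved)


if __name__ == "__main__":
    print(f"mean ratio: {mean_ratio_pct(ALL_RATIOS_PCT):.1f}%")
    print(f"gzip, 10 TB: ${annual_storage_cost(10, 0.25):.0f}/year")
    print(f"4BIN, 10 TB: ${annual_storage_cost(10, 0.07):.0f}/year")
    print(f"savings, 10 TB: ${annual_savings(10):.0f}/year")
    print(f"savings, 1 PB: ${annual_savings(1000):.0f}/year")
```

With these assumptions the 10 TB case gives $690 and $193 per year, a difference of about $497, and the 1 PB case gives about $49,680.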

Try It

If you're generating any of these data types — scRNA, RNA-seq, cfDNA, WES, or amplicon — reach out and we'll run 4BIN on your actual data. No commitment, no risk to your files. Just smaller FASTQ.