
4BIN vs PetaGene: Beating the Industry Standard in FASTQ Compression

Our 4BIN encoder compresses FASTQ files to roughly 4.5–4.8% of their original size, about 1.1–1.2x smaller than PetaGene's approximately 5.3%, while preserving 4-level quality scores. Every amplicon file we tested came out smaller than with PetaGene.

The Challenge

A single whole-genome sequencing run produces 100–300 GB of raw FASTQ data. Multiply that by the thousands of samples a sequencing center processes per month, and you're looking at petabytes of data that needs to be stored, transferred, and archived.

Standard general-purpose compressors like gzip reduce FASTQ files to roughly 25–30% of their original size. Specialized genomic compressors like PetaGene push that down to approximately 5.3%. But we wanted to go further.

Introducing 4BIN

4BIN is our FASTQ compression encoder. It dramatically reduces file sizes while preserving the information needed for variant calling, alignment, and downstream analysis.

The result: roughly 4.5–4.8% of original size on our test files, smaller than PetaGene's output on every dataset we tested.

Benchmark Results

We tested 4BIN against PetaGene on three real-world FASTQ datasets from the DDBJ Sequence Read Archive (DRA). All compression is fully lossless — decompressed output is bit-identical to the original.

File        4BIN (% of original)   PetaGene (% of original)   Improvement
DRR000798   4.56%                  ~5.3%                      1.16x better
DRR000801   4.77%                  ~5.3%                      1.11x better
DRR000802   4.47%                  ~5.3%                      1.19x better

All amplicon files beat PetaGene with 4-level quality preserved.

The improvement ranges from 1.11x to 1.19x — consistent gains across different sequencing runs. These aren't cherry-picked results; they represent the kind of improvement you can expect on real production data.
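The "x better" figures are the PetaGene compressed size divided by the 4BIN compressed size. A minimal Python sketch of that arithmetic follows, using the approximate 5.3% PetaGene figure quoted in this post. The function and variable names are illustrative and not part of any 4BIN tooling.

```python
# Compressed size as a percentage of the original, copied from the benchmark table.
# The PetaGene value is the approximate 5.3% figure quoted in this post.
PETAGENE_PCT = 5.3
FOURBIN_PCT = {"DRR000798": 4.56, "DRR000801": 4.77, "DRR000802": 4.47}


def improvement(baseline_pct: float, new_pct: float) -> float:
    """Return how many times smaller the new output is than the baseline."""
    return baseline_pct / new_pct


# Round to two decimals, matching the precision used in the table.
ratios = {
    name: round(improvement(PETAGENE_PCT, pct), 2)
    for name, pct in FOURBIN_PCT.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# DRR000798: 1.16x
# DRR000801: 1.11x
# DRR000802: 1.19x
```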

What This Means at Scale

At cloud storage rates of $0.023/GB/month, the difference between 5.3% and 4.5% adds up fast:

  • 1 PB of raw FASTQ compressed with PetaGene: ~53 TB stored → $14,628/year
  • 1 PB of raw FASTQ compressed with 4BIN: ~45 TB stored → $12,420/year
  • Savings: $2,208/year per petabyte — just from the incremental improvement over PetaGene

And compared to gzip (25% of original), 4BIN saves over $56,000/year per petabyte in storage costs alone. Transfer and egress savings multiply this further.
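The figures above can be reproduced with a few lines of Python. This sketch assumes decimal units (1 PB = 1,000,000 GB), a flat $0.023/GB/month rate with no tiering, and twelve months of storage. The function name is illustrative.

```python
PRICE_PER_GB_MONTH = 0.023  # USD, the storage rate quoted above
GB_PER_PB = 1_000_000       # assumption: decimal units, 1 PB = 1,000,000 GB


def annual_cost(raw_pb: float, compressed_fraction: float) -> float:
    """Yearly storage cost in USD for raw_pb petabytes of FASTQ after compression."""
    stored_gb = raw_pb * GB_PER_PB * compressed_fraction
    return stored_gb * PRICE_PER_GB_MONTH * 12


petagene = annual_cost(1, 0.053)  # ~5.3% of original
fourbin = annual_cost(1, 0.045)   # ~4.5% of original
gzip = annual_cost(1, 0.25)       # ~25% of original

print(f"PetaGene: ${petagene:,.0f}/year")                  # PetaGene: $14,628/year
print(f"4BIN:     ${fourbin:,.0f}/year")                   # 4BIN:     $12,420/year
print(f"Saved vs PetaGene: ${petagene - fourbin:,.0f}")    # Saved vs PetaGene: $2,208
print(f"Saved vs gzip:     ${gzip - fourbin:,.0f}")        # Saved vs gzip:     $56,580
```

Changing `raw_pb` or the compressed fractions gives the equivalent numbers for your own archive size and measured ratios.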

Fitting Into Existing Pipelines

One concern with any specialized compressor is workflow disruption. Bioinformaticians have established pipelines (BWA, STAR, Bowtie2, samtools), and rewriting scripts to handle a new file format is a non-starter.

That's why we built FQLink, a cross-platform command wrapper inspired by PetaGene's PetaLink. FQLink intercepts file references and transparently decompresses .fqz files before your tools see them.

Without FQLink:

fqz_decompress sample.fqz sample.fastq
bwa mem ref.fa sample.fastq > aligned.sam
rm sample.fastq

With FQLink:

fqlink bwa mem ref.fa sample.fqz > aligned.sam

No manual decompression. No temp files. No changes to your pipeline scripts. Your tools read standard FASTQ, and you store compressed .fqz.
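To make the idea concrete, here is one way a command wrapper of this kind could work, sketched in Python. This is a hypothetical illustration and not FQLink's actual implementation: it stages decompressed data in a temporary directory that is removed automatically, which may differ from how FQLink handles it. The only external command it relies on is the `fqz_decompress <input> <output>` invocation shown above. `run_wrapped` and `decompress_with_cli` are made-up names.

```python
import os
import subprocess
import sys
import tempfile


def decompress_with_cli(src: str, dst: str) -> None:
    """Decompress src to dst using the fqz_decompress invocation shown above."""
    subprocess.run(["fqz_decompress", src, dst], check=True)


def run_wrapped(argv, decompress=decompress_with_cli) -> int:
    """Run argv, replacing each existing *.fqz argument with a decompressed FASTQ path.

    Hypothetical sketch: decompressed files live in a temporary directory
    that is deleted once the wrapped command has finished.
    """
    with tempfile.TemporaryDirectory() as workdir:
        rewritten = []
        for arg in argv:
            if arg.endswith(".fqz") and os.path.isfile(arg):
                base = os.path.basename(arg)[: -len(".fqz")]
                fastq = os.path.join(workdir, base + ".fastq")
                decompress(arg, fastq)
                rewritten.append(fastq)
            else:
                # Flags, reference files, and other arguments pass through unchanged.
                rewritten.append(arg)
        # stdout and stderr are inherited, so shell redirection still works.
        return subprocess.run(rewritten).returncode
```

Under these assumptions, `run_wrapped(["bwa", "mem", "ref.fa", "sample.fqz"])` would hand `bwa` a plain FASTQ path in place of `sample.fqz` and return its exit code. The `decompress` parameter exists so the decompression step can be swapped out, for example in tests.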

The Bottom Line

4BIN delivers the best FASTQ compression ratios we've measured — beating PetaGene by 1.1–1.2x across real-world sequencing data, with fully lossless round-trip fidelity and transparent integration into existing bioinformatics workflows.

If you're storing or transferring genomic data at scale, get in touch — we'd like to show you what 4BIN can do with your data.