FASTQ vs BAM: Which Is Cheaper to Store?

Comparing FASTQ and BAM file sizes, compression ratios, and cloud storage costs. How to minimize genomics storage spend across both formats.

FASTQ vs BAM: The Storage Trade-Off

Most sequencing pipelines produce both FASTQ (raw reads) and BAM (aligned reads), and most organizations keep both: FASTQ for reprocessing and BAM for downstream analysis. But which one costs more to store, and how much can you save by compressing each?

File Size Comparison

For a typical 30x whole-genome sample:

| Format | Typical Size | What It Contains |
| --- | --- | --- |
| FASTQ (paired) | 200 GB | Raw reads + quality scores |
| BAM (aligned) | 120 GB | Aligned reads + CIGAR + metadata |
| CRAM 3.1 | ~50 GB | Reference-based BAM compression |

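FASTQ is the larger file here mainly because the 200 GB figure is uncompressed text, while BAM is a binary format that is already block-compressed (BGZF). BAM carries the same reads and quality scores plus alignment information. Many labs store both, for a total of 320 GB per sample.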

Compression Ratios

Here's how our compressor handles each format:

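| Format | Raw | gzip | Our Compression | Ratio (compressed / raw) |
| --- | --- | --- | --- | --- |
| FASTQ | 200 GB | 50 GB | 9 GB | 4.5% |
| BAM | 120 GB | n/a | 17 GB | 14.2% |
| Total | 320 GB | 170 GB | 26 GB | 8.1% |

BAM is already block-compressed, so the gzip total counts it at its original 120 GB.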

From 320 GB down to 26 GB per sample — a 92% reduction.
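The ratios above are simple to recompute from the per-sample sizes. The following is a minimal sketch, assuming the sizes in the table; the names `SIZES_GB`, `ratio_percent`, and `summarize` are illustrative and not part of any product API.

```python
# Per-sample sizes in GB for one 30x whole genome, copied from the table above.
SIZES_GB = {
    "FASTQ": {"raw": 200, "compressed": 9},
    "BAM": {"raw": 120, "compressed": 17},
}


def ratio_percent(raw_gb, compressed_gb):
    """Compressed size expressed as a percentage of the original size."""
    return 100.0 * compressed_gb / raw_gb


def summarize(sizes):
    """Return the ratio for each format plus the combined total."""
    rows = {
        name: ratio_percent(s["raw"], s["compressed"])
        for name, s in sizes.items()
    }
    total_raw = sum(s["raw"] for s in sizes.values())
    total_compressed = sum(s["compressed"] for s in sizes.values())
    rows["Total"] = ratio_percent(total_raw, total_compressed)
    return rows


if __name__ == "__main__":
    for name, pct in summarize(SIZES_GB).items():
        # Reduction is simply the complement of the ratio.
        print(f"{name}: {pct:.1f}% of original, {100 - pct:.1f}% reduction")
```

Note that the ratio is defined here as compressed size over raw size, so a smaller percentage means better compression.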

Cost at Scale

For 10,000 whole-genome samples on S3, at the flat $0.023 per GB-month rate these figures imply (decimal units, 1 PB = 1,000,000 GB):

| Storage Strategy | Total Size | Annual S3 Cost |
| --- | --- | --- |
| FASTQ + BAM (uncompressed) | 3.2 PB | $883,200 |
| FASTQ (gzip) + BAM | 1.7 PB | $469,200 |
| FASTQ (4BIN) + BAM (compressed) | 260 TB | $71,760 |

That's $811,440/year saved by compressing both formats vs storing uncompressed.
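To rerun these numbers for your own sample count or storage price, a short calculation is enough. This is a rough sketch, assuming a flat price of $0.023 per GB-month (the rate that reproduces the figures above) and decimal units; real S3 bills depend on region, storage class, and volume tiers. The names `PRICE_PER_GB_MONTH`, `annual_cost_usd`, and `strategy_costs` are illustrative.

```python
# Assumed flat price in USD per GB-month; adjust for your region and tier.
PRICE_PER_GB_MONTH = 0.023

# Per-sample footprint in GB for each strategy, from the tables above.
PER_SAMPLE_GB = {
    "FASTQ + BAM (uncompressed)": 200 + 120,
    "FASTQ (gzip) + BAM": 50 + 120,
    "FASTQ + BAM (both compressed)": 9 + 17,
}


def annual_cost_usd(total_gb, price=PRICE_PER_GB_MONTH):
    """Yearly storage cost for a given number of GB at a flat monthly price."""
    return total_gb * price * 12


def strategy_costs(samples=10_000, price=PRICE_PER_GB_MONTH):
    """Annual cost of each storage strategy for the given sample count."""
    return {
        name: annual_cost_usd(gb * samples, price)
        for name, gb in PER_SAMPLE_GB.items()
    }


if __name__ == "__main__":
    costs = strategy_costs()
    for name, usd in costs.items():
        print(f"{name}: ${usd:,.0f}/year")
    saved = costs["FASTQ + BAM (uncompressed)"] - costs["FASTQ + BAM (both compressed)"]
    print(f"Saved by compressing both: ${saved:,.0f}/year")
```

Passing a different `samples` or `price` value to `strategy_costs` shows how the gap scales with cohort size and storage class.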

Should You Keep Both?

Keep FASTQ if:

  • You may need to re-align with a newer reference genome
  • Your pipeline requires raw reads for custom processing
  • Regulatory requirements mandate raw data retention

Keep only BAM if:

  • Your alignment is final and won't be re-run
  • You need fast access to aligned reads for variant calling
  • Storage is your primary bottleneck

Our recommendation: Keep both, compressed. At 26 GB per sample (compressed FASTQ + BAM), you can store 10,000 genomes for $71,760/year on S3. That is well below the roughly $331,200/year that 1.2 PB of uncompressed BAM alone would cost at the same rate.

Getting Started

Compress both FASTQ and BAM files through our API. Compression is fully lossless for both formats: the decompressed output is bit-identical to the original.
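A bit-identical round trip is easy to verify independently by comparing checksums of the original and the restored file. The sketch below is illustrative only: it uses Python's standard `gzip` module as a stand-in for the compress/decompress cycle, since the API calls are not shown here, and the helper names (`sha256_of`, `is_bit_identical`, `gzip_round_trip`) are hypothetical.

```python
import gzip
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large FASTQ/BAM files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(original, restored):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(restored)


def gzip_round_trip(original, workdir):
    """Stand-in compress/decompress cycle using stdlib gzip (an assumption)."""
    packed = Path(workdir) / "reads.fastq.gz"
    restored = Path(workdir) / "reads.restored.fastq"
    with open(original, "rb") as src, gzip.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    with gzip.open(packed, "rb") as src, open(restored, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return restored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "reads.fastq"
        original.write_bytes(b"@read1\nACGT\n+\nIIII\n")
        restored = gzip_round_trip(original, tmp)
        print("bit-identical:", is_bit_identical(original, restored))
```

In practice you would replace `gzip_round_trip` with your actual compress and decompress steps and run the checksum comparison before deleting any original.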

Sign up for free or contact us for enterprise volumes.