AWS Genomics Storage Is Expensive
If you run a sequencing facility, biobank, or clinical genomics pipeline on AWS, you already know: storage costs add up fast.
A typical whole-genome sequencing run produces 100–300 GB of raw FASTQ data. At scale, organizations accumulate petabytes. Amazon S3 Standard's first pricing tier is $0.023 per GB per month, which works out to $23,000/month per petabyte (1,000,000 GB) of raw data.
Most teams compress with gzip, which brings FASTQ down to ~25% of original size. Better, but still $5,750/month per petabyte of original data. For a mid-size biotech with 5 PB of sequencing archives, that's $345,000/year just in S3 storage fees.
The Fix: Compress FASTQ to 4.5%
Our 4BIN encoder compresses FASTQ files to 4.5% of original size — fully lossless. Every base call, quality score, and read header decompresses to the exact original.
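For context, a FASTQ file is a series of four-line records: a header starting with `@`, the base calls, a separator line starting with `+`, and one ASCII-encoded quality score per base. "Lossless" means every line of every record comes back byte-for-byte. The minimal parser below is an illustrative sketch in plain Python, not part of 4BIN. It assumes the common unwrapped four-line layout, and the `FastqRecord` and `read_fastq` names are our own.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FastqRecord:
    header: str     # read identifier line, starts with '@'
    sequence: str   # base calls, e.g. "ACGTN"
    separator: str  # '+' line, which may repeat the identifier
    quality: str    # one ASCII-encoded quality score per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        # Read the remaining three lines of the record, dropping newlines.
        sequence, separator, quality = (handle.readline().rstrip("\n") for _ in range(3))
        header = header.rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header, sequence, separator, quality)


sample = "@read1\nACGTN\n+\nIIII#\n"
for record in read_fastq(io.StringIO(sample)):
    print(record.header, record.sequence, record.quality)
```

A lossless encoder has to reproduce all four fields, including the optional identifier text some files repeat on the `+` line.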
Here's what that looks like on your AWS bill:
| Raw archive size | Annual S3 cost, gzip (~25%) | Annual S3 cost, 4BIN (4.5%) | Annual savings |
|---|---|---|---|
| 100 TB | $6,900/yr | $1,242/yr | $5,658 |
| 1 PB | $69,000/yr | $12,420/yr | $56,580 |
| 5 PB | $345,000/yr | $62,100/yr | $282,900 |
| 10 PB | $690,000/yr | $124,200/yr | $565,800 |
At 5 PB, switching from gzip to 4BIN saves $282,900 per year. That's real money back in your research budget.
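The table is straightforward arithmetic. The sketch below reproduces it under the same simplifying assumptions used above: a flat $0.023/GB-month rate, decimal units (1 PB = 1,000 TB = 1,000,000 GB), and stored sizes of 25% (gzip) and 4.5% (4BIN) of raw. Real S3 bills use tiered pricing that varies by region and storage class, so treat the output as an estimate and plug in your own rate.

```python
# Assumptions mirror the table: flat rate, decimal units, fixed compression ratios.
PRICE_PER_GB_MONTH = 0.023
GZIP_RATIO = 0.25
FOURBIN_RATIO = 0.045
GB_PER_TB = 1_000


def annual_storage_cost(raw_tb: float, ratio: float,
                        price: float = PRICE_PER_GB_MONTH) -> float:
    """Yearly S3 storage cost for `raw_tb` terabytes stored at `ratio` of raw size."""
    stored_gb = raw_tb * GB_PER_TB * ratio
    return stored_gb * price * 12


def savings_row(raw_tb: float) -> tuple[float, float, float]:
    """Return (gzip cost, 4BIN cost, savings) per year for one archive size."""
    gzip_cost = annual_storage_cost(raw_tb, GZIP_RATIO)
    fourbin_cost = annual_storage_cost(raw_tb, FOURBIN_RATIO)
    return gzip_cost, fourbin_cost, gzip_cost - fourbin_cost


for label, raw_tb in [("100 TB", 100), ("1 PB", 1_000), ("5 PB", 5_000), ("10 PB", 10_000)]:
    gzip_cost, fourbin_cost, saved = savings_row(raw_tb)
    print(f"{label:>6}: gzip ${gzip_cost:,.0f}/yr, 4BIN ${fourbin_cost:,.0f}/yr, "
          f"savings ${saved:,.0f}")
```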
It Also Cuts Egress and Transfer Costs
Smaller files also mean less data to move. Transferring a 200 GB genome between AWS regions at the $0.02/GB inter-region rate costs $4.00. At 4.5% of original size, the same genome is 9 GB and costs $0.18 to move.
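For pipelines that regularly move data across regions, the savings add up. The table below assumes the $0.02/GB inter-region rate and a 200 GB raw genome. Copies between buckets in the same region incur no data transfer charge, while transfers out to on-prem compute clusters are billed at higher internet egress rates, so the absolute savings there are larger: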
| Scenario | Uncompressed | gzip (~25%) | 4BIN (4.5%) |
|---|---|---|---|
| 1 genome transfer | $4.00 | $1.00 | $0.18 |
| 1,000 genomes/month | $4,000 | $1,000 | $180 |
| Annual total (12,000 genomes) | $48,000 | $12,000 | $2,160 |
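A minimal sketch that reproduces these figures, using the same assumptions as the table (200 GB raw genome, flat $0.02/GB). To model transfers to on-prem clusters instead, pass your actual internet egress rate as `rate_per_gb`; the constant names here are illustrative.

```python
# Assumptions mirror the table: 200 GB raw genome, flat $0.02/GB inter-region rate.
RAW_GENOME_GB = 200
INTER_REGION_RATE_PER_GB = 0.02
STORED_FRACTION = {"Uncompressed": 1.0, "gzip": 0.25, "4BIN": 0.045}


def transfer_cost(genomes: int, fraction: float,
                  raw_gb: float = RAW_GENOME_GB,
                  rate_per_gb: float = INTER_REGION_RATE_PER_GB) -> float:
    """Dollar cost to move `genomes` files, each stored at `fraction` of raw size."""
    return genomes * raw_gb * fraction * rate_per_gb


for name, fraction in STORED_FRACTION.items():
    one = transfer_cost(1, fraction)
    monthly = transfer_cost(1_000, fraction)
    yearly = transfer_cost(12_000, fraction)
    print(f"{name:>12}: ${one:,.2f} per genome, ${monthly:,.0f}/month, ${yearly:,.0f}/year")
```

Because cost is linear in bytes moved, the savings scale directly with the compression ratio and with how often each file is moved.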
Works With Your Existing AWS Setup
4BIN integrates via a REST API. Adding it to your existing pipeline looks like this:
- Sequencer produces FASTQ → uploaded to S3
- New step: Call 4BIN API to compress → store compressed file in S3
- When needed, decompress via API → feed into your analysis pipeline
No changes to your downstream tools. BWA, STAR, Salmon, and every other aligner/quantifier sees the exact same FASTQ it always did.
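The pipeline steps above can be sketched as follows. This is a hypothetical illustration, not the documented 4BIN API: the endpoint URLs, the bearer-token header, the `.4bin` key suffix, and all helper names are placeholders. A plain dict stands in for an S3 bucket and `gzip` stands in for the remote codec so the example runs offline; in a real deployment you would swap in S3 reads and writes (for example boto3's `put_object` and `get_object`) and the HTTP codec.

```python
import gzip
import urllib.request
from typing import Callable, Dict

Codec = Callable[[bytes], bytes]

# HYPOTHETICAL placeholders: not the real 4BIN endpoints or auth scheme.
COMPRESS_URL = "https://api.example.com/v1/compress"
DECOMPRESS_URL = "https://api.example.com/v1/decompress"


def make_http_codec(url: str, api_key: str, timeout: float = 300.0) -> Codec:
    """Return a function that POSTs raw bytes to `url` and returns the response body."""
    def call(payload: bytes) -> bytes:
        request = urllib.request.Request(
            url,
            data=payload,
            headers={
                "Authorization": f"Bearer {api_key}",  # assumed auth header
                "Content-Type": "application/octet-stream",
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    return call


def archive_fastq(key: str, fastq: bytes, compress: Codec, bucket: Dict[str, bytes]) -> str:
    """New step: compress an uploaded FASTQ and store it under a derived key."""
    archived_key = key + ".4bin"  # hypothetical naming convention
    bucket[archived_key] = compress(fastq)
    return archived_key


def restore_fastq(archived_key: str, decompress: Codec, bucket: Dict[str, bytes]) -> bytes:
    """On demand: fetch the archived object and return the original FASTQ bytes."""
    return decompress(bucket[archived_key])


# Offline demo: a dict stands in for S3 and gzip stands in for the remote codec.
demo_bucket: Dict[str, bytes] = {}
raw = b"@read1\nACGTN\n+\nIIII#\n"
stored_key = archive_fastq("runs/sample1.fastq", raw, gzip.compress, demo_bucket)
assert restore_fastq(stored_key, gzip.decompress, demo_bucket) == raw
# Production wiring (not executed here) might look like:
# compress = make_http_codec(COMPRESS_URL, api_key="...")
```

Passing the codec and storage in as parameters keeps the pipeline step independent of transport. For files in the hundreds of gigabytes you would stream rather than hold the whole payload in memory; the in-memory version just keeps the sketch short.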
HIPAA and Clinical Data
4BIN compression is fully lossless — the decompressed output is bit-identical to the original. For HIPAA-regulated clinical sequencing data, this means:
- No data modification or loss during storage
- Object version history via S3 Versioning, with AWS CloudTrail available for access logging
- Stored data stays in your AWS account: our API processes files and returns the result, never storing a copy
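Because the output is meant to be bit-identical, one simple way to confirm this on your own data is to record a SHA-256 digest of each FASTQ file before compression and compare it with the digest of the decompressed file. The sketch below does that with Python's standard `hashlib`, hashing in chunks so multi-hundred-gigabyte files never need to fit in memory. `gzip` is used only as a stand-in codec; substitute your actual compress and decompress calls. Where you keep the digest (for example in S3 object metadata or a manifest) is up to you.

```python
import gzip
import hashlib
import io
from typing import BinaryIO, Callable

Codec = Callable[[bytes], bytes]


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks (default 1 MiB) to bound memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_round_trip(original: bytes, compress: Codec, decompress: Codec) -> bool:
    """True only if decompress(compress(original)) has the same SHA-256 as the original."""
    expected = sha256_of_stream(io.BytesIO(original))
    restored = decompress(compress(original))
    return sha256_of_stream(io.BytesIO(restored)) == expected


fastq = b"@read1\nACGTN\n+\nIIII#\n" * 1000
print("bit-identical:", verify_round_trip(fastq, gzip.compress, gzip.decompress))
```

Storing the pre-compression digest alongside the archive lets you re-verify any restore later without keeping the original file.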
Get Started
Sign up for free API access and test on your own FASTQ data. Or contact us for enterprise volume pricing.