The Problem: Model Weights Are Expensive to Move
Large language models ship as massive safetensor files. A single frontier model can weigh in at 1 TB or more once you count all the shards. Every time you deploy to a new region, spin up an inference node, or distribute weights to edge servers, you're paying egress.
Here's what moving a 1 TB model looks like on major cloud providers:
| Provider | Egress Rate | Cost per Transfer | 10 Transfers/Month |
|---|---|---|---|
| AWS S3 | $0.09/GB | $90 | $900 |
| Google Cloud | $0.12/GB | $120 | $1,200 |
| Azure | $0.087/GB | $87 | $870 |
| Cloudflare R2 | Free egress | $0 | $0 |
For teams deploying frequently — retraining cycles, multi-region inference, CI/CD pipelines pushing updated weights — egress alone can run $10,000–$15,000/year for a single model on AWS or GCP. And that's before storage costs.
Compress those weights by 40% and you're transferring 600 GB instead of 1 TB. That's $36 saved per transfer on AWS, or $4,320/year at 10 transfers per month.
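The arithmetic above is easy to sanity-check. A minimal sketch, assuming the AWS S3 rate and the 40% reduction quoted in this section:

```python
# Back-of-the-envelope check of the savings above, assuming AWS S3
# egress at $0.09/GB and a 40% size reduction on a 1 TB model.
RATE_PER_GB = 0.09         # AWS S3 egress, USD
MODEL_GB = 1000            # 1 TB of weights
REDUCTION = 0.40           # 40% smaller after compression
TRANSFERS_PER_YEAR = 10 * 12

compressed_gb = MODEL_GB * (1 - REDUCTION)
saved_per_transfer = (MODEL_GB - compressed_gb) * RATE_PER_GB
annual_saving = saved_per_transfer * TRANSFERS_PER_YEAR

print(f"{compressed_gb:.0f} GB, ${saved_per_transfer:.2f}/transfer, "
      f"${annual_saving:,.0f}/year")
# 600 GB, $36.00/transfer, $4,320/year
```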
BF16 Weights Compressed to Below 60% of Original Size
BF16 (bfloat16) is the dominant format for modern model weights. Our algorithm compresses BF16 weights to below 60% of their original size while introducing a 1.06% relative error, which translates to a +0.077% increase in perplexity, essentially no impact on model quality.
NVIDIA calls FP8 "lossless" at +0.1-0.5% PPL. We're below that.
The 1.06% relative error is within noise for most inference workloads: we verified this across several open-weight models, measuring a perplexity impact of just +0.077%.
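For readers who want to reproduce this kind of check on their own models, here is a minimal sketch of how relative error is typically measured. The `relative_error` helper is illustrative, not part of our API; it computes the L2 relative error between original and round-tripped weights, shown on a toy list perturbed by 1%:

```python
import math

def relative_error(original, reconstructed):
    """L2 relative error: ||x - x_hat|| / ||x||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den

# Toy "weight tensor" with a uniform 1% perturbation.
w = [0.5, -1.25, 2.0, 0.75]
w_hat = [x * 1.01 for x in w]
print(f"{relative_error(w, w_hat):.4f}")  # 0.0100
```

In practice you would run this per-tensor over the original and decompressed safetensor shards rather than a toy list.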
What This Means for Egress
Putting it together for a 1 TB model on AWS S3:
| Scenario | Transfer Size | Cost per Transfer | Annual (10/mo) |
|---|---|---|---|
| Uncompressed | 1,000 GB | $90.00 | $10,800 |
| Our Compression | 470 GB | $42.30 | $5,076 |
That's $5,724/year saved on egress alone for a single model with our approach, and the weights decompress to full BF16 on load with minimal quality impact.
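The table's figures follow from the same arithmetic as before, now with the model compressed from 1,000 GB to 470 GB:

```python
# Verify the table: $0.09/GB AWS S3 egress, model compressed
# from 1,000 GB to 470 GB, 10 transfers a month.
RATE = 0.09
ORIGINAL_GB = 1000
COMPRESSED_GB = 470
TRANSFERS_PER_YEAR = 10 * 12

per_transfer_saving = (ORIGINAL_GB - COMPRESSED_GB) * RATE
annual_saving = per_transfer_saving * TRANSFERS_PER_YEAR
print(f"${per_transfer_saving:.2f}/transfer, ${annual_saving:,.0f}/year")
# $47.70/transfer, $5,724/year
```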
Try It
Safetensor compression is available now through our API. Upload your .safetensors file and get back a compressed archive that decompresses to the original format. Works with any BF16 model — Llama, Mistral, Qwen, or your own fine-tunes.
For models over 100 GB, contact us for bulk pricing and dedicated transfer infrastructure.