Compressing Neural Network Weights: 40% Smaller Safetensors

We achieved 40% compression on BF16 safetensor model weights - cutting egress costs for terabyte-scale models by thousands of dollars per month.

The Problem: Model Weights Are Expensive to Move

Large language models ship as massive safetensor files. A single frontier model can weigh in at 1 TB or more once you count all the shards. Every time you deploy to a new region, spin up an inference node, or distribute weights to edge servers, you're paying egress.

Here's what moving a 1 TB model looks like on major cloud providers:

| Provider | Egress Rate | Cost per Transfer | 10 Transfers/Month |
|---|---|---|---|
| AWS S3 | $0.09/GB | $90 | $900 |
| Google Cloud | $0.12/GB | $120 | $1,200 |
| Azure | $0.087/GB | $87 | $870 |
| Cloudflare R2 | Free egress | $0 | $0 |

For teams deploying frequently — retraining cycles, multi-region inference, CI/CD pipelines pushing updated weights — egress alone can run $10,000–$15,000/year for a single model on AWS or GCP. And that's before storage costs.

Compress those weights by 40% and you're transferring 600 GB instead of 1 TB. That's $36 saved per transfer on AWS, or $4,320/year at 10 transfers per month.
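The savings are simple arithmetic; here's a small sketch using the rates and sizes from the table above (illustrative only, not a pricing API):

```python
def egress_savings(size_gb, rate_per_gb, compression_ratio, transfers_per_month=10):
    """Annual egress savings from shipping compressed weights.

    compression_ratio is the fraction of the original size removed
    (0.40 means the compressed file is 60% of the original size).
    """
    saved_gb = size_gb * compression_ratio
    per_transfer = saved_gb * rate_per_gb
    annual = per_transfer * transfers_per_month * 12
    return per_transfer, annual

# 1 TB model on AWS S3 ($0.09/GB) at 40% compression:
# roughly $36 per transfer, $4,320 per year
per_transfer, annual = egress_savings(1000, 0.09, 0.40)
```

Swapping in Google Cloud's $0.12/GB rate or a different transfer cadence is a one-argument change.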

BF16 Weights: Below 60% of Original Size

BF16 (bfloat16) is the dominant format for modern model weights. Our algorithm introduces a 1.06% relative error in the weights, which translates to a +0.077% increase in perplexity, essentially no measurable impact on model quality.
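For context on the format itself: a BF16 value is the top 16 bits of an IEEE-754 float32, keeping the sign bit and full 8-bit exponent but only 7 mantissa bits. A stdlib-only sketch of that truncation (illustrative background, not our compression algorithm):

```python
import struct

def float32_to_bf16_bits(x: float) -> int:
    # BF16 = top 16 bits of an IEEE-754 float32:
    # 1 sign bit, 8 exponent bits, 7 mantissa bits (truncated, not rounded)
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_float(b: int) -> float:
    # Widen back to float32 by zero-filling the dropped mantissa bits
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x

# 1.0 survives the round trip exactly; values needing more than
# 7 mantissa bits lose precision
assert bf16_bits_to_float(float32_to_bf16_bits(1.0)) == 1.0
```

The small mantissa is also why BF16 tensors resist generic byte-level compressors: most of the entropy sits in those densely used exponent and mantissa bits.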

NVIDIA calls FP8 "lossless" at +0.1-0.5% PPL. We're below that.

This level of error is within the noise floor for most inference workloads; we verified it across several open-weight models with consistently negligible perplexity impact.
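The relative-error figure can be read as a norm ratio between the original and reconstructed weights. A minimal stdlib sketch of that metric, assuming the usual L2 (Frobenius) convention (the post doesn't pin down the exact norm, and real evaluation runs over full weight tensors):

```python
import math

def relative_error(original, reconstructed):
    # ||w - w_hat|| / ||w|| with the L2 norm; a 1.06% relative error
    # means this ratio is ~0.0106 over the model's weights
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))
    norm = math.sqrt(sum(a * a for a in original))
    return diff / norm

w = [1.0, -2.0, 3.0, -4.0]
w_hat = [x * 1.01 for x in w]   # uniform +1% perturbation
err = relative_error(w, w_hat)  # ≈ 0.01
```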

What This Means for Egress

Putting it together for a 1 TB model on AWS S3:

| Scenario | Transfer Size | Cost per Transfer | Annual (10/mo) |
|---|---|---|---|
| Uncompressed | 1,000 GB | $90.00 | $10,800 |
| Our Compression | 470 GB | $42.30 | $5,076 |

That's a $5,724/year saving on egress alone for a single model with our approach, and the weights decompress to full BF16 on load with minimal quality impact.

Try It

Safetensor compression is available now through our API. Upload your .safetensors file and get back a compressed archive that decompresses to the original format. Works with any BF16 model — Llama, Mistral, Qwen, or your own fine-tunes.
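After decompressing, it's easy to sanity-check that what you got back really is a valid safetensors file: the format starts with an 8-byte little-endian header length followed by a JSON index of tensor names, dtypes, shapes, and byte offsets. A stdlib sketch of that check (a generic format parser, not our API client):

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    # safetensors layout: u64 little-endian header size, then a JSON header
    # mapping tensor names to {"dtype", "shape", "data_offsets"}
    (n,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8:8 + n].decode("utf-8"))

# Build a tiny in-memory example file to demonstrate the parser
header = {"w": {"dtype": "BF16", "shape": [2, 2], "data_offsets": [0, 8]}}
blob = json.dumps(header).encode("utf-8")
fake_file = struct.pack("<Q", len(blob)) + blob + b"\x00" * 8

parsed = read_safetensors_header(fake_file)
# parsed["w"]["dtype"] is "BF16", parsed["w"]["shape"] is [2, 2]
```

Running this against the decompressed archive confirms tensor names, dtypes, and shapes match the original before you load the model.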

For models over 100 GB, contact us for bulk pricing and dedicated transfer infrastructure.