Seismic data, ~95% smaller at petabyte scale
Compress SEG-Y surveys for archive, processing, or interactive sub-volume access. Cross-survey deduplication makes the second copy almost free.
Drop a SEG-Y file (up to 200 MB) and see the codec on your own data. No credit card.
The math at 1 petabyte / year
Active O&G surveys routinely produce 1–10 PB per program. Here's what storing 1 PB of SEG-Y for a year costs, raw and with each compression option:
| Option | How it's sold | Typical 1 PB-year cost | Notes |
|---|---|---|---|
| Raw SEG-Y on S3 Standard | $0.023 / GB-mo¹ | ~$282,000 / yr | No compression, no dedup, no sub-volume access |
| Raw SEG-Y on S3 Glacier Deep Archive | $0.00099 / GB-mo¹ | ~$12,150 / yr | Cold-only, 12+ hr restore, no random access, retrieval fees |
| TerraSpark Compression | Per-survey licence, contact sales² | ~$$$$ (six figures typical) | Industry standard for active seismic; closed format, per-seat licensing |
| ZGY (Schlumberger) | Bundled with Petrel | Not separately priced | Internal format; not generally available as a standalone codec |
| smallest.zip Seismic Archive | $0.10 / GB processed + $0.005 / GB-mo stored (compressed) | ~$104,000 / yr first survey, dropping fast with dedup | Per-GB SaaS or on-prem enterprise; lossless / lossy / wavelet in one codec |
How the smallest.zip row breaks down: ingesting 1 PB at $0.10 / GB is a one-time $100,000, and the ~70 TB it compresses to costs ~$4,300 / yr to store at $0.005 / GB-mo. A second, near-identical survey adds little beyond its recipe: at 100:1 dedup, the marginal storage cost of a re-archive run is under $50 a year.
¹ AWS S3 us-east-1 list prices, ignoring egress and retrieval.
² TerraSpark public materials do not list per-survey pricing; the figure is based on customer reports.
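The arithmetic behind the table fits in a few lines. The sketch below is illustrative only: it assumes a decimal petabyte (1,000,000 GB), flat list prices, and the ~70 TB compressed size quoted above, and the function names are hypothetical. With those assumptions the S3 and storage figures land a few percent under the table's, which appear to use a slightly larger petabyte.

```python
GB_PER_PB = 1_000_000  # assumption: decimal petabyte (10^6 GB)
GB_PER_TB = 1_000      # assumption: decimal terabyte


def annual_storage(size_gb, price_per_gb_month):
    """Twelve months of storage at a flat per-GB-month price."""
    return size_gb * price_per_gb_month * 12


def ingest_fee(raw_gb, price_per_gb=0.10):
    """One-time processing fee, charged on the raw (uncompressed) size."""
    return raw_gb * price_per_gb


raw_gb = 1 * GB_PER_PB
compressed_gb = 70 * GB_PER_TB  # the ~70 TB figure quoted in the text

costs = {
    "S3 Standard, raw, per year": annual_storage(raw_gb, 0.023),
    "Glacier Deep Archive, raw, per year": annual_storage(raw_gb, 0.00099),
    "Archive: one-time ingest": ingest_fee(raw_gb),
    "Archive: storage per year": annual_storage(compressed_gb, 0.005),
    # A re-archive at 100:1 dedup stores roughly 1% of the compressed size.
    "Archive: re-archive at 100:1, per year": annual_storage(compressed_gb / 100, 0.005),
}

for label, usd in costs.items():
    print(f"{label:40s} ${usd:>12,.0f}")
```

Swapping in your own survey size and expected ratio gives a first-order estimate; egress, retrieval fees, and tiered pricing are deliberately left out.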
Three modes, one codec
Pick the right trade-off per workflow stage. All three share the same on-disk store, so cross-mode dedup is automatic.
Lossless
Byte-exact round trip on int16 SEG-Y (format 3). Built for the regulatory archive, where what comes back out must be bit-identical to what went in. Delta + zstd + content-addressed dedup. Float-format SEG-Y (formats 1, 5) routes through a high-fidelity wavelet path; ask sales about our format-5 native lossless beta.
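To make "delta + entropy coder + content address" concrete, here is a minimal sketch of a byte-exact round trip on big-endian int16 samples. It is not the product's codec: `zlib` from the Python standard library stands in for zstd, the function names are hypothetical, and the test trace is a synthetic sine wave.

```python
import hashlib
import math
import struct
import zlib


def delta_encode(samples):
    """Successive differences, wrapped to 16 bits so no value overflows."""
    prev, out = 0, []
    for s in samples:
        out.append((s - prev) & 0xFFFF)
        prev = s
    return out


def delta_decode(deltas):
    """Inverse of delta_encode: running sum modulo 2**16, read back as signed."""
    acc, out = 0, []
    for d in deltas:
        acc = (acc + d) & 0xFFFF
        out.append(acc - 0x10000 if acc >= 0x8000 else acc)
    return out


def compress_trace(raw: bytes) -> bytes:
    n = len(raw) // 2
    samples = struct.unpack(f">{n}h", raw)          # big-endian int16
    packed = struct.pack(f">{n}H", *delta_encode(samples))
    return zlib.compress(packed, 9)                 # stand-in for zstd


def decompress_trace(blob: bytes) -> bytes:
    packed = zlib.decompress(blob)
    n = len(packed) // 2
    deltas = struct.unpack(f">{n}H", packed)
    return struct.pack(f">{n}h", *delta_decode(deltas))


# Synthetic smooth trace; real seismic data will compress differently.
raw = struct.pack(">1000h", *[int(12000 * math.sin(i / 15)) for i in range(1000)])
blob = compress_trace(raw)
key = hashlib.sha256(blob).hexdigest()              # content address of the blob

assert decompress_trace(blob) == raw                # byte-exact round trip
print(len(raw), "->", len(blob), "bytes, key", key[:12])
```

Wrapping the differences modulo 2^16 is what keeps the scheme lossless even when neighbouring samples swing between the int16 extremes.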
Lossy
Typical 20:1 single-file ratio at 55+ dB PSNR. For active processing where the seismic interpreter cares about reflectivity, not the last quantization bit. 6-, 7-, and 8-bit quantization profiles let you trade ratio against fidelity.
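For a feel of how bit depth maps to PSNR, the sketch below applies plain uniform quantization to a synthetic trace and measures the result. The page does not describe the actual quantizer, so this is an assumption-laden illustration: the function names are hypothetical, and PSNR is defined here against the trace's peak-to-peak range.

```python
import math


def quantize(samples, bits):
    """Map samples onto 2**bits evenly spaced levels across their range."""
    lo, hi = min(samples), max(samples)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels or 1.0      # avoid a zero step on flat traces
    codes = [round((s - lo) / step) for s in samples]
    return codes, lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


def psnr(original, restored):
    """Peak signal-to-noise ratio in dB, peak taken as the original's range."""
    peak = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


trace = [int(10000 * math.sin(i / 7)) for i in range(2000)]  # synthetic input

results = {}
for bits in (6, 7, 8):
    codes, lo, step = quantize(trace, bits)
    results[bits] = psnr(trace, dequantize(codes, lo, step))
    print(f"{bits}-bit: {results[bits]:.1f} dB")
```

Each extra bit halves the quantization step, which is worth roughly 6 dB; that is the lever the three profiles expose.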
Wavelet
3-D Daubechies-4 DWT with 64×64×N bricks. Random-access sub-volume decode — pull a 64-trace inline strip without unpacking the whole survey. Powers interactive viewers on top of compressed archives.
Cross-survey deduplication — the compound win
SEG-Y archives are full of duplication: regulatory re-submissions, time-lapse acquisitions over unchanged geology, regional overlap zones, dev/test replicas of production data. Our codec hashes each compressed sub-volume; identical content stores once across the whole tenant.
| Scenario (5.8 MB int16 F3 slice, lossless mode) | Recipes total | Store DB | System total | Effective ratio |
|---|---|---|---|---|
| 1 survey (first upload) | 6 KB | 4.0 MB | 4.0 MB | 0.69 (1.4×) |
| 5 uploads of the same survey (1 original + 4 re-uploads) | 31 KB | 4.1 MB | 4.1 MB | 0.14 (7.1× / 86% smaller) |
Verified in our codec audit: each duplicate after the first adds only ~6 KB (the recipe), regardless of file size. At petabyte fleet scale this is where the codec earns its keep.
Honest caveat: dedup triggers on identical compressed sub-volumes: replicas, overlap zones, re-archives, time-lapse over unchanged geology. Two independently acquired surveys of nominally similar terrain rarely dedup, because acquisition noise differs at every trace. We don't sell fuzzy dedup; we sell content-addressed dedup that's mathematically exact.
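Both halves of that caveat show up in a toy content-addressed store. The sketch below is a simplified model, not the production store: it hashes raw fixed-size chunks rather than compressed sub-volumes, and `ChunkStore`, `archive`, and `restore` are hypothetical names.

```python
import hashlib

CHUNK = 4096  # illustrative chunk size, not the product's brick size


class ChunkStore:
    """Keeps one copy of each distinct chunk, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}

    def put(self, chunk: bytes) -> str:
        key = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(key, chunk)   # no-op if already stored
        return key

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


def archive(store, data: bytes):
    """Return the recipe: the ordered list of chunk keys for this file."""
    return [store.put(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def restore(store, recipe) -> bytes:
    return b"".join(store.chunks[key] for key in recipe)


# Deterministic pseudo-random "survey": 10 chunks, all different.
survey = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1280))

store = ChunkStore()
recipes = [archive(store, survey) for _ in range(5)]  # 1 original + 4 re-uploads
after_duplicates = store.stored_bytes()

# Flip one bit per chunk to mimic acquisition noise that differs everywhere.
noisy = bytearray(survey)
for offset in range(0, len(noisy), CHUNK):
    noisy[offset] ^= 1
archive(store, bytes(noisy))
after_noisy = store.stored_bytes()

print("after 5 identical uploads:", after_duplicates, "bytes")
print("after 1 noisy copy:       ", after_noisy, "bytes")
```

Five identical uploads occupy the space of one, while a copy that differs by a single bit per chunk shares nothing. That is the exact-match behaviour described above.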
Benchmarks
All numbers from codec-audit/segy/VALIDATION-REPORT.md — reproducible from the audit script.
| Input | Mode | Recipe | Encode time | PSNR | Byte-exact |
|---|---|---|---|---|---|
| F3 slice (5.8 MB, int16, 5000 traces) | Lossless | 6.2 KB | 0.1 s | ∞ | byte-exact |
| F3 slice (5.8 MB, int16, 5000 traces) | Lossy 7-bit | 91 KB | 1.7 s | 55.2 dB | n/a |
| F3 slice (5.8 MB, int16, 5000 traces) | Wavelet (db4, medium) | 6.6 KB | 10.2 s | 53.8 dB | n/a |
| Synthetic seismic (2.0 MB, 900 traces, Ricker + noise) | Wavelet medium | 1.2 KB | 0.6 s | ~52 dB | n/a |
| 5 × F3 slice (29.1 MB total, identical re-uploads) | Lossless + dedup | 31 KB + 4.08 MB store | ~0.5 s | ∞ | byte-exact |
Encode is CPU-intensive on the first survey, especially in wavelet mode (~2 MB/s single-threaded; we shard for higher throughput on enterprise nodes). Decompression is much faster — typical 30–50 MB/s. The win compounds across surveys that share sub-volumes.
See it on your own seismic
Drop a SEG-Y file (up to 200 MB). Pick a mode. Get the recipe + store back as a single bundle.
No signup. Rate-limited to 1 per hour per IP because the encode is heavy. Larger files? Talk to sales.
Frequently asked questions
Which SEG-Y trace formats are supported?
Formats 1 through 5: IBM float (format 1), int32 (2), int16 (3), fixed-point with gain (4), and IEEE float (5). Lossless is bit-exact for int16 today; int32 and float formats round-trip through a high-fidelity wavelet path (53–60 dB PSNR). Native float-lossless is in beta; contact sales for early access.
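To check which format a file uses before uploading, read the data sample format code from the binary file header (bytes 3225–3226 of the file). The sketch below assumes a big-endian file, as in SEG-Y rev 0 and rev 1; `sample_format` and `codec_path` are hypothetical helpers that only mirror the routing described above.

```python
import struct

FORMAT_NAMES = {
    1: "4-byte IBM float",
    2: "4-byte integer",
    3: "2-byte integer",
    4: "4-byte fixed-point with gain",
    5: "4-byte IEEE float",
}


def sample_format(file_start: bytes) -> int:
    """Return the data sample format code from the first 3600 bytes of a file.

    3200 bytes of textual header are followed by the 400-byte binary header;
    the format code sits at zero-based offset 3224 as a 16-bit integer.
    """
    if len(file_start) < 3600:
        raise ValueError("need at least the first 3600 bytes of the file")
    (code,) = struct.unpack_from(">H", file_start, 3224)  # assumes big-endian
    return code


def codec_path(code: int) -> str:
    """Mirror of the routing described in the answer above (illustrative)."""
    if code not in FORMAT_NAMES:
        raise ValueError(f"format code {code} is outside formats 1 through 5")
    return "lossless" if code == 3 else "wavelet"


# Build a fake header to exercise the functions without a real file.
fake = bytearray(3600)
struct.pack_into(">H", fake, 3224, 3)
code = sample_format(bytes(fake))
print(code, FORMAT_NAMES[code], "->", codec_path(code))
```

With a real file, pass the result of reading its first 3600 bytes in binary mode.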
Are headers preserved?
Yes. The 3200-byte EBCDIC textual header and the 400-byte binary file header are stored verbatim. Per-trace 240-byte headers are template-compressed (we extract the constant fields and delta-encode the varying ones), then restored byte-for-byte on decode.
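The template idea can be sketched at byte level: positions that never change across traces are stored once, and each varying position becomes a delta-encoded column. This is a simplification under stated assumptions. The real codec works on header fields rather than single bytes, and `compress_headers` and `decompress_headers` are hypothetical names.

```python
import struct

HEADER_LEN = 240


def compress_headers(headers):
    """Split trace headers into a shared template plus delta-coded columns."""
    if not headers or any(len(h) != HEADER_LEN for h in headers):
        raise ValueError("expected a non-empty list of 240-byte headers")
    template = headers[0]
    varying = [i for i in range(HEADER_LEN)
               if any(h[i] != template[i] for h in headers)]
    columns = {}
    for i in varying:
        prev, col = 0, bytearray()
        for h in headers:
            col.append((h[i] - prev) & 0xFF)   # byte-wise delta, wrapped
            prev = h[i]
        columns[i] = bytes(col)
    return template, columns, len(headers)


def decompress_headers(template, columns, count):
    """Rebuild every header byte-for-byte from the template and columns."""
    out = [bytearray(template) for _ in range(count)]
    for i, col in columns.items():
        acc = 0
        for t, delta in enumerate(col):
            acc = (acc + delta) & 0xFF
            out[t][i] = acc
    return [bytes(h) for h in out]


# 500 synthetic headers: the trace sequence number (bytes 1-4) increments,
# the sample count (bytes 115-116) stays constant, everything else is zero.
headers = []
for n in range(1, 501):
    h = bytearray(HEADER_LEN)
    struct.pack_into(">i", h, 0, n)
    struct.pack_into(">h", h, 114, 1500)
    headers.append(bytes(h))

template, columns, count = compress_headers(headers)
assert decompress_headers(template, columns, count) == headers
print("varying byte offsets:", sorted(columns))
```

In this synthetic case only the two low-order bytes of the sequence number vary, so 238 of the 240 bytes are stored once for all 500 traces.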
Will it work with Petrel, Kingdom, OpenSeisWorks?
Yes. Decompression outputs a fully conformant SEG-Y file (rev1 / rev2). You decompress on the way out and feed the resulting .segy to any interpretation package. There is nothing proprietary in the output.
How does wavelet mode enable sub-volume access?
We brick each survey into 64×64×N cubes and apply the DWT to each brick independently. To extract a 64-inline strip, we pull only the bricks that intersect it, inverse-DWT, and return a NumPy array, with no need to decompress the whole survey. Powers responsive viewers on top of cold storage.
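The "pull only the bricks that intersect" step is plain index arithmetic. The sketch below assumes bricks span 64 inlines by 64 crosslines over the full trace length, zero-based indices, and half-open ranges; `bricks_for_range` and the survey dimensions are hypothetical.

```python
import math

BRICK = 64


def bricks_for_range(inline_range, xline_range, brick=BRICK):
    """List (inline_brick, xline_brick) indices overlapping a sub-volume.

    Ranges are half-open (start, stop) in zero-based trace indices.
    """
    i0, i1 = inline_range
    x0, x1 = xline_range
    if i1 <= i0 or x1 <= x0:
        raise ValueError("ranges must be non-empty")
    return [(bi, bx)
            for bi in range(i0 // brick, (i1 - 1) // brick + 1)
            for bx in range(x0 // brick, (x1 - 1) // brick + 1)]


# Hypothetical survey: 650 inlines by 950 crosslines.
n_inlines, n_xlines = 650, 950
total = math.ceil(n_inlines / BRICK) * math.ceil(n_xlines / BRICK)

aligned = bricks_for_range((128, 192), (0, n_xlines))    # strip on a brick edge
unaligned = bricks_for_range((100, 164), (0, n_xlines))  # strip straddling two

print(f"aligned strip:   {len(aligned)} of {total} bricks")
print(f"unaligned strip: {len(unaligned)} of {total} bricks")
```

A strip aligned to brick boundaries touches one row of bricks; the same width shifted off the boundary touches two, which is why viewers tend to snap requests to the brick grid.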
Can I run this on-prem / air-gapped?
Yes — enterprise tier ships a single binary (or Docker image) you run in your data center. No callbacks, no telemetry, no internet required. Bring your own object store (S3, MinIO, Ceph, Azure Blob, NFS).
How does this compare to TerraSpark?
TerraSpark is the incumbent for active seismic compression — high quality, well-trusted, but priced per-survey with closed format and per-seat licensing. We're priced per-GB processed and per-GB-month stored, with the cross-survey-dedup compound savings as the main differentiator at petabyte scale. The decoder is single-binary, open-format, and yours forever on enterprise.
What about ZGY?
ZGY is great if you're all-in on Petrel, but it's not generally available as a standalone codec — and it's not designed for cross-survey deduplication across a heterogeneous fleet. We complement Petrel rather than replace it: ZGY in the workstation, smallest.zip in the archive.
Encode speed?
Single-thread: lossless ~50 MB/s, lossy ~3 MB/s, wavelet ~2 MB/s. We shard per-brick and per-survey for parallel ingest — typical enterprise box does 1 TB/hour wavelet encode. Decompression is faster: 30–80 MB/s single-thread, much higher in parallel.
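For capacity planning, the single-thread rates translate into worker counts with simple division. The sketch below assumes throughput scales linearly with workers, which real hardware only approximates, and uses decimal units; the function names are hypothetical.

```python
import math


def workers_needed(target_tb_per_hour, per_thread_mb_per_s):
    """Threads required to hit a target rate, assuming linear scaling."""
    target_mb_per_s = target_tb_per_hour * 1_000_000 / 3600
    return math.ceil(target_mb_per_s / per_thread_mb_per_s)


def ingest_days(total_tb, tb_per_hour):
    """Wall-clock days to ingest a data set at a sustained rate."""
    return total_tb / tb_per_hour / 24


for mode, rate in (("lossless", 50), ("lossy", 3), ("wavelet", 2)):
    print(f"{mode:8s}: {workers_needed(1, rate):4d} threads for 1 TB/hour")

print(f"1 PB at 1 TB/hour: {ingest_days(1000, 1):.1f} days")
```

Under these assumptions, 1 TB/hour in wavelet mode needs on the order of 140 threads, and a full petabyte at that rate takes about six weeks on a single node.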
Compliance, residency, SLAs?
SOC 2 Type II aligned. TLS 1.3 in transit, AES-256 at rest. EU and US data residency. 99.9% SLA on standard tier, 99.99% with multi-region replication on enterprise. Talk to us for specifics.
What happens if smallest.zip disappears?
The decompressor is a standalone binary; enterprise customers get a perpetual source-available licence to it. Your archive is never trapped — worst case, you spin up the binary, point it at the store and recipes, and pull out original SEG-Y.
Compress a petabyte of seismic for less than the cost of one TerraSpark licence
Drop a real SEG-Y file and see the compression on your own survey. No signup, no credit card.