SEG-Y Codec — Lossless · Lossy · Wavelet

Seismic data, ~95% smaller at petabyte scale

Compress SEG-Y surveys for archive, processing, or interactive sub-volume access. Cross-survey deduplication makes the second copy almost free.

Drop a SEG-Y file (up to 200 MB) and see the codec on your own data. No credit card.

The math at 1 petabyte / year

Active O&G surveys routinely produce 1–10 PB per program. Here's what it costs to store 1 PB of raw SEG-Y for a year under each option:

Option | How it's sold | Typical 1 PB-year cost | Notes
Raw SEG-Y on S3 Standard | $0.023 / GB-mo¹ | ~$282,000 / yr | No compression, no dedup, no sub-volume access
Raw SEG-Y on S3 Glacier Deep Archive | $0.00099 / GB-mo¹ | ~$12,150 / yr | Cold-only, 12+ hr restore, no random access, retrieval fees
TerraSpark Compression | Per-survey licence, contact sales² | ~$$$$ (six figures typical) | Industry standard for active seismic; closed format, per-seat licensing
ZGY (Schlumberger) | Bundled with Petrel | Not separately priced | Internal format; not generally available as a standalone codec
smallest.zip Seismic Archive | $0.10 / GB processed + $0.005 / GB-mo stored (compressed) | ~$104,000 / yr first survey, dropping fast with dedup | Per-GB SaaS or on-prem enterprise; lossless / lossy / wavelet in one codec

1 PB ingest at $0.10 / GB = $100,000 one-time, plus 1 PB compressed to ~70 TB at $0.005 / GB-mo = ~$4,300 / yr storage. A second, similar survey adds only the recipe: at 100:1 dedup, the marginal storage cost of a re-archive run is under $50.
¹ AWS S3 us-east-1 list prices, ignoring egress and retrieval.
² TerraSpark public materials do not list per-survey pricing; figure based on customer reports.
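The arithmetic behind these figures can be sketched as below. The prices and the ~70 TB compressed size are the ones quoted above; the variable names are illustrative, and the sketch assumes decimal units (1 PB = 1,000,000 GB), so its results land within a few percent of the rounded figures in the table rather than matching them exactly.

```python
# Cost sketch using the list prices quoted above.
# Assumption: decimal units, 1 PB = 1,000,000 GB and 1 TB = 1,000 GB.
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000


def storage_per_year(gb, usd_per_gb_month):
    """Yearly storage bill for `gb` gigabytes at a per-GB-month price."""
    return gb * usd_per_gb_month * 12


raw_gb = 1 * GB_PER_PB
compressed_gb = 70 * GB_PER_TB  # ~70 TB after compression, per the text

s3_standard = storage_per_year(raw_gb, 0.023)
deep_archive = storage_per_year(raw_gb, 0.00099)
ingest_once = raw_gb * 0.10                       # one-time processing fee
archive_storage = storage_per_year(compressed_gb, 0.005)
first_year_total = ingest_once + archive_storage
marginal_at_100_to_1 = archive_storage / 100      # re-archive at 100:1 dedup

for label, usd in [
    ("S3 Standard, raw", s3_standard),
    ("S3 Glacier Deep Archive, raw", deep_archive),
    ("Ingest, one-time", ingest_once),
    ("Compressed storage", archive_storage),
    ("First-year total", first_year_total),
    ("Marginal storage at 100:1 dedup", marginal_at_100_to_1),
]:
    print(f"{label:<34} ${usd:>12,.0f}")
```

Swapping in your own volume, compression ratio, and negotiated prices is a matter of changing `raw_gb`, `compressed_gb`, and the per-GB rates.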

Three modes, one codec

Pick the right trade-off per workflow stage. All three share the same on-disk store, so cross-mode dedup is automatic.

Lossless

Byte-exact roundtrip on int16 SEG-Y (format 3). For the regulatory archive — what the regulator hands back is bit-identical to what you sent. Delta + zstd + content-addressed dedup. Float-format SEG-Y (formats 1, 5) routes through a high-fidelity wavelet path; ask sales about our format-5 native lossless beta.

Lossy

Typical 20:1 single-file ratio at 55+ dB PSNR. For active processing where the seismic interpreter cares about reflectivity, not the last quantization bit. 6/7/8-bit profiles let you tune the curve.
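To make the bit-depth versus PSNR trade-off concrete, here is a sketch of a plain uniform quantizer with a PSNR measurement. It is an assumption-laden illustration, not the codec's lossy path: the actual 6/7/8-bit profiles are not described here, PSNR is computed against the trace peak (other conventions exist), and a plain uniform quantizer will generally score lower than the figures quoted above.

```python
import math


def quantize(samples, bits):
    """Uniform quantization to signed `bits`-bit codes, scaled to the trace peak."""
    peak = max(abs(s) for s in samples) or 1
    levels = (1 << (bits - 1)) - 1      # e.g. 63 for 7-bit signed codes
    step = peak / levels
    codes = [round(s / step) for s in samples]
    return codes, step


def dequantize(codes, step):
    return [c * step for c in codes]


def psnr_db(original, restored):
    """Peak signal-to-noise ratio in dB, using the original's peak amplitude."""
    peak = max(abs(s) for s in original)
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)


# Synthetic two-tone trace standing in for int16 seismic samples.
trace = [int(12000 * math.sin(i / 9) + 3000 * math.sin(i / 2.3)) for i in range(4000)]

results = {}
for bits in (6, 7, 8):
    codes, step = quantize(trace, bits)
    results[bits] = psnr_db(trace, dequantize(codes, step))
    print(f"{bits}-bit: {results[bits]:.1f} dB")
```

Each extra bit halves the quantization step, which is why PSNR rises by roughly 6 dB per bit in this simple model.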

Wavelet

3-D Daubechies-4 DWT with 64×64×N bricks. Random-access sub-volume decode — pull a 64-trace inline strip without unpacking the whole survey. Powers interactive viewers on top of compressed archives.

Cross-survey deduplication — the compound win

SEG-Y archives are full of duplication: regulatory re-submissions, time-lapse acquisitions over unchanged geology, regional overlap zones, dev/test replicas of production data. Our codec hashes each compressed sub-volume; identical content stores once across the whole tenant.

Scenario (5.8 MB int16 F3 slice, lossless mode) | Recipes total | Store DB | System total | Effective ratio
1 survey (first upload) | 6 KB | 4.0 MB | 4.0 MB | 0.69 (1.4×)
5 re-uploads of the same survey | 31 KB | 4.1 MB | 4.1 MB | 0.14 (7.1× / 86% smaller)

Verified in our codec audit: each duplicate after the first adds only ~6 KB (the recipe), regardless of file size. At petabyte fleet scale this is where the codec earns its keep.
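The recipe-plus-store behaviour described above can be sketched with a small content-addressed store. This is a toy model under stated assumptions: fixed-size byte chunks stand in for sub-volumes, `zlib` stands in for the codec, SHA-256 is an assumed hash, and `ChunkStore` is a hypothetical name.

```python
import hashlib
import zlib


class ChunkStore:
    """Content-addressed store: each distinct compressed chunk is kept once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> compressed bytes

    def put(self, data, chunk_size=64 * 1024):
        """Store data chunk by chunk; return the recipe (ordered digests)."""
        recipe = []
        for i in range(0, len(data), chunk_size):
            blob = zlib.compress(data[i:i + chunk_size])
            key = hashlib.sha256(blob).hexdigest()  # hash the compressed chunk
            self.blobs.setdefault(key, blob)        # identical content stored once
            recipe.append(key)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(zlib.decompress(self.blobs[k]) for k in recipe)

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


store = ChunkStore()
survey = bytes((i * 31 + (i >> 7)) % 251 for i in range(300_000))  # fake survey

first_recipe = store.put(survey)
size_after_first = store.stored_bytes()

# Four more uploads of the same bytes add recipes but no new blobs.
more_recipes = [store.put(survey) for _ in range(4)]
size_after_five = store.stored_bytes()

print("store after 1 upload :", size_after_first, "bytes")
print("store after 5 uploads:", size_after_five, "bytes")
print("round trip exact     :", store.get(more_recipes[-1]) == survey)
```

The effective ratio in the table is then (store size + total recipe size) divided by the total raw bytes uploaded, which shrinks with every duplicate because only the numerator's recipe term grows.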

Honest caveat: dedup triggers only on identical compressed sub-volumes (replicas, overlap zones, re-archives, time-lapse over unchanged geology). Two independently acquired surveys of nominally similar terrain rarely dedup, because acquisition noise differs at every trace. We don't sell fuzzy dedup; we sell content-addressed dedup that's mathematically exact.

Benchmarks

All numbers from codec-audit/segy/VALIDATION-REPORT.md — reproducible from the audit script.

Input | Mode | Recipe | Encode time | PSNR | Byte-exact
F3 slice (5.8 MB, int16, 5000 traces) | Lossless | 6.2 KB | 0.1 s | - | byte-exact
F3 slice (5.8 MB, int16, 5000 traces) | Lossy 7-bit | 91 KB | 1.7 s | 55.2 dB | n/a
F3 slice (5.8 MB, int16, 5000 traces) | Wavelet (db4, medium) | 6.6 KB | 10.2 s | 53.8 dB | n/a
Synthetic seismic (2.0 MB, 900 traces, Ricker + noise) | Wavelet medium | 1.2 KB | 0.6 s | ~52 dB | n/a
5 × F3 slice (29.1 MB total, identical re-uploads) | Lossless + dedup | 31 KB + 4.08 MB store | ~0.5 s | - | byte-exact

Encode is CPU-intensive on the first survey, especially in wavelet mode (~2 MB/s single-threaded; we shard for higher throughput on enterprise nodes). Decompression is much faster — typical 30–50 MB/s. The win compounds across surveys that share sub-volumes.

See it on your own seismic

Drop a SEG-Y file (up to 200 MB). Pick a mode. Get the recipe + store back as a single bundle.

Compress a SEG-Y file →

No signup. Rate-limited to 1 per hour per IP because the encode is heavy. Larger files? Talk to sales.

Frequently asked questions

Which SEG-Y trace formats are supported?

All five: IBM float (format 1), int32 (2), int16 (3), fixed-point (4), and IEEE float (5). Lossless is bit-exact for int16 today; int32 and float formats round-trip through a high-fidelity wavelet path (53–60 dB PSNR). Native float-lossless is in beta — contact sales for early access.
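If you are unsure which format a file uses, the data sample format code sits in bytes 3225–3226 of the file, inside the binary file header. The sketch below reads it with the standard library; it assumes the traditional big-endian byte order, and the function name is illustrative.

```python
import struct

# Data sample format codes, as listed in the answer above.
SAMPLE_FORMATS = {
    1: "4-byte IBM float",
    2: "4-byte integer (int32)",
    3: "2-byte integer (int16)",
    4: "4-byte fixed-point with gain",
    5: "4-byte IEEE float",
}


def read_sample_format(header):
    """Return the format code from the first 3600 bytes of a SEG-Y file."""
    if len(header) < 3600:
        raise ValueError("need the 3200-byte textual and 400-byte binary headers")
    # Bytes 3225-3226 (1-based) are at offset 3224; assumed big-endian.
    (code,) = struct.unpack_from(">h", header, 3224)
    return code


# Build a fake header for demonstration instead of reading a real file.
fake = bytearray(3600)
struct.pack_into(">h", fake, 3224, 3)
code = read_sample_format(bytes(fake))
print(code, "->", SAMPLE_FORMATS.get(code, "other / unsupported"))
```

For a real file, pass the result of reading the first 3600 bytes, for example `open(path, "rb").read(3600)`.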

Are headers preserved?

Yes. The 3200-byte EBCDIC textual header and the 400-byte binary file header are stored verbatim. Per-trace 240-byte headers are template-compressed (we extract the constant fields and delta-encode the varying ones), then restored byte-for-byte on decode.
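The template idea can be sketched at the byte level: keep one header as the template, and for every byte position that changes across traces, store a delta-encoded column. This is a simplification for illustration only. A real implementation would more plausibly work on the 2- and 4-byte header fields, and the function names are hypothetical.

```python
import struct


def compress_headers(headers):
    """Split equal-length headers into a template plus delta-coded varying columns."""
    template = headers[0]
    varying = {}  # byte position -> delta-encoded column (differences mod 256)
    for pos in range(len(template)):
        column = [h[pos] for h in headers]
        if any(b != column[0] for b in column):
            deltas = [column[0]] + [
                (column[i] - column[i - 1]) % 256 for i in range(1, len(column))
            ]
            varying[pos] = bytes(deltas)
    return template, varying, len(headers)


def restore_headers(template, varying, count):
    """Rebuild every header byte-for-byte from the template and the columns."""
    rows = [bytearray(template) for _ in range(count)]
    for pos, deltas in varying.items():
        value = 0
        for i, d in enumerate(deltas):
            value = (value + d) % 256  # running sum undoes the delta coding
            rows[i][pos] = value
    return [bytes(r) for r in rows]


# 300 fake 240-byte trace headers: a 4-byte trace sequence number, rest constant.
headers = [struct.pack(">i", n + 1) + b"\x07" * 236 for n in range(300)]

template, varying, count = compress_headers(headers)
restored = restore_headers(template, varying, count)

print("varying byte positions:", sorted(varying))
print("byte-for-byte restore :", restored == headers)
```

Only the positions that actually change are stored per trace, which is why headers with mostly constant fields shrink so well.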

Will it work with Petrel, Kingdom, OpenSeisWorks?

Yes. Decompression outputs a fully conformant SEG-Y file (rev1 / rev2). You decompress on the way out and feed the resulting .segy to any interpretation package. There is nothing proprietary in the output.

How does wavelet mode enable sub-volume access?

We split each survey into 64×64×N bricks and run the DWT on each brick independently. To extract a 64-inline strip, we pull only the bricks that intersect it, apply the inverse DWT, and return a NumPy array, so there is no need to decompress the whole survey. This is what powers responsive viewers on top of cold storage.
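The brick-selection step can be sketched as simple index arithmetic. The example assumes bricks are 64 wide along both inline and crossline and are addressed by integer brick indices; the function name and the survey dimensions are made up for illustration.

```python
BRICK = 64  # brick edge along inline and crossline, per the 64x64xN layout


def bricks_for_region(il_start, il_stop, xl_start, xl_stop, brick=BRICK):
    """Return (inline_brick, crossline_brick) indices touching a half-open index range."""
    if il_start >= il_stop or xl_start >= xl_stop:
        return []
    il_bricks = range(il_start // brick, (il_stop - 1) // brick + 1)
    xl_bricks = range(xl_start // brick, (xl_stop - 1) // brick + 1)
    return [(i, x) for i in il_bricks for x in xl_bricks]


# Hypothetical survey: 640 inlines x 960 crosslines = 10 x 15 = 150 bricks.
total_bricks = (640 // BRICK) * (960 // BRICK)

aligned = bricks_for_region(128, 192, 0, 960)    # strip aligned to brick edges
unaligned = bricks_for_region(100, 164, 0, 960)  # strip straddling a brick edge

print(len(aligned), "of", total_bricks, "bricks for the aligned strip")      # 15 of 150
print(len(unaligned), "of", total_bricks, "bricks for the unaligned strip")  # 30 of 150
```

A strip that lines up with brick boundaries touches a single row of bricks, while one that straddles a boundary touches two, so viewers benefit from requesting brick-aligned regions where they can.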

Can I run this on-prem / air-gapped?

Yes — enterprise tier ships a single binary (or Docker image) you run in your data center. No callbacks, no telemetry, no internet required. Bring your own object store (S3, MinIO, Ceph, Azure Blob, NFS).

How does this compare to TerraSpark?

TerraSpark is the incumbent for active seismic compression — high quality, well-trusted, but priced per-survey with closed format and per-seat licensing. We're priced per-GB processed and per-GB-month stored, with the cross-survey-dedup compound savings as the main differentiator at petabyte scale. The decoder is single-binary, open-format, and yours forever on enterprise.

What about ZGY?

ZGY is great if you're all-in on Petrel, but it's not generally available as a standalone codec — and it's not designed for cross-survey deduplication across a heterogeneous fleet. We complement Petrel rather than replace it: ZGY in the workstation, smallest.zip in the archive.

Encode speed?

Single-thread: lossless ~50 MB/s, lossy ~3 MB/s, wavelet ~2 MB/s. We shard per-brick and per-survey for parallel ingest — typical enterprise box does 1 TB/hour wavelet encode. Decompression is faster: 30–80 MB/s single-thread, much higher in parallel.

Compliance, residency, SLAs?

SOC 2 Type II aligned. TLS 1.3 in transit, AES-256 at rest. EU and US data residency. Uptime SLA of 99.9% on standard tier, 99.99% with multi-region replication on enterprise. Talk to us for specifics.

What happens if smallest.zip disappears?

The decompressor is a standalone binary; enterprise customers get a perpetual source-available licence to it. Your archive is never trapped — worst case, you spin up the binary, point it at the store and recipes, and pull out original SEG-Y.

Compress a petabyte of seismic for less than the cost of one TerraSpark licence

Drop a real SEG-Y file and see the compression on your own survey. No signup, no credit card.

Try it free, no signup · Talk to sales