Log Pipeline Integrations

Connect smallest.zip to your existing log pipeline (CloudWatch, Splunk, Datadog, Fluentd, Vector, Logstash, raw syslog) in under 5 minutes. Stop paying ingestion rates for cold storage.

Overview

smallest.zip's log codec is byte-exact lossless and accepts any text-ish input — JSON lines, syslog, Apache/nginx access logs, application logs, audit trails, anything. The output is a compact .sbz blob (typically 10–25× smaller than gzip on real log corpora) that you can park in S3, GCS, R2, Azure Blob, or anywhere cheap.

This page shows the exact configuration needed to wire smallest.zip into the seven most common log pipelines. Every snippet on this page is copy-pastable.

The endpoint

POST https://smallest.zip/api/log-files/upload
Header / Field | Value | Notes
x-api-key | Your API key | Required. Get one on Settings.
Content-Type | multipart/form-data | File goes in the file field.
?level= | fast / balanced / max | Optional. Default balanced.

Expected response (201 Created)

json
{
  "uploadId": "9f3c2a01",
  "originalFilename": "app-2026-06-01.log",
  "originalBytes": 104857600,
  "compressedBytes": 4823104,
  "codec": "log-files",
  "compressionTimeMs": 2840,
  "createdAt": "2026-06-02T10:14:00Z"
}
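
The two byte counts in the response are enough to work out the achieved ratio. A minimal sketch, assuming the response body has exactly the shape shown above; the helper name compression_ratio is illustrative and not part of any official SDK.

```python
import json

def compression_ratio(response_body: str) -> float:
    """Return originalBytes / compressedBytes from an upload response."""
    meta = json.loads(response_body)
    # Guard against a zero-byte result so the division cannot fail.
    return meta["originalBytes"] / max(1, meta["compressedBytes"])

# Trimmed copy of the sample response shown above.
sample = '{"uploadId": "9f3c2a01", "originalBytes": 104857600, "compressedBytes": 4823104}'
print(f"{compression_ratio(sample):.1f}x")  # prints 21.7x
```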

Verify your key works right now

bash
echo "$(date) hello from smallest.zip" > /tmp/test.log
curl -X POST "https://smallest.zip/api/log-files/upload?level=balanced" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@/tmp/test.log"
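
The same check can be made from Python with only the standard library. The sketch below builds the multipart request that the curl command sends, with the file in the file field and the key in the x-api-key header. The function name build_upload_request is hypothetical, and the part's Content-Type of text/plain is an assumption carried over from the Lambda examples on this page.

```python
import urllib.request
import uuid

ENDPOINT = "https://smallest.zip/api/log-files/upload"

def build_upload_request(api_key: str, filename: str, data: bytes,
                         level: str = "balanced") -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying `data` in the `file` field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{ENDPOINT}?level={level}",
        data=head + data + tail,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

# Sending it performs a real network call, so it is left commented out:
# req = build_upload_request("YOUR_API_KEY", "test.log", b"hello from smallest.zip\n")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.status, resp.read().decode())
```

Keeping request construction separate from sending makes the multipart framing easy to inspect or unit-test without touching the network.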

The smallest-logs CLI

Some integrations below shell out to the official CLI. Install it on any Linux/macOS host:

bash
curl -fsSL https://smallest.zip/install/logs | sh
export SMALLEST_API_KEY="YOUR_API_KEY"

# Compress a file in place (writes app.log.sbz, removes original with --rm)
smallest-logs compress app.log

# Stream stdin (good for piping from journalctl, kafka, etc.)
journalctl --since yesterday | smallest-logs compress - --out yesterday.sbz

A) AWS CloudWatch Logs → smallest.zip

CloudWatch Logs storage costs $0.03/GB-month and ingestion is $0.50/GB. Replacing CloudWatch's own archive with a compressed S3 archive cuts the storage bill by ~95% on typical app log corpora.
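
As a rough sanity check on the ~95% figure, the arithmetic can be sketched as follows. Only the $0.03/GB-month CloudWatch price comes from the text above; the $0.023/GB-month archive price and the 15× ratio are assumptions chosen for illustration, and the model treats both prices as applying to the same raw volume, which is a simplification.

```python
def storage_savings(cw_price_gb: float, archive_price_gb: float, ratio: float) -> float:
    """Fraction of the monthly storage bill saved per GB of logs."""
    # Cost of storing what used to be 1 GB once it has been compressed `ratio` times.
    compressed_cost = archive_price_gb / ratio
    return 1 - compressed_cost / cw_price_gb

# 0.03 is the CloudWatch price quoted above; 0.023 and 15 are assumed values.
print(f"{storage_savings(0.03, 0.023, 15):.1%}")  # prints 94.9%
```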

Step 1 — Create the forwarding Lambda

This Lambda receives a CloudWatch Logs subscription event, decodes the gzipped payload, posts it to smallest.zip, and writes the returned .sbz to S3.

python — lambda_function.py
import base64, gzip, json, os, time, urllib.request, uuid
import boto3

API_KEY = os.environ["SMALLEST_API_KEY"]
BUCKET  = os.environ["ARCHIVE_BUCKET"]
PREFIX  = os.environ.get("ARCHIVE_PREFIX", "cloudwatch/")
ENDPOINT = "https://smallest.zip/api/log-files/upload?level=max"

s3 = boto3.client("s3")

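def handler(event, _ctx):
    # CloudWatch subscription payload is base64-encoded, gzipped JSON
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    # Skip CONTROL_MESSAGE payloads (reachability checks that carry no application logs)
    if payload.get("messageType") == "CONTROL_MESSAGE":
        return {"ok": True, "skipped": "control message"}
    lines = "\n".join(e["message"] for e in payload["logEvents"]).encode()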

    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="batch.log"\r\n'
        f"Content-Type: text/plain\r\n\r\n"
    ).encode() + lines + f"\r\n--{boundary}--\r\n".encode()

    req = urllib.request.Request(ENDPOINT, data=body, method="POST", headers={
        "x-api-key": API_KEY,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    })
    with urllib.request.urlopen(req, timeout=60) as r:
        meta = json.loads(r.read())

    # Download the compressed blob and stash it in our own S3 bucket
    dl = urllib.request.Request(
        f"https://smallest.zip/api/files/{meta['uploadId']}/download",
        headers={"x-api-key": API_KEY})
    with urllib.request.urlopen(dl, timeout=60) as r:
        blob = r.read()

    # Log group names start with "/", so strip it to avoid an empty "//" path segment
    key = f"{PREFIX}{payload['logGroup'].strip('/')}/{int(time.time())}-{meta['uploadId']}.sbz"
    s3.put_object(Bucket=BUCKET, Key=key, Body=blob,
                  Metadata={"original-bytes": str(meta["originalBytes"]),
                            "compressed-bytes": str(meta["compressedBytes"])})
    return {"ok": True, "s3": f"s3://{BUCKET}/{key}", "ratio": meta["originalBytes"]/max(1, meta["compressedBytes"])}
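
To exercise the handler locally before wiring up a subscription, it helps to build an event in the same base64-plus-gzip envelope. A minimal sketch: the helper names are hypothetical, and the payload carries only the fields the handler reads plus a few placeholders, so real events from CloudWatch will contain more.

```python
import base64
import gzip
import json

def make_cwlogs_event(log_group: str, messages: list[str]) -> dict:
    """Wrap messages in the base64-gzipped envelope used by subscription events."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "logGroup": log_group,
        "logStream": "local-test",  # placeholder value
        "logEvents": [
            {"id": str(i), "timestamp": 0, "message": m}
            for i, m in enumerate(messages)
        ],
    }
    data = base64.b64encode(gzip.compress(json.dumps(payload).encode())).decode()
    return {"awslogs": {"data": data}}

def decode_cwlogs_event(event: dict) -> dict:
    """Inverse of make_cwlogs_event; mirrors the decoding step in the handler."""
    return json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))

event = make_cwlogs_event("/aws/lambda/my-app", ["first line", "second line"])
print(decode_cwlogs_event(event)["logEvents"][1]["message"])  # prints second line
```

Passing such an event to handler(event, None) still performs the real upload and S3 write, so point it at a test bucket.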

Step 2 — Subscribe the Lambda to a log group

bash
aws lambda add-permission \
  --function-name smallestzip-forwarder \
  --statement-id cwlogs-invoke \
  --action lambda:InvokeFunction \
  --principal logs.amazonaws.com

aws logs put-subscription-filter \
  --log-group-name /aws/lambda/my-app \
  --filter-name smallestzip \
  --filter-pattern "" \
  --destination-arn arn:aws:lambda:us-east-1:123456789012:function:smallestzip-forwarder

Step 3 — IAM policy for the Lambda execution role

json
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow",
      "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*" },
    { "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-log-archive/cloudwatch/*" }
  ]
}

Verification

bash
# Trigger a log event, then within ~60s confirm a new .sbz landed in S3:
aws s3 ls s3://my-log-archive/cloudwatch/ --recursive | tail
# And confirm round-trip integrity:
aws s3 cp s3://my-log-archive/cloudwatch/<LATEST>.sbz /tmp/last.sbz
smallest-logs decompress /tmp/last.sbz --out - | head
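
Piping into head only shows that the archive decodes. To confirm the round trip is byte-exact, compare digests of the original text and the decompressed output. A minimal sketch, assuming you still have the original text on disk and have written the decompressed output to a second file; the function names are illustrative.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large logs never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_byte_exact(original_path: str, restored_path: str) -> bool:
    """True when both files have identical contents."""
    return sha256_of(original_path) == sha256_of(restored_path)

# Self-check with throwaway files standing in for the original and restored logs.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.log")
    restored = os.path.join(tmp, "restored.log")
    for path in (original, restored):
        with open(path, "wb") as f:
            f.write(b"line 1\nline 2\n")
    demo_match = is_byte_exact(original, restored)
    with open(restored, "ab") as f:
        f.write(b"extra\n")
    demo_mismatch = is_byte_exact(original, restored)

print(demo_match, demo_mismatch)  # prints True False
```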

Troubleshooting

Symptom | Fix
Lambda times out | Bump timeout to 60s; subscribe a lower-volume log group so each batch is smaller.
401 Unauthorized | SMALLEST_API_KEY env var not set on the Lambda, or stale key. Rotate on Settings.
413 Payload Too Large | Batches are over 100 MB. Switch subscription destination to Kinesis Firehose → Lambda with smaller buffer (1 MB).
S3 AccessDenied | Lambda role missing s3:PutObject on the target bucket prefix.
CloudWatch keeps both copies | Set a retention policy: aws logs put-retention-policy --log-group-name X --retention-in-days 7.
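
If an oversized batch is the problem and re-plumbing through Firehose is not an option, another approach is to split the text on line boundaries before uploading. The sketch below is one way to do that, not something the service requires. The function name split_batches is hypothetical, and a single line longer than max_bytes is passed through as its own oversized batch rather than being cut mid-line.

```python
def split_batches(lines: list[bytes], max_bytes: int) -> list[bytes]:
    """Join lines with newlines into batches of at most max_bytes each."""
    batches, current, size = [], [], 0
    for line in lines:
        needed = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + needed > max_bytes:
            # Adding this line would exceed the limit: close the current batch.
            batches.append(b"\n".join(current))
            current, size = [], 0
            needed = len(line)
        current.append(line)
        size += needed
    if current:
        batches.append(b"\n".join(current))
    return batches

print(split_batches([b"aaaa", b"bbbb", b"cc"], 9))  # prints [b'aaaa\nbbbb', b'cc']
```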

B) Splunk → smallest.zip

Two patterns work. Pick based on whether you want real-time forwarding or scheduled cold-tier archival.

Option 1 — Forward to HEC-compatible HTTP sink (real-time)

Splunk's HTTP output (outputs.conf) speaks raw HTTP. We expose smallest.zip as the destination; Splunk batches events and POSTs each batch as a file. Pro: near real-time, no scheduled job. Con: Splunk's HTTP output expects an HEC-shaped response, so you either front the endpoint with a small middleware Lambda or fall back to the scheduled export in option 2.

conf — $SPLUNK_HOME/etc/system/local/outputs.conf
[httpout]
httpEventCollectorToken = YOUR_API_KEY
uri = https://smallest.zip/api/log-files/upload?level=balanced
batchSize = 65536
batchTimeout = 30
sslVerifyServerCert = true

[tcpout]
defaultGroup = no_indexers

Splunk sends each batch as a POST with a JSON body. The /api/log-files/upload endpoint accepts that body as-is when Content-Type is application/json.

Option 2 — Scheduled saved search export (batch)

Better fit for cold archival: run a nightly saved search, dump to file, compress, ship. Pro: survives Splunk restarts, retries, replays. Con: not real-time.

bash — /opt/smallest/splunk_archive.sh (run from cron)
#!/usr/bin/env bash
set -euo pipefail
DAY=$(date -u -d 'yesterday' +%F)
OUT=/var/tmp/splunk-$DAY.log

/opt/splunk/bin/splunk search \
  "search index=* earliest=-1d@d latest=@d | table _raw" \
  -auth "admin:$SPLUNK_PASS" -output rawxml -maxout 0 \
  | sed -n 's:.*<text>\(.*\)</text>.*:\1:p' > "$OUT"

curl -fsS -X POST "https://smallest.zip/api/log-files/upload?level=max" \
  -H "x-api-key: $SMALLEST_API_KEY" \
  -F "file=@$OUT" | tee /var/log/smallest-splunk.log
rm "$OUT"

Verification

bash
curl -s "https://smallest.zip/api/files?codec=log-files&limit=5" \
  -H "x-api-key: YOUR_API_KEY" | jq '.[] | {uploadId, originalBytes, compressedBytes}'

Troubleshooting

Symptom | Fix
Splunk reports HEC 400 | Switch to option 2 (saved-search export). Splunk's HEC client is strict about response shape.
Saved-search OOM at midnight | Append "| head 5000000" to the search and run hourly instead of daily.
splunk search hangs | Add -timeout 600 and consider using dbxquery against an external index store.
Compressed sizes look identical to gzip | You're hitting the wrong codec. Confirm path is /api/log-files/upload (not the generic /api/upload).

C) Datadog → smallest.zip

The cheapest route uses Datadog's built-in Log Archives feature: Datadog already drops compressed JSON batches to your S3 bucket. We just hook an S3 PutObject trigger to a Lambda that re-compresses with smallest.zip and replaces the object.

Step 1 — Enable Datadog Log Archives

In Datadog: Logs → Configuration → Archives → Add a new archive. Point at s3://my-dd-archive/raw/. Datadog will write files like raw/dt=2026-06-02/hour=14/<uuid>.json.gz.

Step 2 — S3 trigger Lambda

python — datadog_recompress.py
import gzip, json, os, uuid, urllib.parse, urllib.request, boto3

API_KEY  = os.environ["SMALLEST_API_KEY"]
ENDPOINT = "https://smallest.zip/api/log-files/upload?level=max"
s3 = boto3.client("s3")

def handler(event, _):
    for rec in event["Records"]:
        b = rec["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode the key ("dt=..." arrives as "dt%3D...")
        k = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        if not k.startswith("raw/"): continue

        body = s3.get_object(Bucket=b, Key=k)["Body"].read()
        plain = gzip.decompress(body)

        boundary = uuid.uuid4().hex
        mp = (f"--{boundary}\r\n"
              f'Content-Disposition: form-data; name="file"; filename="dd.log"\r\n'
              f"Content-Type: text/plain\r\n\r\n").encode() + plain + \
             f"\r\n--{boundary}--\r\n".encode()

        req = urllib.request.Request(ENDPOINT, data=mp, method="POST", headers={
            "x-api-key": API_KEY,
            "Content-Type": f"multipart/form-data; boundary={boundary}"})
        meta = json.loads(urllib.request.urlopen(req, timeout=120).read())

        dl = urllib.request.Request(
            f"https://smallest.zip/api/files/{meta['uploadId']}/download",
            headers={"x-api-key": API_KEY})
        blob = urllib.request.urlopen(dl, timeout=120).read()

        # Swap only the leading "raw/" prefix, then drop the ".json.gz" suffix
        new_key = "sbz/" + k[len("raw/"):].rsplit(".", 2)[0] + ".sbz"
        s3.put_object(Bucket=b, Key=new_key, Body=blob)
        s3.delete_object(Bucket=b, Key=k)   # drop the gzip original
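
The key handling is the easiest part of this Lambda to get wrong, because S3 event notifications deliver object keys URL-encoded, so the = in Datadog's dt=/hour= path segments arrives as %3D. A small sketch of the raw/ to sbz/ mapping in isolation; the function name sbz_key_for is hypothetical and the prefixes are the ones used on this page.

```python
from urllib.parse import unquote_plus

def sbz_key_for(event_key: str, src_prefix: str = "raw/", dst_prefix: str = "sbz/") -> str:
    """Map a URL-encoded S3 event key under raw/ to its .sbz key under sbz/."""
    key = unquote_plus(event_key)  # undo the URL encoding applied by S3 events
    if not key.startswith(src_prefix):
        raise ValueError(f"key is outside {src_prefix!r}: {key!r}")
    stem = key[len(src_prefix):]
    # Strip the archive suffix; check the longer form first.
    for suffix in (".json.gz", ".gz"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    return dst_prefix + stem + ".sbz"

print(sbz_key_for("raw/dt%3D2026-06-02/hour%3D14/abc.json.gz"))
# prints sbz/dt=2026-06-02/hour=14/abc.sbz
```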

Alternative — Datadog Observability Pipelines

If you've enabled Datadog Observability Pipelines (Vector under the hood), add a fork sink instead. See the Vector section below — the syntax is identical.

Verification

bash
aws s3 ls s3://my-dd-archive/sbz/ --recursive --human-readable | tail
# Pick the freshest .sbz and round-trip it locally:
aws s3 cp s3://my-dd-archive/sbz/<LATEST>.sbz - | smallest-logs decompress - --out - | head

Troubleshooting

Symptom | Fix
No files appearing in raw/ | Datadog archives only fire when a saved-view filter matches. Add a catch-all view.
Lambda invokes but never finishes | Archive files can be hundreds of MB. Bump Lambda memory to 1024MB and timeout to 300s.
Datadog rehydration breaks | Keep the gzip originals (skip delete_object) until you've validated rehydration from .sbz.
Costs didn't drop | Verify Datadog ingestion rate is unchanged; this integration replaces archive cost, not ingest.

D) Fluentd → smallest.zip

Use the built-in out_http plugin (Fluentd ≥ 1.7). Each buffer chunk is sent as the body of a single POST; with the configuration below that body is a JSON array of events, not a multipart upload.

conf — /etc/fluent/fluent.conf
<source>
  @type tail
  path /var/log/myapp/*.log
  pos_file /var/log/fluent/myapp.pos
  tag app.logs
  <parse>
    @type none
  </parse>
</source>

<match app.logs>
  @type http
  endpoint https://smallest.zip/api/log-files/upload?level=balanced
  open_timeout 10
  read_timeout 60
  <format>
    @type json
  </format>
  <buffer>
    @type file
    path /var/log/fluent/buf-smallest
    chunk_limit_size 8m
    flush_interval 30s
    retry_max_interval 60
  </buffer>
  headers {"x-api-key":"YOUR_API_KEY"}
  json_array true
</match>

Verification

bash
echo "fluentd test $(date)" | sudo tee -a /var/log/myapp/test.log
sleep 35  # wait one flush interval
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
  -H "x-api-key: YOUR_API_KEY" | jq

Troubleshooting

Symptom | Fix
buffer overflow | Raise chunk_limit_size or shorten flush_interval. Disk: bump total_limit_size.
403 from smallest.zip | Header was quoted wrong in headers {...}; it must be valid JSON, double-quoted.
Logs send but file is named chunk-… | Add <format> @type out_file </format> and set a filename via the http_method post directive.
Fluentd CPU spikes | Use fluent-bit with the http output instead (same config keys, 5× faster).

E) Vector → smallest.zip

Vector's http sink is the cleanest fit. Use encoding.codec = "text" so each chunk arrives as raw log text (not JSON-wrapped).

toml — /etc/vector/vector.toml
[sources.app_logs]
  type = "file"
  include = ["/var/log/myapp/*.log"]
  read_from = "end"

[transforms.batch_tag]
  type = "remap"
  inputs = ["app_logs"]
  source = '.batch = "smallest"'

[sinks.smallest_zip]
  type = "http"
  inputs = ["batch_tag"]
  uri = "https://smallest.zip/api/log-files/upload?level=balanced"
  method = "post"
  compression = "none"   # smallest.zip does its own compression
  encoding.codec = "text"
  framing.method = "newline_delimited"

  [sinks.smallest_zip.request.headers]
    x-api-key = "YOUR_API_KEY"
    content-type = "text/plain"

  [sinks.smallest_zip.batch]
    max_bytes = 8388608    # 8 MB per upload
    timeout_secs = 30

  [sinks.smallest_zip.buffer]
    type = "disk"
    max_size = 1073741824  # 1 GB on-disk overflow
    when_full = "block"

Verification

bash
vector validate /etc/vector/vector.toml
sudo systemctl restart vector
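# Scrape Vector's metrics (assumes they are exposed at this address, for example
# by an internal_metrics source feeding a prometheus_exporter sink):
curl -s http://localhost:8686/metrics | grep -E 'http_(sink|client)_.*smallest'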

Troubleshooting

Symptom | Fix
encoding.codec = "json" by mistake | Vector wraps each event in {"message":"..."}. Set codec = "text" for our log codec to recognise the format.
Sink retries forever on 4xx | Add request.retry_max_duration_secs = 300 so bad batches are eventually dropped instead of looping.
High memory | Switch buffer type = "disk" and lower batch.max_bytes.
Zero throughput | Confirm the source actually ticked: "curl localhost:8686/metrics | grep file_source".
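
The difference between the two codecs is easier to see side by side. The sketch below is a simplified stand-in for what ends up in the request body, not Vector's actual serializer: real JSON-encoded events typically carry additional fields besides message, and the function names are illustrative.

```python
import json

def as_text_codec(lines: list[str]) -> str:
    """Roughly what codec = "text" with newline framing sends: the raw lines."""
    return "".join(line + "\n" for line in lines)

def as_json_codec(lines: list[str]) -> str:
    """Simplified codec = "json": every line is wrapped in a JSON object."""
    return "".join(
        json.dumps({"message": line}, separators=(",", ":")) + "\n"
        for line in lines
    )

lines = ["GET /health 200", 'user said "hi"']
print(as_text_codec(lines), end="")
print(as_json_codec(lines), end="")
```

Only the text form reproduces the source file byte for byte; the JSON form adds a wrapper to every line and escapes characters such as double quotes.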

F) Logstash → smallest.zip

Use the http output. Logstash's HTTP output supports format => "message", which sends each event as a plain text body — exactly what our log codec wants.

conf — /etc/logstash/conf.d/smallest.conf
input {
  file {
    path => "/var/log/myapp/*.log"
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb-smallest"
  }
}

output {
  http {
    url => "https://smallest.zip/api/log-files/upload?level=balanced"
    http_method => "post"
    format => "message"
    message => "%{message}"   # required when format is "message"
    content_type => "text/plain"
    headers => {
      "x-api-key" => "YOUR_API_KEY"
    }
    pool_max => 10
    socket_timeout => 60
    retry_failed => true
    retry_non_idempotent => true
  }
}

Verification

bash
/usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/smallest.conf
sudo systemctl restart logstash
echo "logstash test $(date)" | sudo tee -a /var/log/myapp/app.log
sleep 10
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
  -H "x-api-key: YOUR_API_KEY" | jq

Troubleshooting

Symptom | Fix
Each event posts as one request | Add codec => line and tune the pipeline batch with pipeline.batch.size: 500 in logstash.yml.
400 Bad Request | format => "json" wraps events; switch to "message" or "form" with a file field.
Pipeline blocks at startup | Test config with -t first; usually a curly-brace mismatch in headers.
too many open files | Bump LimitNOFILE=65536 in /etc/systemd/system/logstash.service.d/override.conf.

G) Raw syslog / journald → smallest.zip

No daemon, no agent. A nightly cron job archives yesterday's logs. The snippets below leave the source logs in place, so pair them with your normal rotation or retention policy.

Option 1 — Cron + journalctl

crontab — /etc/cron.d/smallest-logs
SHELL=/bin/bash
SMALLEST_API_KEY=YOUR_API_KEY

# Ship yesterday's journal at 02:13 every night.
# cron has no line continuation, so the whole job must stay on one line.
13 2 * * * root journalctl --since "yesterday" --until "today" --no-pager | /usr/local/bin/smallest-logs compress - --out /var/log/archive/$(hostname)-$(date -u -d yesterday +\%F).sbz && logger "smallest-logs: shipped $(date -u -d yesterday +\%F)"

Option 2 — Direct curl (no CLI dependency)

bash — /usr/local/sbin/ship-syslog.sh
#!/usr/bin/env bash
set -euo pipefail
: "${SMALLEST_API_KEY:?must be set}"

SRC=/var/log/syslog.1   # logrotate's previous-day file
[ -f "$SRC" ] || { echo "no rotated syslog yet"; exit 0; }

RESP=$(curl -fsS -X POST "https://smallest.zip/api/log-files/upload?level=max" \
        -H "x-api-key: $SMALLEST_API_KEY" \
        -F "file=@${SRC}")
echo "$RESP" | jq -r '"shipped \(.uploadId): \(.originalBytes) -> \(.compressedBytes) bytes"'

Option 3 — rsyslog template + omprog

conf — /etc/rsyslog.d/60-smallest.conf
module(load="omprog")

template(name="RawMsg" type="string" string="%msg%\n")

action(type="omprog"
       binary="/usr/local/bin/smallest-logs compress - --rotate-out /var/log/archive/rsyslog --rotate-bytes 8388608"
       template="RawMsg")

Verification

bash
# Dry-run the nightly cron job right now
export SMALLEST_API_KEY=YOUR_API_KEY   # must be visible to smallest-logs, not only journalctl
journalctl --since "1 hour ago" --no-pager \
  | smallest-logs compress - --out /tmp/last-hour.sbz
ls -lh /tmp/last-hour.sbz
smallest-logs decompress /tmp/last-hour.sbz --out - | tail

Troubleshooting

Symptom | Fix
Cron job runs but produces 0-byte output | journalctl in cron lacks a TTY; add --no-pager (already in the snippet) and ensure the user has systemd-journal group membership.
syslog.1 doesn't exist | logrotate hasn't run yet on this host; install logrotate or change the script to read /var/log/syslog directly.
rsyslog spawns omprog repeatedly | Add action.execOnlyWhenPreviousIsSuspended="on" and a queue.type="LinkedList" queue.
SELinux denies omprog exec | chcon -t bin_t /usr/local/bin/smallest-logs or write a targeted policy.
Disk fills with .sbz archives | Add an upload step that ships the file to S3/R2 then rms the local copy. The CLI's --upload-s3 flag does both atomically.

Global troubleshooting

Symptom | Likely cause | Fix
401 Unauthorized | API key wrong or rotated | Check x-api-key header is literal. Rotate at Settings.
413 Payload Too Large | Upload > 100 MB | Split or stream via the CLI's chunked mode.
429 Too Many Requests | Plan rate limit hit | Add exponential backoff. Upgrade plan on Pricing.
Compression ratio worse than gzip | Wrong codec endpoint | Must be /api/log-files/upload, not /api/upload or another codec.
TLS handshake fails | Old curl / OpenSSL on RHEL 7-era boxes | Upgrade curl or use the CLI (statically linked).
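
For the 429 row, exponential backoff can be sketched as below. This is one common variant (capped doubling with full jitter), not a scheme the service prescribes. The function names, the 1-second base, and the 60-second cap are all assumptions to tune against your plan's limits.

```python
import random
import time
import urllib.error

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   jitter: bool = True) -> list[float]:
    """Delays for successive retries: base * 2**n, capped, optionally jittered."""
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * 2 ** n)
        # Full jitter spreads retries out so many clients do not retry in lockstep.
        delays.append(ceiling * random.random() if jitter else ceiling)
    return delays

def send_with_backoff(send, attempts: int = 5, sleep=time.sleep):
    """Call send() up to `attempts` times, retrying only on HTTP 429."""
    for delay in backoff_delays(attempts - 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other errors are not rate limiting; surface them
            sleep(delay)
    return send()  # final attempt; any error propagates to the caller

print(backoff_delays(5, jitter=False))  # prints [1.0, 2.0, 4.0, 8.0, 16.0]
```

Here send is any zero-argument callable that performs the upload with urllib.request.urlopen, which raises HTTPError on a 429 response.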

Still stuck? Email support with the uploadId from the last successful response — we can replay the request server-side.