Log Pipeline Integrations

Overview

smallest.zip's log codec is byte-exact lossless and accepts any text-ish input — JSON lines, syslog, Apache/nginx access logs, application logs, audit trails, anything. The output is a compact .sbz blob (typically 10–25× smaller than gzip on real log corpora) that you can park in S3, GCS, R2, Azure Blob, or anywhere cheap.

This page shows the exact configuration needed to wire smallest.zip into the seven most common log pipelines. Every snippet on this page is copy-pastable once you replace placeholders such as YOUR_API_KEY, bucket names, and account IDs.

The endpoint

POST https://smallest.zip/api/log-files/upload

Header / field / query parameter	Value	Notes
`x-api-key`	Your API key	Required. Get one on Settings.
`Content-Type`	`multipart/form-data`	File goes in the `file` field.
`level` (query string, `?level=`)	`fast` / `balanced` / `max`	Optional. Defaults to `balanced`.
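If you are calling the endpoint from code rather than curl, the request is an ordinary multipart POST with a single part named `file`. A minimal standard-library sketch that builds (but does not send) that request; `build_upload_request` is a hypothetical helper name, and rejecting unknown `level` values client-side is our own addition.

```python
import urllib.request
import uuid

UPLOAD_URL = "https://smallest.zip/api/log-files/upload"
LEVELS = {"fast", "balanced", "max"}


def build_upload_request(api_key: str, filename: str, data: bytes,
                         level: str = "balanced") -> urllib.request.Request:
    """Build (but do not send) the multipart POST described in the table above."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}, got {level!r}")
    boundary = uuid.uuid4().hex
    # One multipart part named "file"; the log bytes are passed through untouched.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{UPLOAD_URL}?level={level}",
        data=head + data + tail,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, open the request with a timeout, e.g. `with urllib.request.urlopen(req, timeout=60) as r: meta = json.loads(r.read())`.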

Expected response (201 Created)

json

{
  "uploadId": "9f3c2a01",
  "originalFilename": "app-2026-06-01.log",
  "originalBytes": 104857600,
  "compressedBytes": 4823104,
  "codec": "log-files",
  "compressionTimeMs": 2840,
  "createdAt": "2026-06-02T10:14:00Z"
}
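Client code usually needs only a few of these fields. A small sketch of parsing the response and deriving the compression ratio and space saved; `UploadResult` and `parse_upload_response` are illustrative names, not part of an official SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class UploadResult:
    upload_id: str
    original_bytes: int
    compressed_bytes: int

    @property
    def ratio(self) -> float:
        # max(1, ...) keeps the division defined even for an empty result
        return self.original_bytes / max(1, self.compressed_bytes)

    @property
    def savings(self) -> float:
        """Fraction of the original size saved, e.g. 0.95 means 95% smaller."""
        return 1 - self.compressed_bytes / max(1, self.original_bytes)


def parse_upload_response(body: bytes) -> UploadResult:
    meta = json.loads(body)
    return UploadResult(meta["uploadId"], meta["originalBytes"], meta["compressedBytes"])
```

For the sample response above, `ratio` comes out at roughly 21.7 and `savings` at roughly 0.954.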

Verify your key works right now

bash

echo "$(date) hello from smallest.zip" > /tmp/test.log
curl -X POST "https://smallest.zip/api/log-files/upload?level=balanced" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@/tmp/test.log"

The `smallest-logs` CLI

Some integrations below shell out to the official CLI. Install it on any Linux/macOS host:

bash

curl -fsSL https://smallest.zip/install/logs | sh
export SMALLEST_API_KEY="YOUR_API_KEY"

# Compress a file (writes app.log.sbz; add --rm to also remove the original)
smallest-logs compress app.log

# Stream stdin (good for piping from journalctl, Kafka consumers, etc.)
journalctl --since yesterday | smallest-logs compress - --out yesterday.sbz

A) AWS CloudWatch Logs → smallest.zip

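CloudWatch Logs charges $0.50/GB for ingestion and $0.03/GB-month for archived storage (US East list prices at the time of writing; check current AWS pricing for your region). Replacing CloudWatch's own long-term retention with a compressed S3 archive cuts the storage portion of the bill by ~95% on typical app log corpora. Ingestion is still billed, because logs must land in CloudWatch before a subscription filter can forward them.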
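The ~95% figure is simply what a ~20× compression ratio implies (1 − 1/20), and ingestion cost is untouched. A back-of-the-envelope sketch using the two prices quoted above; it deliberately treats billed storage volume as equal to ingested volume, which is a simplification, and it leaves out S3's own storage price.

```python
# Prices quoted in the paragraph above (USD); confirm against current AWS pricing.
CW_INGEST_PER_GB = 0.50
CW_STORAGE_PER_GB_MONTH = 0.03


def storage_reduction(ratio: float) -> float:
    """Fraction of stored bytes removed when data shrinks by `ratio` (e.g. 20x)."""
    return 1 - 1 / ratio


def cloudwatch_costs(gb_per_month: float) -> dict:
    """Ingest is billed once per GB; storage is billed per GB-month retained.

    Simplification: billed storage volume is assumed equal to ingested volume.
    """
    return {
        "ingest_usd": gb_per_month * CW_INGEST_PER_GB,
        "storage_usd_per_month": gb_per_month * CW_STORAGE_PER_GB_MONTH,
    }
```

For example, 500 GB/month means about $250 of ingestion whatever happens downstream, against about $15 per month for each month that data stays in CloudWatch storage; a 10–25× ratio corresponds to a 90–96% storage reduction.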

Step 1 — Create the forwarding Lambda

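This Lambda receives a CloudWatch Logs subscription event, decodes the gzipped payload, uploads it to smallest.zip, downloads the resulting .sbz, and writes it to S3. It reads `SMALLEST_API_KEY`, `ARCHIVE_BUCKET`, and optionally `ARCHIVE_PREFIX` (default `cloudwatch/`) from the environment. Set the function's handler to `lambda_function.handler`.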

python — lambda_function.py

import base64, gzip, json, os, time, urllib.request, uuid
import boto3

API_KEY  = os.environ["SMALLEST_API_KEY"]
BUCKET   = os.environ["ARCHIVE_BUCKET"]
PREFIX   = os.environ.get("ARCHIVE_PREFIX", "cloudwatch/")
ENDPOINT = "https://smallest.zip/api/log-files/upload?level=max"

s3 = boto3.client("s3")

def handler(event, _ctx):
    # CloudWatch subscription payload is base64-encoded, gzipped JSON
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))

    # CloudWatch sends a CONTROL_MESSAGE when the subscription is created;
    # it carries no real log events, so skip it.
    if payload.get("messageType") != "DATA_MESSAGE":
        return {"ok": True, "skipped": payload.get("messageType")}

    lines = "\n".join(e["message"] for e in payload["logEvents"]).encode()

    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="batch.log"\r\n'
        f"Content-Type: text/plain\r\n\r\n"
    ).encode() + lines + f"\r\n--{boundary}--\r\n".encode()

    req = urllib.request.Request(ENDPOINT, data=body, method="POST", headers={
        "x-api-key": API_KEY,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    })
    with urllib.request.urlopen(req, timeout=60) as r:
        meta = json.loads(r.read())

    # Download the compressed blob and stash it in our own S3 bucket
    dl = urllib.request.Request(
        f"https://smallest.zip/api/files/{meta['uploadId']}/download",
        headers={"x-api-key": API_KEY})
    with urllib.request.urlopen(dl, timeout=60) as r:
        blob = r.read()

    # Log group names start with "/" (e.g. /aws/lambda/my-app); strip it so
    # the key doesn't contain "//" right after the prefix.
    group = payload["logGroup"].lstrip("/")
    key = f"{PREFIX}{group}/{int(time.time())}-{meta['uploadId']}.sbz"
    s3.put_object(Bucket=BUCKET, Key=key, Body=blob,
                  Metadata={"original-bytes": str(meta["originalBytes"]),
                            "compressed-bytes": str(meta["compressedBytes"])})
    return {"ok": True, "s3": f"s3://{BUCKET}/{key}",
            "ratio": meta["originalBytes"] / max(1, meta["compressedBytes"])}
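To exercise the decoding path without wiring up CloudWatch, you can build a synthetic subscription event with the same envelope (base64-encoded, gzipped JSON under `awslogs.data`). A sketch with hypothetical helper names; note that passing the result to `handler()` will still make real HTTP and S3 calls, so only do that with test credentials and a test bucket.

```python
import base64
import gzip
import json


def make_cloudwatch_event(log_group: str, messages: list) -> dict:
    """Build a synthetic CloudWatch Logs subscription event for local testing."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",
        "logGroup": log_group,
        "logStream": "local-test",
        "subscriptionFilters": ["smallestzip"],
        "logEvents": [
            {"id": str(i), "timestamp": 0, "message": m}
            for i, m in enumerate(messages)
        ],
    }
    data = base64.b64encode(gzip.compress(json.dumps(payload).encode())).decode()
    return {"awslogs": {"data": data}}


def decode_cloudwatch_event(event: dict) -> dict:
    # Same steps as the handler: base64-decode, gunzip, parse JSON.
    return json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
```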

Step 2 — Subscribe the Lambda to a log group

bash

aws lambda add-permission \
  --function-name smallestzip-forwarder \
  --statement-id cwlogs-invoke \
  --action lambda:InvokeFunction \
  --principal logs.amazonaws.com \
  --source-arn "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/my-app:*"

aws logs put-subscription-filter \
  --log-group-name /aws/lambda/my-app \
  --filter-name smallestzip \
  --filter-pattern "" \
  --destination-arn arn:aws:lambda:us-east-1:123456789012:function:smallestzip-forwarder

Step 3 — IAM policy for the Lambda execution role

json

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*" },
    { "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-log-archive/cloudwatch/*" }
  ]
}
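A common failure is a mismatch between the policy's S3 `Resource` and the key the handler actually writes (prefix, then log group, then timestamp and uploadId). A quick offline sanity check, sketched with Python's `fnmatch`: its `*` behaves like IAM's `*` (matching any characters, including `/`), but fnmatch also treats `[...]` as a character class, so use this only for patterns without brackets. It is not a substitute for the IAM policy simulator.

```python
from fnmatch import fnmatchcase

# Resource from the policy above
POLICY_RESOURCE = "arn:aws:s3:::my-log-archive/cloudwatch/*"


def s3_object_arn(bucket: str, key: str) -> str:
    return f"arn:aws:s3:::{bucket}/{key}"


def policy_allows(resource_pattern: str, bucket: str, key: str) -> bool:
    """True if the object ARN matches the pattern (case-sensitive, like IAM)."""
    return fnmatchcase(s3_object_arn(bucket, key), resource_pattern)
```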

Verification

bash

# Trigger a log event, then within ~60s confirm a new .sbz landed in S3:
aws s3 ls s3://my-log-archive/cloudwatch/ --recursive | tail
# And confirm round-trip integrity:
aws s3 cp s3://my-log-archive/cloudwatch/<LATEST>.sbz /tmp/last.sbz
smallest-logs decompress /tmp/last.sbz --out - | head

Troubleshooting

Symptom	Fix
Lambda times out	Raise the function timeout (the default is 3 s). The handler makes two HTTP calls with 60 s timeouts each, so allow at least 120 s.
`401 Unauthorized`	`SMALLEST_API_KEY` env var not set on the Lambda, or stale key. Rotate on Settings.
`413 Payload Too Large`	Batches are over 100 MB. Switch subscription destination to Kinesis Firehose → Lambda with smaller buffer (1 MB).
S3 `AccessDenied`	Lambda role missing `s3:PutObject` on the target bucket prefix.
CloudWatch keeps both copies	Set a retention policy: `aws logs put-retention-policy --log-group-name X --retention-in-days 7`.

B) Splunk → smallest.zip

Two patterns work. Pick based on whether you want real-time forwarding or scheduled cold-tier archival.

Option 1 — Forward to HEC-compatible HTTP sink (real-time)

Splunk's HTTP output (the `[httpout]` stanza in outputs.conf) batches events and POSTs each batch over HTTPS to the configured `uri`, which here is smallest.zip. Pro: near real-time, no scheduled job. Con: Splunk's HTTP output expects an HEC-shaped response, so you either put a tiny middleware Lambda in front of smallest.zip or use the batch export in Option 2.

conf — $SPLUNK_HOME/etc/system/local/outputs.conf

[httpout]
httpEventCollectorToken = YOUR_API_KEY
uri = https://smallest.zip/api/log-files/upload?level=balanced
batchSize = 65536
batchTimeout = 30
sslVerifyServerCert = true

[tcpout]
defaultGroup = no_indexers

Splunk sends each batch as a POST with a JSON body; smallest.zip's `/api/log-files/upload` accepts that body as-is when `Content-Type` is `application/json`.
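The core of such a middleware is two small translations. HEC clients authenticate with an `Authorization: Splunk <token>` header, and HEC reports success as `{"text": "Success", "code": 0}`. A hypothetical sketch of both; the error mappings below are our guess at sensible HEC equivalents, and this page does not verify that Splunk's HTTP output accepts exactly this shape.

```python
# Hypothetical translation helpers for a Splunk-to-smallest.zip middleware.
# Names are illustrative; this is not a published smallest.zip or Splunk API.

HEC_SUCCESS = (200, {"text": "Success", "code": 0})


def api_key_from_hec_auth(authorization: str) -> str:
    """HEC clients send 'Authorization: Splunk <token>'; return the token
    so the middleware can forward it as the x-api-key header."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Splunk" or not token.strip():
        raise ValueError("expected 'Splunk <token>'")
    return token.strip()


def to_hec_response(smallest_status: int) -> tuple:
    """Map a smallest.zip upload status to an HEC-style (status, body) pair."""
    if smallest_status in (200, 201):
        return HEC_SUCCESS
    if smallest_status == 401:
        return 403, {"text": "Invalid token", "code": 4}
    if smallest_status == 429:
        return 503, {"text": "Server is busy", "code": 9}
    return 500, {"text": "Internal server error", "code": 8}
```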

Option 2 — Scheduled saved search export (batch)

Better fit for cold archival: run a nightly saved search, dump to file, compress, ship. Pro: survives Splunk restarts, retries, replays. Con: not real-time.

bash — /opt/smallest/splunk_archive.sh (run from cron)

#!/usr/bin/env bash
set -euo pipefail
: "${SPLUNK_PASS:?must be set}"
: "${SMALLEST_API_KEY:?must be set}"
DAY=$(date -u -d 'yesterday' +%F)
OUT=/var/tmp/splunk-$DAY.log

# -output rawdata prints each event's raw text, with no XML or table wrapping
/opt/splunk/bin/splunk search \
  "search index=* earliest=-1d@d latest=@d" \
  -auth "admin:$SPLUNK_PASS" -output rawdata -maxout 0 > "$OUT"

curl -fsS -X POST "https://smallest.zip/api/log-files/upload?level=max" \
  -H "x-api-key: $SMALLEST_API_KEY" \
  -F "file=@$OUT" | tee -a /var/log/smallest-splunk.log
rm "$OUT"

Verification

bash

curl -s "https://smallest.zip/api/files?codec=log-files&limit=5" \
  -H "x-api-key: YOUR_API_KEY" | jq '.[] | {uploadId, originalBytes, compressedBytes}'
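If you would rather check the overall ratio than eyeball five rows, the same listing can be aggregated. A minimal sketch, assuming (as the `jq '.[]'` filter above implies) that `/api/files` returns a JSON array whose items carry the same `originalBytes` / `compressedBytes` fields as the upload response; `summarize_uploads` is a hypothetical helper.

```python
import json


def summarize_uploads(body: bytes) -> dict:
    """Aggregate the items returned by GET /api/files?codec=log-files."""
    items = json.loads(body)
    original = sum(item["originalBytes"] for item in items)
    compressed = sum(item["compressedBytes"] for item in items)
    return {
        "uploads": len(items),
        "original_bytes": original,
        "compressed_bytes": compressed,
        # Ratio of totals, so large uploads weigh more than small ones
        "ratio": original / max(1, compressed),
    }
```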

Troubleshooting

Symptom	Fix
Splunk reports HEC 400	Switch to option 2 (saved-search export). Splunk's HEC client is strict about response shape.
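Saved-search OOM at midnight	Run the export hourly (`earliest=-1h@h latest=@h`) instead of daily so each search returns less data. Avoid `| head`, which silently truncates the archive.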
`splunk search` hangs	Add `-timeout 600` and consider using `dbxquery` against an external index store.
Compressed sizes look identical to gzip	You're hitting the wrong codec. Confirm path is `/api/log-files/upload` (not the generic `/api/upload`).

C) Datadog → smallest.zip

The cheapest route uses Datadog's built-in Log Archives feature: Datadog already writes gzipped JSON batches to your S3 bucket. Attach an S3 event notification (`s3:ObjectCreated:*` on the `raw/` prefix) to a Lambda that re-compresses each file with smallest.zip, writes the result under `sbz/`, and deletes the gzip original.

Step 1 — Enable Datadog Log Archives

In Datadog: Logs → Configuration → Archives → Add a new archive. Point at s3://my-dd-archive/raw/. Datadog will write files like raw/dt=20260602/hour=14/archive_<id>.json.gz.

Step 2 — S3 trigger Lambda

python — datadog_recompress.py

import gzip, json, os, uuid, urllib.parse, urllib.request
import boto3

API_KEY  = os.environ["SMALLEST_API_KEY"]
ENDPOINT = "https://smallest.zip/api/log-files/upload?level=max"
s3 = boto3.client("s3")

def handler(event, _):
    for rec in event["Records"]:
        b = rec["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys ("=" arrives as "%3D")
        k = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        if not k.startswith("raw/"):
            continue  # ignore our own sbz/ writes and anything else

        body = s3.get_object(Bucket=b, Key=k)["Body"].read()
        plain = gzip.decompress(body)

        boundary = uuid.uuid4().hex
        mp = (f"--{boundary}\r\n"
              f'Content-Disposition: form-data; name="file"; filename="dd.log"\r\n'
              f"Content-Type: text/plain\r\n\r\n").encode() + plain + \
             f"\r\n--{boundary}--\r\n".encode()

        req = urllib.request.Request(ENDPOINT, data=mp, method="POST", headers={
            "x-api-key": API_KEY,
            "Content-Type": f"multipart/form-data; boundary={boundary}"})
        with urllib.request.urlopen(req, timeout=120) as r:
            meta = json.loads(r.read())

        dl = urllib.request.Request(
            f"https://smallest.zip/api/files/{meta['uploadId']}/download",
            headers={"x-api-key": API_KEY})
        with urllib.request.urlopen(dl, timeout=120) as r:
            blob = r.read()

        # raw/<path>/<name>.json.gz -> sbz/<path>/<name>.sbz
        new_key = "sbz/" + k[len("raw/"):].rsplit(".", 2)[0] + ".sbz"
        s3.put_object(Bucket=b, Key=new_key, Body=blob)
        s3.delete_object(Bucket=b, Key=k)   # drop the gzip original
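Key handling is the easiest part of this Lambda to get wrong, because S3 event notifications URL-encode object keys and Datadog's paths contain `=`. A standalone sketch of the key mapping that you can unit-test without AWS; `sbz_key_for` is a hypothetical helper, and it assumes archive files end in `.json.gz`.

```python
from typing import Optional
from urllib.parse import unquote_plus


def sbz_key_for(event_key: str) -> Optional[str]:
    """Map an S3 event key under raw/ to its sbz/ key, or None to skip it.

    S3 event notifications URL-encode object keys ("=" arrives as "%3D",
    spaces as "+"), so decode before passing the key to boto3.
    """
    key = unquote_plus(event_key)
    if not (key.startswith("raw/") and key.endswith(".json.gz")):
        return None
    return "sbz/" + key[len("raw/"):-len(".json.gz")] + ".sbz"
```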

Alternative — Datadog Observability Pipelines

If you've enabled Datadog Observability Pipelines (Vector under the hood), add a fork sink instead. See the Vector section below — the syntax is identical.

Verification

bash

aws s3 ls s3://my-dd-archive/sbz/ --recursive --human-readable | tail
# Pick the freshest .sbz and round-trip it locally:
aws s3 cp s3://my-dd-archive/sbz/<LATEST>.sbz - | smallest-logs decompress - --out - | head

Troubleshooting

Symptom	Fix
No files appearing in `raw/`	An archive only receives logs that match its filter query. Use `*` as the archive query to catch everything.
Lambda invokes but never finishes	Archive files can be hundreds of MB. Bump Lambda memory to 1024MB and timeout to 300s.
Datadog rehydration breaks	Keep the gzip originals (skip `delete_object`) until you've validated rehydration from `.sbz`.
Costs didn't drop	Verify Datadog ingestion rate is unchanged — this integration replaces archive cost, not ingest.

D) Fluentd → smallest.zip

Use the built-in out_http plugin (Fluentd ≥ 1.7). It POSTs each buffer chunk as a single raw request body rather than a multipart upload.

conf — /etc/fluent/fluent.conf

<source>
  @type tail
  path /var/log/myapp/*.log
  pos_file /var/log/fluent/myapp.pos
  tag app.logs
  <parse>
    @type none
  </parse>
</source>

<match app.logs>
  @type http
  endpoint https://smallest.zip/api/log-files/upload?level=balanced
  open_timeout 10
  read_timeout 60
  content_type text/plain
  headers {"x-api-key":"YOUR_API_KEY"}
  # single_value emits only the "message" field (the raw line) plus a newline,
  # so the log codec receives plain text instead of JSON-wrapped records
  <format>
    @type single_value
  </format>
  <buffer>
    @type file
    path /var/log/fluent/buf-smallest
    chunk_limit_size 8m
    flush_interval 30s
    retry_max_interval 60
  </buffer>
</match>
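Before pointing the agent at the real endpoint, it can help to see exactly what it sends: the content type, and whether lines arrive raw or JSON-wrapped. A minimal standard-library sketch of a local capture server that records every POST and answers 201; temporarily set `endpoint http://127.0.0.1:8080/api/log-files/upload?level=balanced`, run the server in an interactive Python session, and inspect `captured` after one flush interval. It assumes the agent sends a `Content-Length` header (chunked request bodies are not decoded), and the same trick works for the Vector and Logstash sinks below.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # one dict per POST received


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumes Content-Length is present; chunked bodies are not handled.
        length = int(self.headers.get("Content-Length", 0))
        captured.append({
            "path": self.path,
            "content_type": self.headers.get("Content-Type"),
            "body": self.rfile.read(length),
        })
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request stderr logging


def serve(port: int = 8080) -> HTTPServer:
    """Start the capture server in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_line(port: int, line: bytes = b"capture test\n") -> int:
    """POST one line the way a text-mode sink would; returns the HTTP status."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/api/log-files/upload?level=balanced",
        data=line, method="POST", headers={"Content-Type": "text/plain"})
    # Empty ProxyHandler: bypass any http_proxy settings for this local call
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return resp.status
```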

Verification

bash

echo "fluentd test $(date)" | sudo tee -a /var/log/myapp/test.log
sleep 35  # wait one flush interval
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
  -H "x-api-key: YOUR_API_KEY" | jq

Troubleshooting

Symptom	Fix
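`buffer overflow`	Raise `total_limit_size` in `<buffer>` (the cap on buffered data) or shorten `flush_interval` so chunks drain faster. Raising `chunk_limit_size` alone adds no capacity.
403 from smallest.zip	Header was quoted wrong in `headers {...}` — must be valid JSON, double-quoted.
Logs send but file is named `chunk-…`	Expected: out_http sends a bare request body with no filename to pass along. If you need meaningful names, write chunks to disk with Fluentd's `out_file` and ship those files with the `smallest-logs` CLI.
Fluentd CPU spikes	Use Fluent Bit with its `http` output instead. It is much lighter on CPU, but its configuration syntax differs from Fluentd's, so the keys above don't carry over directly.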

E) Vector → smallest.zip

Vector's http sink is the cleanest fit. Use encoding.codec = "text" so each chunk arrives as raw log text (not JSON-wrapped).

toml — /etc/vector/vector.toml

[sources.app_logs]
  type = "file"
  include = ["/var/log/myapp/*.log"]
  read_from = "end"

[transforms.batch_tag]
  type = "remap"
  inputs = ["app_logs"]
  source = '.batch = "smallest"'

[sinks.smallest_zip]
  type = "http"
  inputs = ["batch_tag"]
  uri = "https://smallest.zip/api/log-files/upload?level=balanced"
  method = "post"
  compression = "none"   # smallest.zip does its own compression
  encoding.codec = "text"
  framing.method = "newline_delimited"

  [sinks.smallest_zip.request.headers]
    x-api-key = "YOUR_API_KEY"
    content-type = "text/plain"

  [sinks.smallest_zip.batch]
    max_bytes = 8388608    # 8 MB per upload
    timeout_secs = 30

  [sinks.smallest_zip.buffer]
    type = "disk"
    max_size = 1073741824  # 1 GB on-disk overflow
    when_full = "block"
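The `batch` settings determine how many uploads you make per day, which matters for the plan rate limit (`429 Too Many Requests`). A batch is flushed when it reaches `max_bytes` or when `timeout_secs` elapses, whichever comes first. A rough sketch under the assumption of perfectly steady traffic; real traffic is bursty, and Vector's byte accounting is an estimate of the encoded size, so treat the result as an order of magnitude.

```python
def uploads_per_day(daily_bytes: float, max_bytes: int = 8_388_608,
                    timeout_secs: float = 30.0) -> float:
    """Estimate uploads/day for a size-or-timeout batcher under steady traffic."""
    if daily_bytes <= 0:
        return 0.0
    rate = daily_bytes / 86_400            # bytes per second
    fill_secs = max_bytes / rate           # time for one batch to reach max_bytes
    flush_every = min(timeout_secs, fill_secs)
    return 86_400 / flush_every
```

With the config above, 10 GiB/day is timeout-bound (2,880 uploads/day of about 3.7 MB each), while 100 GiB/day is size-bound (12,800 uploads/day).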

Verification

bash

vector validate /etc/vector/vector.toml
sudo systemctl restart vector
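# Watch Vector's own logs for sink errors (they mention the component id smallest_zip):
journalctl -u vector -f | grep smallest_zip
# In another terminal, confirm uploads are arriving:
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
  -H "x-api-key: YOUR_API_KEY" | jq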

Troubleshooting

Symptom	Fix
`encoding.codec = "json"` by mistake	Vector wraps each event in `{"message":"..."}`. Set `codec = "text"` for our log codec to recognise the format.
Sink retries forever on 4xx	Set `request.retry_attempts` to a finite value (for example `10`). The default is effectively unlimited, so a bad batch can loop instead of being dropped.
High memory	Switch buffer `type = "disk"` and lower `batch.max_bytes`.
Zero throughput	With `read_from = "end"`, only lines appended after Vector starts are shipped. Append a test line, or set `read_from = "beginning"` for a one-off backfill.

F) Logstash → smallest.zip

Use the http output. With `format => "message"`, Logstash's HTTP output sends each event as a plain-text body (one request per event), which is the raw-text input our log codec wants.

conf — /etc/logstash/conf.d/smallest.conf

input {
  file {
    path => "/var/log/myapp/*.log"
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb-smallest"
  }
}

output {
  http {
    url => "https://smallest.zip/api/log-files/upload?level=balanced"
    http_method => "post"
    format => "message"
    message => "%{message}"   # required with format "message": send only the raw line
    content_type => "text/plain"
    headers => {
      "x-api-key" => "YOUR_API_KEY"
    }
    pool_max => 10
    socket_timeout => 60
    retry_failed => true
    retry_non_idempotent => true
  }
}

Verification

bash

/usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/smallest.conf
sudo systemctl restart logstash
echo "logstash test $(date)" | sudo tee -a /var/log/myapp/app.log
sleep 10
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
  -H "x-api-key: YOUR_API_KEY" | jq

Troubleshooting

Symptom	Fix
Each event posts as one request	Expected: `format => "message"` sends one HTTP request per event. `format => "json_batch"` batches, but wraps events in a JSON array; for batched plain-text uploads, put Vector or Fluentd in front instead.
`400 Bad Request`	`format => "json"` wraps events in JSON. Switch to `format => "message"` together with `message => "%{message}"`.
Pipeline blocks at startup	Test config with `-t` first; usually a curly-brace mismatch in `headers`.
`too many open files`	Set `LimitNOFILE=65536` under `[Service]` in `/etc/systemd/system/logstash.service.d/override.conf`, then run `systemctl daemon-reload`.

G) Raw syslog / journald → smallest.zip

No daemon, no agent. Option 1 compresses yesterday's journal from a nightly cron job, Option 2 uploads logrotate's previous-day file with curl, and Option 3 pipes rsyslog output straight into the CLI.

Option 1 — Cron + `journalctl`

crontab — /etc/cron.d/smallest-logs

SHELL=/bin/bash
SMALLEST_API_KEY=YOUR_API_KEY

# Ship yesterday's journal at 02:13 every night.
# cron does not support "\" line continuations, so the command must stay on one line.
13 2 * * * root journalctl --since "yesterday" --until "today" --no-pager | /usr/local/bin/smallest-logs compress - --out /var/log/archive/$(hostname)-$(date -u -d yesterday +\%F).sbz && logger "smallest-logs: shipped $(date -u -d yesterday +\%F)"

Option 2 — Direct curl (no CLI dependency)

bash — /usr/local/sbin/ship-syslog.sh

#!/usr/bin/env bash
set -euo pipefail
: "${SMALLEST_API_KEY:?must be set}"

SRC=/var/log/syslog.1   # logrotate's previous-day file
[ -f "$SRC" ] || { echo "no rotated syslog yet"; exit 0; }

RESP=$(curl -fsS -X POST "https://smallest.zip/api/log-files/upload?level=max" \
        -H "x-api-key: $SMALLEST_API_KEY" \
        -F "file=@${SRC}")
echo "$RESP" | jq -r '"shipped \(.uploadId): \(.originalBytes) -> \(.compressedBytes) bytes"'

Option 3 — rsyslog template + omprog

conf — /etc/rsyslog.d/60-smallest.conf

module(load="omprog")

template(name="RawMsg" type="string" string="%msg%\n")

action(type="omprog"
       binary="/usr/local/bin/smallest-logs compress - --rotate-out /var/log/archive/rsyslog --rotate-bytes 8388608"
       template="RawMsg")

Verification

bash

# Dry-run the nightly cron job right now
export SMALLEST_API_KEY=YOUR_API_KEY   # a VAR=value prefix would reach journalctl, not the CLI
journalctl --since "1 hour ago" --no-pager \
  | smallest-logs compress - --out /tmp/last-hour.sbz
ls -lh /tmp/last-hour.sbz
smallest-logs decompress /tmp/last-hour.sbz --out - | tail
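Looking at `tail` shows the output is readable, not that it is byte-exact. To check the lossless claim on your own data, capture the input to a file first (the `--since "1 hour ago"` window moves, so re-running `journalctl` won't reproduce it), round-trip it, and compare SHA-256 digests. A minimal sketch; the helper names are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large logs don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(original: str, roundtripped: str) -> bool:
    return sha256_file(original) == sha256_file(roundtripped)
```

For example: `journalctl --since "1 hour ago" --no-pager > /tmp/last-hour.log`, then `smallest-logs compress - --out /tmp/last-hour.sbz < /tmp/last-hour.log`, then `smallest-logs decompress /tmp/last-hour.sbz --out - > /tmp/last-hour.check`, and finally `same_bytes("/tmp/last-hour.log", "/tmp/last-hour.check")` should return `True`.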

Troubleshooting

Symptom	Fix
Cron job runs but produces 0-byte output	`journalctl` in cron lacks a TTY — add `--no-pager` (already in the snippet) and ensure the user has `systemd-journal` group membership.
`syslog.1` doesn't exist	logrotate hasn't run yet on this host; install `logrotate` or change the script to read `/var/log/syslog` directly.
rsyslog spawns omprog repeatedly	Add `action.execOnlyWhenPreviousIsSuspended="on"` and a `queue.type="LinkedList"` queue.
SELinux denies `omprog` exec	`chcon -t bin_t /usr/local/bin/smallest-logs` or write a targeted policy.
Disk fills with `.sbz` archives	Add an upload step that ships the file to S3/R2 then `rm`s the local copy. The CLI's `--upload-s3` flag does both atomically.

Global troubleshooting

Symptom	Likely cause	Fix
`401 Unauthorized`	API key wrong or rotated	Check that the `x-api-key` header carries the raw key (no quotes, no `Bearer ` prefix). Rotate at Settings.
`413 Payload Too Large`	Upload > 100 MB	Split or stream via the CLI's chunked mode.
`429 Too Many Requests`	Plan rate limit hit	Add exponential backoff. Upgrade plan on Pricing.
Compression ratio worse than gzip	Wrong codec endpoint	Must be `/api/log-files/upload`, not `/api/upload` or another codec.
TLS handshake fails	Old curl / OpenSSL on RHEL 7-era boxes	Upgrade curl or use the CLI (statically linked).
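The `429` row recommends exponential backoff. A minimal standard-library sketch with capped, full-jitter backoff; the retryable status set and the defaults (5 attempts, 1 s base, 30 s cap) are our assumptions, not published limits, and it does not look for a `Retry-After` header. `opener` and `sleep` are injectable so the logic can be tested without the network.

```python
import random
import time
import urllib.error
import urllib.request

# Statuses worth retrying; an assumption, adjust to your own policy.
RETRYABLE = {429, 502, 503, 504}


def urlopen_with_backoff(req, attempts=5, base=1.0, cap=30.0,
                         sleep=time.sleep, opener=urllib.request.urlopen):
    """Send req, retrying retryable HTTP errors with capped, full-jitter backoff.

    Returns (status, body). Non-retryable errors, and the final retryable
    one, are re-raised.
    """
    for attempt in range(attempts):
        try:
            with opener(req, timeout=60) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)]
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise ValueError("attempts must be >= 1")
```

Pass it any `urllib.request.Request`, such as an upload request built for `/api/log-files/upload`.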

Still stuck? Email support with the uploadId from the last successful response — we can replay the request server-side.
