Log Pipeline Integrations
Connect smallest.zip to your existing log pipeline (CloudWatch, Splunk, Datadog, Fluentd, Vector, Logstash, raw syslog) in under 5 minutes. Stop paying ingestion rates for cold storage.
Overview
smallest.zip's log codec is byte-exact lossless and accepts any text-ish input — JSON lines, syslog, Apache/nginx access logs, application logs, audit trails, anything. The output is a compact .sbz blob (typically 10–25× smaller than gzip on real log corpora) that you can park in S3, GCS, R2, Azure Blob, or anywhere cheap.
This page shows the exact configuration needed to wire smallest.zip into the seven most common log pipelines. Every snippet on this page is copy-pastable.
The endpoint
https://smallest.zip/api/log-files/upload
| Header / Field | Value | Notes |
|---|---|---|
| x-api-key | Your API key | Required. Get one on Settings. |
| Content-Type | multipart/form-data | File goes in the file field. |
| ?level= | fast / balanced / max | Optional. Default balanced. |
Expected response (201 Created)
{
  "uploadId": "9f3c2a01",
  "originalFilename": "app-2026-06-01.log",
  "originalBytes": 104857600,
  "compressedBytes": 4823104,
  "codec": "log-files",
  "compressionTimeMs": 2840,
  "createdAt": "2026-06-02T10:14:00Z"
}
Verify your key works right now
echo "$(date) hello from smallest.zip" > /tmp/test.log
curl -X POST "https://smallest.zip/api/log-files/upload?level=balanced" \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@/tmp/test.log"
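On hosts without curl, the same check can be sketched in Python using only the standard library. The helper names `build_multipart` and `upload_log` are illustrative and not part of any official client; the endpoint, the `x-api-key` header, and the `file` field are the ones documented above.

```python
import json
import urllib.request
import uuid

ENDPOINT = "https://smallest.zip/api/log-files/upload"


def build_multipart(data: bytes, filename: str, boundary: str) -> bytes:
    """Wrap raw log bytes in a single-part multipart/form-data body (field name: file)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


def upload_log(data: bytes, filename: str, api_key: str, level: str = "balanced") -> dict:
    """POST the log bytes and return the parsed JSON metadata (performs a network call)."""
    boundary = uuid.uuid4().hex
    req = urllib.request.Request(
        f"{ENDPOINT}?level={level}",
        data=build_multipart(data, filename, boundary),
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

Calling `upload_log(b"hello\n", "test.log", "YOUR_API_KEY")` should return a dictionary shaped like the expected response shown earlier, assuming the key is valid.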
The smallest-logs CLI
Some integrations below shell out to the official CLI. Install it on any Linux/macOS host:
curl -fsSL https://smallest.zip/install/logs | sh
export SMALLEST_API_KEY="YOUR_API_KEY"
# Compress a file in place (writes app.log.sbz, removes original with --rm)
smallest-logs compress app.log
# Stream stdin (good for piping from journalctl, kafka, etc.)
journalctl --since yesterday | smallest-logs compress - --out yesterday.sbz
A) AWS CloudWatch Logs → smallest.zip
CloudWatch Logs charges $0.03/GB-month for storage and $0.50/GB for ingestion (us-east-1 list prices). Replacing CloudWatch's own archive with a compressed S3 archive cuts the storage bill by ~95% on typical app log corpora.
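To see where a figure of that size can come from, here is a back-of-the-envelope calculation. The 20× ratio, the 1 TB volume, and the $0.023/GB-month object-storage price are illustrative assumptions, not measurements; substitute your own numbers.

```python
def monthly_storage_cost(raw_gb: float, price_per_gb: float, ratio: float = 1.0) -> float:
    """Monthly cost of storing raw_gb of logs, compressed by `ratio`, at price_per_gb."""
    return raw_gb / ratio * price_per_gb


raw_gb = 1000.0                                            # hypothetical: 1 TB of retained logs
cloudwatch = monthly_storage_cost(raw_gb, 0.03)            # CloudWatch storage rate quoted above
archive = monthly_storage_cost(raw_gb, 0.023, ratio=20.0)  # assumed S3 price and compression ratio
saving = 1 - archive / cloudwatch
print(f"${cloudwatch:.2f} -> ${archive:.2f} per month ({saving:.0%} saved)")
```

Under these assumptions the storage line drops from $30.00 to $1.15 per month. Ingestion charges are unaffected.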
Step 1 — Create the forwarding Lambda
This Lambda receives a CloudWatch Logs subscription event, decodes the gzipped payload, posts it to smallest.zip, and writes the returned .sbz to S3.
import base64, gzip, json, os, time, urllib.request, uuid
import boto3

API_KEY = os.environ["SMALLEST_API_KEY"]
BUCKET = os.environ["ARCHIVE_BUCKET"]
PREFIX = os.environ.get("ARCHIVE_PREFIX", "cloudwatch/")
ENDPOINT = "https://smallest.zip/api/log-files/upload?level=max"
s3 = boto3.client("s3")

def handler(event, _ctx):
    # CloudWatch subscription payload is base64-encoded, gzipped JSON
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    # CloudWatch also sends CONTROL_MESSAGE probes that carry no log data; skip them
    if payload.get("messageType") == "CONTROL_MESSAGE":
        return {"ok": True, "skipped": "control message"}
    lines = "\n".join(e["message"] for e in payload["logEvents"]).encode()

    # Build a single-part multipart/form-data body by hand (no third-party deps)
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="batch.log"\r\n'
        f"Content-Type: text/plain\r\n\r\n"
    ).encode() + lines + f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(ENDPOINT, data=body, method="POST", headers={
        "x-api-key": API_KEY,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    })
    with urllib.request.urlopen(req, timeout=60) as r:
        meta = json.loads(r.read())

    # Download the compressed blob and stash it in our own S3 bucket
    dl = urllib.request.Request(
        f"https://smallest.zip/api/files/{meta['uploadId']}/download",
        headers={"x-api-key": API_KEY})
    with urllib.request.urlopen(dl, timeout=60) as r:
        blob = r.read()

    # Log group names start with "/", so strip it to avoid "//" in the key
    group = payload["logGroup"].strip("/")
    key = f"{PREFIX}{group}/{int(time.time())}-{meta['uploadId']}.sbz"
    s3.put_object(Bucket=BUCKET, Key=key, Body=blob,
                  Metadata={"original-bytes": str(meta["originalBytes"]),
                            "compressed-bytes": str(meta["compressedBytes"])})
    return {"ok": True, "s3": f"s3://{BUCKET}/{key}",
            "ratio": meta["originalBytes"] / max(1, meta["compressedBytes"])}
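To exercise the handler's decoding step without deploying anything, a synthetic subscription event can be built locally. This is a sketch: `make_event` and `decode_event` are hypothetical helper names, and the payload carries only the fields the handler reads (`logGroup`, `logEvents[].message`) plus `messageType`.

```python
import base64
import gzip
import json


def make_event(log_group: str, messages: list[str]) -> dict:
    """Build a dict shaped like a CloudWatch Logs subscription event (base64 of gzipped JSON)."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "logGroup": log_group,
        "logEvents": [{"id": str(i), "timestamp": 0, "message": m}
                      for i, m in enumerate(messages)],
    }
    data = base64.b64encode(gzip.compress(json.dumps(payload).encode())).decode()
    return {"awslogs": {"data": data}}


def decode_event(event: dict) -> tuple[str, bytes]:
    """Reverse the encoding: return the log group and the newline-joined messages."""
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    lines = "\n".join(e["message"] for e in payload["logEvents"]).encode()
    return payload["logGroup"], lines


group, lines = decode_event(make_event("/aws/lambda/my-app", ["first", "second"]))
print(group, lines)
```

Passing the result of `make_event(...)` to the real `handler` would also trigger the upload and the S3 write, so do that only with test credentials.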
Step 2 — Subscribe the Lambda to a log group
aws lambda add-permission \
--function-name smallestzip-forwarder \
--statement-id cwlogs-invoke \
--action lambda:InvokeFunction \
--principal logs.amazonaws.com
aws logs put-subscription-filter \
--log-group-name /aws/lambda/my-app \
--filter-name smallestzip \
--filter-pattern "" \
--destination-arn arn:aws:lambda:us-east-1:123456789012:function:smallestzip-forwarder
Step 3 — IAM policy for the Lambda execution role
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow",
"Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
"Resource": "arn:aws:logs:*:*:*" },
{ "Effect": "Allow",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-log-archive/cloudwatch/*" }
]
}
Verification
# Trigger a log event, then within ~60s confirm a new .sbz landed in S3:
aws s3 ls s3://my-log-archive/cloudwatch/ --recursive | tail
# And confirm round-trip integrity:
aws s3 cp s3://my-log-archive/cloudwatch/<LATEST>.sbz /tmp/last.sbz
smallest-logs decompress /tmp/last.sbz --out - | head
Troubleshooting
| Symptom | Fix |
|---|---|
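| Lambda times out | Raise the Lambda timeout. The handler makes two HTTP calls with 60 s timeouts each, so allow at least 120 s. If batches are large, narrow the subscription filter pattern so fewer events are forwarded. |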
| 401 Unauthorized | SMALLEST_API_KEY env var not set on the Lambda, or stale key. Rotate on Settings. |
| 413 Payload Too Large | Batches are over 100 MB. Switch subscription destination to Kinesis Firehose → Lambda with smaller buffer (1 MB). |
| S3 AccessDenied | Lambda role missing s3:PutObject on the target bucket prefix. |
| CloudWatch keeps both copies | Set a retention policy: aws logs put-retention-policy --log-group-name X --retention-in-days 7. |
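When a batch exceeds the upload limit, one option is to split it on line boundaries before uploading, so that no log line is cut in half. A minimal sketch: `split_on_lines` is a hypothetical helper, and the default limit is an assumption that you should set below your plan's actual cap.

```python
from typing import Iterator


def split_on_lines(data: bytes, max_bytes: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield consecutive pieces of `data`, cut only after a line ending.

    Pieces stay within max_bytes where possible; a single line longer than
    max_bytes is yielded whole rather than truncated. Joining the pieces
    reproduces the input exactly.
    """
    chunk = bytearray()
    for line in data.splitlines(keepends=True):
        if chunk and len(chunk) + len(line) > max_bytes:
            yield bytes(chunk)
            chunk.clear()
        chunk += line
    if chunk:
        yield bytes(chunk)
```

Each yielded piece can then be uploaded as its own file. Because the pieces concatenate back to the original bytes, decompressing them in order restores the original batch.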
B) Splunk → smallest.zip
Two patterns work. Pick based on whether you want real-time forwarding or scheduled cold-tier archival.
Option 1 — Forward to HEC-compatible HTTP sink (real-time)
Splunk's HTTP output (the [httpout] stanza in outputs.conf) batches events and POSTs each batch over HTTP, so smallest.zip can be set as the destination. Pro: near real-time, no scheduled job. Con: Splunk's HTTP output expects an HEC-shaped response, so you either wrap the endpoint with a small middleware Lambda or fall back to the batch export in option 2.
[httpout]
httpEventCollectorToken = YOUR_API_KEY
uri = https://smallest.zip/api/log-files/upload?level=balanced
batchSize = 65536
batchTimeout = 30
sslVerifyServerCert = true
[tcpout]
defaultGroup = no_indexers
Splunk sends each batch as a POST with a JSON body; /api/log-files/upload accepts the body as-is when Content-Type is application/json.
Option 2 — Scheduled saved search export (batch)
Better fit for cold archival: run a nightly saved search, dump to file, compress, ship. Pro: survives Splunk restarts, retries, replays. Con: not real-time.
#!/usr/bin/env bash
set -euo pipefail
DAY=$(date -u -d 'yesterday' +%F)
OUT=/var/tmp/splunk-$DAY.log
/opt/splunk/bin/splunk search \
"search index=* earliest=-1d@d latest=@d | table _raw" \
-auth "admin:$SPLUNK_PASS" -output rawxml -maxout 0 \
| sed -n 's:.*<text>\(.*\)</text>.*:\1:p' > "$OUT"
curl -fsS -X POST "https://smallest.zip/api/log-files/upload?level=max" \
-H "x-api-key: $SMALLEST_API_KEY" \
-F "file=@$OUT" | tee /var/log/smallest-splunk.log
rm "$OUT"
Verification
curl -s "https://smallest.zip/api/files?codec=log-files&limit=5" \
-H "x-api-key: YOUR_API_KEY" | jq '.[] | {uploadId, originalBytes, compressedBytes}'
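The same check can be scripted. The sketch below assumes the list endpoint returns a JSON array of objects with the fields selected by the jq filter above; `summarize` is a hypothetical helper, and the second sample record is invented for illustration.

```python
def summarize(uploads: list[dict]) -> list[str]:
    """Format one line per upload with its compression ratio (original / compressed)."""
    rows = []
    for u in uploads:
        ratio = u["originalBytes"] / max(1, u["compressedBytes"])
        rows.append(
            f'{u["uploadId"]}: {u["originalBytes"]} -> {u["compressedBytes"]} bytes ({ratio:.1f}x)'
        )
    return rows


sample = [
    {"uploadId": "9f3c2a01", "originalBytes": 104857600, "compressedBytes": 4823104},
    {"uploadId": "example02", "originalBytes": 2048, "compressedBytes": 512},  # made-up record
]
for row in summarize(sample):
    print(row)
```

A ratio close to what gzip achieves on the same file is a hint that the upload went to the wrong endpoint.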
Troubleshooting
| Symptom | Fix |
|---|---|
| Splunk reports HEC 400 | Switch to option 2 (saved-search export). Splunk's HEC client is strict about response shape. |
| Saved-search OOM at midnight | Add \| head 5000000 to the search and run hourly instead of daily. |
| splunk search hangs | Add -timeout 600 and consider using dbxquery against an external index store. |
| Compressed sizes look identical to gzip | You're hitting the wrong codec. Confirm path is /api/log-files/upload (not the generic /api/upload). |
C) Datadog → smallest.zip
The cheapest route uses Datadog's built-in Log Archives feature: Datadog already drops compressed JSON batches to your S3 bucket. We just hook an S3 PutObject trigger to a Lambda that re-compresses with smallest.zip and replaces the object.
Step 1 — Enable Datadog Log Archives
In Datadog: Logs → Configuration → Archives → Add a new archive. Point at s3://my-dd-archive/raw/. Datadog will write files like raw/dt=2026-06-02/hour=14/<uuid>.json.gz.
Step 2 — S3 trigger Lambda
import gzip, json, os, uuid, urllib.parse, urllib.request, boto3

API_KEY = os.environ["SMALLEST_API_KEY"]
ENDPOINT = "https://smallest.zip/api/log-files/upload?level=max"
s3 = boto3.client("s3")

def handler(event, _):
    for rec in event["Records"]:
        b = rec["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys ("=" arrives as "%3D")
        k = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        if not k.startswith("raw/"):
            continue  # ignore our own sbz/ writes and anything else
        body = s3.get_object(Bucket=b, Key=k)["Body"].read()
        plain = gzip.decompress(body)

        # Hand-built single-part multipart/form-data body
        boundary = uuid.uuid4().hex
        mp = (f"--{boundary}\r\n"
              f'Content-Disposition: form-data; name="file"; filename="dd.log"\r\n'
              f"Content-Type: text/plain\r\n\r\n").encode() + plain + \
             f"\r\n--{boundary}--\r\n".encode()
        req = urllib.request.Request(ENDPOINT, data=mp, method="POST", headers={
            "x-api-key": API_KEY,
            "Content-Type": f"multipart/form-data; boundary={boundary}"})
        meta = json.loads(urllib.request.urlopen(req, timeout=120).read())

        # Fetch the compressed blob and store it under sbz/
        dl = urllib.request.Request(
            f"https://smallest.zip/api/files/{meta['uploadId']}/download",
            headers={"x-api-key": API_KEY})
        blob = urllib.request.urlopen(dl, timeout=120).read()
        new_key = k.replace("raw/", "sbz/", 1).rsplit(".", 2)[0] + ".sbz"
        s3.put_object(Bucket=b, Key=new_key, Body=blob)
        s3.delete_object(Bucket=b, Key=k)  # drop the gzip original
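The key handling is the easiest part of this Lambda to get wrong, so it can help to isolate and test it. A sketch, with `archive_key` as a hypothetical helper name. It relies on the fact that S3 event notifications deliver object keys URL-encoded, so `=` arrives as `%3D`.

```python
from urllib.parse import unquote_plus


def archive_key(event_key: str, src_prefix: str = "raw/", dst_prefix: str = "sbz/") -> str:
    """Map a URL-encoded S3 event key under src_prefix to its .sbz key under dst_prefix."""
    key = unquote_plus(event_key)
    if not key.startswith(src_prefix):
        raise ValueError(f"key outside {src_prefix!r}: {key!r}")
    stem = key[len(src_prefix):]
    if stem.endswith(".json.gz"):
        stem = stem[: -len(".json.gz")]
    return f"{dst_prefix}{stem}.sbz"


print(archive_key("raw/dt%3D2026-06-02/hour%3D14/abc123.json.gz"))
```

Raising on keys outside the source prefix makes an unfiltered trigger fail loudly instead of silently re-processing the Lambda's own output.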
Alternative — Datadog Observability Pipelines
If you've enabled Datadog Observability Pipelines (Vector under the hood), add a fork sink instead. See the Vector section below — the syntax is identical.
Verification
aws s3 ls s3://my-dd-archive/sbz/ --recursive --human-readable | tail
# Pick the freshest .sbz and round-trip it locally:
aws s3 cp s3://my-dd-archive/sbz/<LATEST>.sbz - | smallest-logs decompress - --out - | head
Troubleshooting
| Symptom | Fix |
|---|---|
| No files appearing in raw/ | Datadog archives only fire when a saved-view filter matches. Add a catch-all view. |
| Lambda invokes but never finishes | Archive files can be hundreds of MB. Bump Lambda memory to 1024MB and timeout to 300s. |
| Datadog rehydration breaks | Keep the gzip originals (skip delete_object) until you've validated rehydration from .sbz. |
| Costs didn't drop | Verify Datadog ingestion rate is unchanged — this integration replaces archive cost, not ingest. |
D) Fluentd → smallest.zip
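Use the built-in out_http plugin (Fluentd ≥ 1.7). Each buffer chunk is sent as a single POST request; with the JSON formatter below, the body is a JSON array of records rather than a multipart upload.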
<source>
  @type tail
  path /var/log/myapp/*.log
  pos_file /var/log/fluent/myapp.pos
  tag app.logs
  <parse>
    @type none
  </parse>
</source>
<match app.logs>
  @type http
  endpoint https://smallest.zip/api/log-files/upload?level=balanced
  open_timeout 10
  read_timeout 60
  # Authentication is the x-api-key header; no <auth> section is needed
  headers {"x-api-key":"YOUR_API_KEY"}
  json_array true
  <format>
    @type json
  </format>
  <buffer>
    @type file
    path /var/log/fluent/buf-smallest
    chunk_limit_size 8m
    flush_interval 30s
    retry_max_interval 60
  </buffer>
</match>
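With `@type none` parsing and JSON formatting, each tailed line travels as a record of the form `{"message": "..."}`, and `json_array true` groups a chunk's records into one JSON array. The sketch below shows how a receiver could recover the raw lines from such a body. `lines_from_chunk` is a hypothetical helper, and whether smallest.zip performs this unwrapping server-side is not stated on this page.

```python
import json


def lines_from_chunk(body: bytes) -> bytes:
    """Turn a JSON array of {"message": ...} records back into newline-terminated log text."""
    records = json.loads(body)
    return "".join(f'{r["message"]}\n' for r in records).encode()


chunk = b'[{"message": "GET / 200"}, {"message": "GET /health 200"}]'
print(lines_from_chunk(chunk))
```

This is mainly useful for inspecting what Fluentd actually sent when a chunk compresses worse than expected.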
Verification
echo "fluentd test $(date)" | sudo tee -a /var/log/myapp/test.log
sleep 35 # wait one flush interval
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
-H "x-api-key: YOUR_API_KEY" | jq
Troubleshooting
| Symptom | Fix |
|---|---|
| buffer overflow | Raise chunk_limit_size or shorten flush_interval. Disk: bump total_limit_size. |
| 403 from smallest.zip | Header was quoted wrong in headers {...} — must be valid JSON, double-quoted. |
| Logs send but file is named chunk-… | Add <format> @type out_file </format> and set a filename via the http_method post directive. |
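| Fluentd CPU spikes | Consider Fluent Bit's http output instead. It is generally lighter on CPU, but its configuration keys (Host, Port, URI, Header, Format) differ from Fluentd's, so the config above needs translating. |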
E) Vector → smallest.zip
Vector's http sink is the cleanest fit. Use encoding.codec = "text" so each chunk arrives as raw log text (not JSON-wrapped).
[sources.app_logs]
type = "file"
include = ["/var/log/myapp/*.log"]
read_from = "end"
[transforms.batch_tag]
type = "remap"
inputs = ["app_logs"]
source = '.batch = "smallest"'
[sinks.smallest_zip]
type = "http"
inputs = ["batch_tag"]
uri = "https://smallest.zip/api/log-files/upload?level=balanced"
method = "post"
compression = "none" # smallest.zip does its own compression
encoding.codec = "text"
framing.method = "newline_delimited"
[sinks.smallest_zip.request.headers]
x-api-key = "YOUR_API_KEY"
content-type = "text/plain"
[sinks.smallest_zip.batch]
max_bytes = 8388608 # 8 MB per upload
timeout_secs = 30
[sinks.smallest_zip.buffer]
type = "disk"
max_size = 1073741824 # 1 GB on-disk overflow
when_full = "block"
Verification
vector validate /etc/vector/vector.toml
sudo systemctl restart vector
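# Check the sink's internal metrics. This assumes an internal_metrics source wired to a
# prometheus_exporter sink, which listens on port 9598 by default:
curl -s http://localhost:9598/metrics | grep -E 'http_(sink|client)_.*smallest'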
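Before pointing a sink at the real endpoint, it can be useful to see exactly what it sends. The following sketch starts a throwaway local HTTP server that records each POST body together with the `x-api-key` header; set the sink's `uri` to it temporarily. The `CaptureHandler` class, the helper name, and the port are illustrative choices, not part of Vector or smallest.zip.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # one (x-api-key header, request body) tuple per POST


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        captured.append((self.headers.get("x-api-key"), self.rfile.read(length)))
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the console quiet
        pass


def start_capture_server(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Self-check: send one request the way a text-codec sink would, then stop.
server = start_capture_server()
url = f"http://127.0.0.1:{server.server_address[1]}/api/log-files/upload"
req = urllib.request.Request(url, data=b"line one\nline two\n", method="POST",
                             headers={"x-api-key": "TEST", "content-type": "text/plain"})
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any proxy env vars
opener.open(req, timeout=5).close()
server.shutdown()
server.server_close()
print(captured)
```

With `encoding.codec = "text"` and newline-delimited framing, the captured body should be plain log lines. A body of JSON objects indicates the codec is set to `json`.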
Troubleshooting
| Symptom | Fix |
|---|---|
| encoding.codec = "json" by mistake | Vector wraps each event in {"message":"..."}. Set codec = "text" for our log codec to recognise the format. |
| Sink retries forever on 4xx | Set request.retry_attempts (for example, 5) so bad batches are eventually dropped instead of looping. request.retry_max_duration_secs only caps the wait between retries. |
| High memory | Switch buffer type = "disk" and lower batch.max_bytes. |
| Zero throughput | Confirm the source is actually reading events: curl localhost:9598/metrics \| grep app_logs (assumes a prometheus_exporter sink on its default port). |
F) Logstash → smallest.zip
Use the http output. Logstash's HTTP output supports format => "message", which sends each event as a plain text body — exactly what our log codec wants.
input {
  file {
    path => "/var/log/myapp/*.log"
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb-smallest"
  }
}
output {
  http {
    url => "https://smallest.zip/api/log-files/upload?level=balanced"
    http_method => "post"
    format => "message"
    message => "%{message}"   # body template; required when format is "message"
    content_type => "text/plain"
    headers => {
      "x-api-key" => "YOUR_API_KEY"
    }
    pool_max => 10
    socket_timeout => 60
    retry_failed => true
    retry_non_idempotent => true
  }
}
Verification
/usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/smallest.conf
sudo systemctl restart logstash
echo "logstash test $(date)" | sudo tee -a /var/log/myapp/app.log
sleep 10
curl -s "https://smallest.zip/api/files?codec=log-files&limit=1" \
-H "x-api-key: YOUR_API_KEY" | jq
Troubleshooting
| Symptom | Fix |
|---|---|
| Each event posts as one request | Add codec => line and tune the pipeline batch with pipeline.batch.size: 500 in logstash.yml. |
| 400 Bad Request | format => "json" wraps events; switch to "message" or "form" with a file field. |
| Pipeline blocks at startup | Test config with -t first; usually a curly-brace mismatch in headers. |
| too many open files | Bump LimitNOFILE=65536 in /etc/systemd/system/logstash.service.d/override.conf. |
G) Raw syslog / journald → smallest.zip
No daemon, no agent. A nightly cron job ships yesterday's logs and removes the local copy after a successful upload.
Option 1 — Cron + journalctl
SHELL=/bin/bash
SMALLEST_API_KEY=YOUR_API_KEY
# Ship yesterday's journal at 02:13 every night.
# cron does not support backslash line continuations, so the entry must stay on one line.
13 2 * * * root journalctl --since "yesterday" --until "today" --no-pager | /usr/local/bin/smallest-logs compress - --out /var/log/archive/$(hostname)-$(date -u -d yesterday +\%F).sbz && logger "smallest-logs: shipped $(date -u -d yesterday +\%F)"
Option 2 — Direct curl (no CLI dependency)
#!/usr/bin/env bash
set -euo pipefail
: "${SMALLEST_API_KEY:?must be set}"
SRC=/var/log/syslog.1 # logrotate's previous-day file
[ -f "$SRC" ] || { echo "no rotated syslog yet"; exit 0; }
RESP=$(curl -fsS -X POST "https://smallest.zip/api/log-files/upload?level=max" \
-H "x-api-key: $SMALLEST_API_KEY" \
-F "file=@${SRC}")
echo "$RESP" | jq -r '"shipped \(.uploadId): \(.originalBytes) -> \(.compressedBytes) bytes"'
Option 3 — rsyslog template + omprog
module(load="omprog")
template(name="RawMsg" type="string" string="%msg%\n")
action(type="omprog"
binary="/usr/local/bin/smallest-logs compress - --rotate-out /var/log/archive/rsyslog --rotate-bytes 8388608"
template="RawMsg")
Verification
# Dry-run the nightly cron job right now
SMALLEST_API_KEY=YOUR_API_KEY journalctl --since "1 hour ago" --no-pager \
| smallest-logs compress - --out /tmp/last-hour.sbz
ls -lh /tmp/last-hour.sbz
smallest-logs decompress /tmp/last-hour.sbz --out - | tail
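Eyeballing `tail` output does not prove a byte-exact round trip; comparing digests does. A sketch, assuming you kept the original input as a file and wrote the decompressed output to a second file. `sha256_file` and `same_content` are hypothetical helper names.

```python
import hashlib


def sha256_file(path: str, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in blocks so large logs need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(original: str, restored: str) -> bool:
    """True when both files are byte-for-byte identical (compared by SHA-256)."""
    return sha256_file(original) == sha256_file(restored)
```

For the journald case, that means saving the `journalctl` output to a file first, compressing that file, and then comparing it against the decompressed result.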
Troubleshooting
| Symptom | Fix |
|---|---|
| Cron job runs but produces 0-byte output | journalctl in cron lacks a TTY — add --no-pager (already in the snippet) and ensure the user has systemd-journal group membership. |
| syslog.1 doesn't exist | logrotate hasn't run yet on this host; install logrotate or change the script to read /var/log/syslog directly. |
| rsyslog spawns omprog repeatedly | Add action.execOnlyWhenPreviousIsSuspended="on" and a queue.type="LinkedList" queue. |
| SELinux denies omprog exec | chcon -t bin_t /usr/local/bin/smallest-logs or write a targeted policy. |
| Disk fills with .sbz archives | Add an upload step that ships the file to S3/R2, then removes the local copy. The CLI's --upload-s3 flag does both atomically. |
Global troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 401 Unauthorized | API key wrong or rotated | Check that the x-api-key header carries your current key verbatim (no quotes or extra whitespace). Rotate at Settings. |
| 413 Payload Too Large | Upload > 100 MB | Split or stream via the CLI's chunked mode. |
| 429 Too Many Requests | Plan rate limit hit | Add exponential backoff. Upgrade plan on Pricing. |
| Compression ratio worse than gzip | Wrong codec endpoint | Must be /api/log-files/upload, not /api/upload or another codec. |
| TLS handshake fails | Old curl / OpenSSL on RHEL 7-era boxes | Upgrade curl or use the CLI (statically linked). |
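For 429 responses, a retry loop with exponential backoff might look like the sketch below. `with_backoff` is a hypothetical helper; the retry count and delays are arbitrary starting points, and it retries only on HTTP 429 as raised by `urllib`.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], retries: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on HTTP 429 wait base_delay * 2**attempt and retry, up to `retries` retries.

    Any other error, or a 429 on the final attempt, is re-raised to the caller.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Wrap any upload function that raises `urllib.error.HTTPError`, for example `with_backoff(lambda: my_upload(path))`, where `my_upload` stands for your own function. Adding random jitter to the delay is a common refinement when many hosts upload at the same time.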
Still stuck? Email support with the uploadId from the last successful response — we can replay the request server-side.