
// the find

Basekick-Labs/arc

★ 607 · Go · AGPL-3.0 · updated Jun 2026

High-performance analytical database. 19.9M records/sec ingestion, 8.4M+ rows/sec queries. Ingestion, compaction, SQL, retention, continuous queries — one binary. Open Parquet on your storage. S3/Azure native. Air-gap ready. No vendor lock-in. AGPL-3.0.

Arc is a columnar analytical database built in Go that wraps DuckDB with a high-throughput write path, background compaction, retention policies, and MQTT ingestion — all in one statically-linked binary. It targets the gap between 'run ClickHouse' (operationally heavy) and 'just use SQLite' (analytically weak), aiming at observability, IoT telemetry, and edge deployments where you want sub-millisecond ingestion without a cluster to babysit. The open Parquet storage format means your data isn't hostage to Arc if you walk away.

- Single binary with no runtime dependencies is genuinely useful for edge and air-gap deployments — no JVM, no ZooKeeper, no sidecar processes. The compaction system (97.7% file count reduction, 90% size reduction in their benchmarks) is the kind of operational detail that distinguishes a real storage engine from a weekend project.

- The Arrow IPC response path is a smart choice: zero-copy from DuckDB's internal column buffers to the wire, giving Grafana and pyarrow clients 3x the throughput of JSON at 1M-row payloads. The benchmark table is honest about where wire format doesn't matter — aggregation queries returning a few rows converge across all three formats.

- Parquet as the storage layer is the right call. You can point DuckDB, Polars, or Spark at the same files independently. Vendor lock-in via proprietary binary formats is how observability vendors extract rent; Arc sidesteps that entirely.

- The internal package structure shows serious engineering: Raft consensus in internal/cluster, per-token query quotas in internal/governance, WAL, circuit breakers, schema evolution tests, and race condition tests in the ingest layer. This isn't glue code with a performance badge slapped on.

- The 19.9M records/sec number is benchmarked on an M3 Max with 12 concurrent workers writing 1000-record batches in columnar MessagePack. That measures Arc's in-memory buffer flush, not durable writes. DuckDB is not designed for highly concurrent writes; the ingestion speed is real, but the durability story (WAL is optional) needs scrutiny before you bet production telemetry on it.

- AGPL-3.0 is a hard no for most commercial SaaS use cases. Running Arc as a service while modifying it requires open-sourcing those modifications. The enterprise license contact exists but has no public pricing — you're negotiating blind. The internal/license package suggests non-trivial feature gating behind that paywall.

- 607 stars and a February 2026 first release mean this has essentially zero production validation outside the maintainers. The self-reported benchmarks have no independent reproduction, and the contributor list is three people. 'Edge Sync' (spoke-to-hub replication) is listed as coming in 26.09.1; that is a critical feature for disconnected operations, and it isn't there yet.

- The enterprise Helm chart has separate writer, reader, and compactor StatefulSets — the single-binary pitch holds for small deployments but breaks apart under load in ways the README doesn't prepare you for. The MessagePack columnar query endpoint, one of the headline features, is still behind a build tag and explicitly experimental.
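A quick way to put the headline numbers in perspective is to turn them into per-worker and per-dataset figures. The sketch below is back-of-envelope arithmetic only, not anything from Arc's codebase. The rates and ratios are the ones quoted above. The function names, the 10,000-file / 100 GiB starting point, and the assumption that each worker has a single batch in flight at a time are all made up for illustration.

```python
# Back-of-envelope check of the figures quoted in the review.
# Rates and ratios come from the review text; the 10,000-file / 100 GiB
# starting point is a hypothetical workload, not an Arc benchmark input.

RECORDS_PER_SEC = 19_900_000   # headline ingestion rate
WORKERS = 12                   # concurrent writers in the benchmark
BATCH_SIZE = 1_000             # records per batch


def per_worker_batch_budget_ms(records_per_sec: float, workers: int, batch_size: int) -> float:
    """Average wall-clock time each worker has per batch, in milliseconds.

    Assumes every worker sends one batch at a time (no pipelining).
    """
    batches_per_sec_per_worker = records_per_sec / batch_size / workers
    return 1_000.0 / batches_per_sec_per_worker


def after_compaction(files: int, size_gib: float,
                     file_reduction: float = 0.977,
                     size_reduction: float = 0.90) -> tuple[int, float]:
    """Apply the quoted reduction ratios to a hypothetical starting state."""
    return round(files * (1 - file_reduction)), size_gib * (1 - size_reduction)


if __name__ == "__main__":
    budget = per_worker_batch_budget_ms(RECORDS_PER_SEC, WORKERS, BATCH_SIZE)
    print(f"per-worker batch budget: {budget:.2f} ms")

    files, size = after_compaction(10_000, 100.0)
    print(f"after compaction: {files} files, {size:.1f} GiB")
```

Under those assumptions each worker gets roughly 0.60 ms per 1000-record batch, which lines up with the "sub-millisecond ingestion" framing and with a write path that acknowledges from memory. It says nothing either way about throughput with the WAL enabled. The compaction ratios would take the hypothetical 10,000 files and 100 GiB down to about 230 files and 10 GiB.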
