// the find
ArroyoSystems/arroyo
Distributed stream processing engine in Rust
Arroyo is a distributed stream processing engine that lets you run SQL against real-time data streams, reading from and writing to systems like Kafka, Kinesis, HTTP endpoints, and the filesystem, with stateful operations like windows and joins. It ships as a single binary, runs locally or on Kubernetes, and is now the engine behind Cloudflare Pipelines. The target audience is teams who want Flink-level capability without the JVM tax or the operational complexity.
Single-binary deployment is genuinely useful: no ZooKeeper, no separate coordinator process to babysit. The SQL-first design with DataFusion under the hood means the query planner is battle-tested, not home-rolled. Connector breadth is solid: Kafka, Kinesis, Iceberg, Delta Lake, MQTT, NATS, RabbitMQ, Redis, which is enough to cover most real-world pipelines without building adapters. State checkpointing with epoch-based recovery means pipelines survive restarts and resume from the last completed checkpoint instead of replaying from the beginning.
The Cloudflare Pipelines tie-in is a double-edged sword: it signals commercial backing, but it also means the open-source roadmap will likely drift toward what Cloudflare needs, not what self-hosters need. Python and Java UDF support is absent: you write UDFs in Rust or not at all, which is a real barrier for data teams. The Kubernetes scheduler is there, but the docs on production topology (how many nodes, memory sizing for large state) are thin. Commit activity looks healthy now, but the project has had quiet stretches; at ~5k stars it's still small enough that a pivot or acquisition could stall it.