// the find
nubskr/walrus
🦭 Distributed log streaming engine built from first principles
Walrus is a distributed log streaming engine written in Rust, using Raft consensus for metadata coordination and segment-based partitioning for load distribution. It positions itself as a Kafka alternative built from first principles, with an embedded Raft implementation (Octopii, which wraps openraft) and io_uring support on Linux. This is a learning/research project that has grown into something genuinely functional — interesting for developers who want to understand how streaming systems work at the storage layer.
The separation of consensus from data is architecturally correct — Raft only coordinates metadata (topic→segment→leader mappings), not the actual write path, which avoids the main performance bottleneck of Raft-for-everything designs. The TLA+ spec for the distributed data plane is a serious artifact; most projects at this star count don't bother, and the verified invariants (single-writer per segment, sealed segment immutability) map directly to the implementation. The io_uring storage backend for Linux is a real differentiator — the benchmark numbers are plausible given io_uring's ability to batch syscalls, and the fallback mmap path means it still runs elsewhere. The test suite using actual Docker-based cluster scenarios (resilience, rollover, recovery) is more honest than unit tests with mocked nodes.
The no-fsync benchmark comparison is misleading: Walrus at 1.2M writes/s vs Kafka at 1.1M writes/s is comparing a single-node local engine against a full Kafka broker (JVM, network, ZooKeeper/KRaft overhead) — not apples to apples. The fsync numbers tell the real story and they're all identical, which is what you'd expect. There's no consumer group semantics — GET uses a shared cursor per topic, meaning you can't run multiple independent consumers on the same topic without coordination that isn't there. The lease sync loop runs every 100ms, which means there's a window where a newly-elected leader could accept writes before old-leader leases expire — the TLA+ spec abstracts this away rather than modeling it. At v0.3.0 with 72 forks, this is not production-ready and the README doesn't claim it is, but someone adopting it for anything real would hit the single-writer-per-segment constraint hard under mixed workloads.