// the find

getdozer/dozer

★ 1,577 · Rust · AGPL-3.0 · updated Jun 2024

Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.

Dozer is a CDC-based data movement tool written in Rust that streams changes from sources like Postgres, MySQL, and Snowflake into sinks like ClickHouse and BigQuery. It targets the Debezium+Kafka use case, but as a single binary with built-in transformations, and is aimed at teams who want real-time warehouse sync without running a Kafka cluster.

The Rust implementation gives it genuine throughput advantages over JVM-based Debezium: the benchmarks aren't marketing fluff, because the architecture supports them. Supporting both CDC resumption and stateless SQL transformations in one tool removes a whole class of pipeline middleware. The connector matrix is wide for a project this size: Postgres WAL, MySQL binlog, Snowflake streams, S3, GCS, Kafka, and Delta Lake all in one binary. The embedded Deno runtime for JavaScript transforms is a smart escape hatch when SQL isn't enough.

The README points to a Rust source file as the 'full documentation', which is a red flag for production adoption. Kafka resumption is still marked 🚧 and MongoDB/S3/GCS resumption is marked 🎯 (planned), meaning you can't safely use those sources for anything that can't tolerate data loss on restart. The last commit was in June 2024 and the repo shows no activity since, so it looks effectively unmaintained, which is a serious risk for infrastructure-level software. The Oracle and Aerospike connectors are enterprise-only, with no public pricing or docs, which makes them a dead end for evaluation.
