// the find

bruin-data/bruin

★ 1,628 · Go · Apache-2.0 · updated Jun 2026

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

Bruin is a CLI-first data pipeline tool that tries to collapse ingestion, SQL/Python transformation, and data quality checks into a single binary. It targets small-to-medium data teams who want dbt-style SQL workflows without the full Modern Data Stack complexity. Written in Go, so the binary is fast and easy to distribute.

The ingestr integration is the standout feature: 100+ sources work out of the box, with no custom connectors to write. Using uv for isolated Python environments is a smart call; it avoids the usual virtualenv mess that bites data teams. The dry-run validation checks the full pipeline end-to-end before execution, so dependency errors surface before anything runs, something most competitors don't offer. The VS Code extension with lineage visualization means you're not stuck reading YAML to understand what runs before what.
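To make the dry-run point concrete, here is a minimal sketch of what pre-execution dependency validation can look like. This is not Bruin's implementation or API; the `validate_pipeline` function and the asset names are hypothetical, and the pipeline is assumed to be a plain mapping from each asset to the assets it depends on. The idea is simply to reject unknown dependencies and cycles before any query runs.

```python
from graphlib import CycleError, TopologicalSorter


def validate_pipeline(assets: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the graph can be scheduled.

    `assets` maps each asset name to the names it depends on (hypothetical shape,
    not Bruin's actual pipeline format).
    """
    problems = []

    # Step 1: every declared dependency must refer to a known asset.
    for name, deps in assets.items():
        for dep in deps:
            if dep not in assets:
                problems.append(f"{name}: unknown dependency {dep!r}")
    if problems:
        return problems

    # Step 2: the graph must be acyclic, otherwise no execution order exists.
    try:
        list(TopologicalSorter(assets).static_order())
    except CycleError as exc:
        # The second element of exc.args holds the nodes that form the cycle.
        problems.append("dependency cycle: " + ", ".join(exc.args[1]))

    return problems


if __name__ == "__main__":
    # Hypothetical pipeline: daily_revenue depends on an asset nobody defined.
    pipeline = {
        "raw_orders": [],
        "stg_orders": ["raw_orders"],
        "daily_revenue": ["stg_orders", "raw_customers"],
    }
    for problem in validate_pipeline(pipeline):
        print(problem)
```

A check like this costs milliseconds and needs no warehouse connection, which is why catching these errors up front is worth more than it sounds.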

1,628 stars and 81 forks make for a thin community around a tool asking to own your entire data pipeline; if you hit an edge case with a specific source, you're probably filing an issue and waiting. The cloud product (Bruin Cloud) is a separate commercial layer that's not open source, which means the OSS version is essentially a marketing funnel: features like audit logs, governance, and scheduling live behind a paywall. The dependency on ingestr for ingestion means you're trusting a secondary project's stability for the most critical pipeline step. R support is listed but looks like an afterthought: the docs are sparse, and it's unlikely to be well tested.
