finds.dev

// the find

treeverse/lakeFS

★ 5,422 · Go · Apache-2.0 · updated Jun 2026

lakeFS - Data version control for your data lake | Git for data

lakeFS puts a Git-like branching layer in front of S3, GCS, or Azure Blob without copying data. You get branches, commits, merges, and rollbacks on your data lake, with an S3-compatible API so Spark, Athena, DuckDB, and friends connect without code changes. It's for data engineering teams tired of "just don't touch prod" as their data quality strategy.
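The sketch below shows what "connect without code changes" means for object paths. It assumes the lakeFS S3 gateway's addressing scheme, where the bucket name is the repository and the first key segment is the branch or other ref. The helpers `lakefs_s3_uri` and `rewrite_uri` and the bucket and repository names are hypothetical. On the client side, the rest of the change is usually configuration: the lakeFS server as the S3 endpoint (for boto3, the `endpoint_url` argument to `boto3.client("s3", ...)`) plus lakeFS-issued credentials.

```python
def lakefs_s3_uri(repo: str, ref: str, path: str) -> str:
    """Hypothetical helper: build the s3:// URI a client uses against lakeFS.

    Assumes the lakeFS S3 gateway convention: bucket = repository,
    first key segment = branch (or other ref), remainder = object path.
    """
    return f"s3://{repo}/{ref}/{path.lstrip('/')}"


def rewrite_uri(uri: str, repo: str, ref: str) -> str:
    """Hypothetical helper: re-point a plain s3://bucket/key URI at a lakeFS repo and ref."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an s3:// URI: {uri}")
    _bucket, _, key = uri[len("s3://"):].partition("/")
    return lakefs_s3_uri(repo, ref, key)


if __name__ == "__main__":
    raw = "s3://analytics-raw/events/2024/06/part-0.parquet"
    print(rewrite_uri(raw, "analytics", "main"))
    # s3://analytics/main/events/2024/06/part-0.parquet
    print(rewrite_uri(raw, "analytics", "etl-test"))
    # s3://analytics/etl-test/events/2024/06/part-0.parquet
```

Pointing a job at a different branch then means passing a different `ref`. The code that reads and writes the objects stays the same.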

The copy-on-write branching model is the core win: creating a branch of a petabyte-scale lake is essentially free, because a new branch is a metadata pointer to an existing commit and only the changes made on it are tracked as deltas. Thanks to the S3-compatible API, existing pipelines only need a new endpoint and repository/branch-prefixed paths; no SDK swap is required. The write-audit-publish pattern (branch → run quality checks via hooks → merge to main) is a real workflow improvement over ad-hoc ETL testing. The client ecosystem is thorough: a Hadoop FileSystem implementation, a Python wrapper, a Java SDK, and a Go CLI (lakectl) all ship from the same repo.
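The branching and write-audit-publish ideas can be illustrated with a toy model. This is not lakeFS's implementation. `Commit`, `Branch`, `create_branch`, and `write_audit_publish` are hypothetical names, and the `check` callable stands in for a pre-merge hook. The model shows two things: creating a branch copies one pointer, and main only changes if the audit passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Toy model, NOT lakeFS internals: a commit maps logical paths to physical
# object addresses; a branch is a pointer to a commit plus staged changes.


@dataclass
class Commit:
    entries: Dict[str, str]            # logical path -> physical address
    parent: Optional["Commit"] = None


@dataclass
class Branch:
    head: Commit
    staged: Dict[str, Optional[str]] = field(default_factory=dict)  # None marks a delete

    def write(self, path: str, address: str) -> None:
        self.staged[path] = address

    def delete(self, path: str) -> None:
        self.staged[path] = None

    def view(self) -> Dict[str, str]:
        """Committed entries overlaid with this branch's staged delta."""
        merged = dict(self.head.entries)
        for path, address in self.staged.items():
            if address is None:
                merged.pop(path, None)
            else:
                merged[path] = address
        return merged

    def commit(self) -> Commit:
        # The toy copies the path->address map for simplicity; only metadata
        # is copied, never object data.
        self.head = Commit(entries=self.view(), parent=self.head)
        self.staged = {}
        return self.head


def create_branch(source: Branch) -> Branch:
    # Branching copies one pointer, not data, so its cost does not grow
    # with the size of the lake.
    return Branch(head=source.head)


def write_audit_publish(main: Branch, writes: Dict[str, str],
                        check: Callable[[Dict[str, str]], bool]) -> bool:
    """Branch off main, write, audit, and publish only if the audit passes."""
    etl = create_branch(main)
    for path, address in writes.items():
        etl.write(path, address)
    etl.commit()
    if not check(etl.view()):  # stand-in for a pre-merge hook
        return False            # main never saw the bad data
    # Fast-forward "merge": valid only because main has not moved since the
    # branch was created; real merges also handle divergence and conflicts.
    main.head = etl.head
    return True


if __name__ == "__main__":
    main = Branch(head=Commit(entries={"events/a.parquet": "obj-001"}))
    ok = write_audit_publish(main, {"events/b.parquet": "obj-002"},
                             check=lambda files: len(files) >= 2)
    print(ok, sorted(main.view()))
    # True ['events/a.parquet', 'events/b.parquet']
    bad = write_audit_publish(main, {"events/c.parquet": "obj-003"},
                              check=lambda files: "events/c.parquet" not in files)
    print(bad, sorted(main.view()))
    # False ['events/a.parquet', 'events/b.parquet']
```

The fast-forward shortcut keeps the sketch short. It hides the part a real system has to get right: merging when main has also moved.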

Garbage collection is operationally annoying: reclaiming physical storage from deleted objects takes a separate Spark job, so storage costs don't drop until you schedule and run it. The metadata store (PostgreSQL or DynamoDB, for example) becomes a single point of failure; if it goes down, lakeFS can no longer resolve branches and paths, so the lake is effectively unavailable through lakeFS even though the underlying objects are intact. The README's AI code-sharing legal notice is a red flag for enterprise adoption teams: Treeverse explicitly threatens legal action over AI tooling's use of its Apache 2.0 code, which creates real ambiguity for shops using Copilot or Cursor on integration work. Multi-repo workflows (data that spans repositories) have no native merge strategy, so cross-repo consistency is left entirely to the user.
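A toy mark-and-sweep pass shows why storage costs lag behind deletes. Deleting a path on a branch only drops a pointer. The physical object stays in the bucket until something walks the retained commits and removes what nothing references. This is not the lakeFS GC job. `Commit`, `reachable_objects`, and `sweep` are hypothetical, and `keep_depth` (a commit count) stands in for lakeFS's retention rules, which are expressed in days per branch.

```python
from typing import Dict, Iterable, Optional, Set

# Toy model, NOT the lakeFS GC job: each commit maps logical paths to
# physical object addresses and points at its parent commit.


class Commit:
    def __init__(self, entries: Dict[str, str],
                 parent: Optional["Commit"] = None) -> None:
        self.entries = entries
        self.parent = parent


def reachable_objects(heads: Iterable[Commit], keep_depth: int) -> Set[str]:
    """Mark phase: objects referenced by the newest `keep_depth` commits of each head."""
    live: Set[str] = set()
    for head in heads:
        commit: Optional[Commit] = head
        depth = 0
        while commit is not None and depth < keep_depth:
            live.update(commit.entries.values())
            commit = commit.parent
            depth += 1
    return live


def sweep(stored: Set[str], heads: Iterable[Commit], keep_depth: int) -> Set[str]:
    """Sweep phase: objects still in the bucket that no retained commit references."""
    return stored - reachable_objects(heads, keep_depth)


if __name__ == "__main__":
    c1 = Commit({"events/a.parquet": "obj-001", "events/b.parquet": "obj-002"})
    c2 = Commit({"events/a.parquet": "obj-001"}, parent=c1)  # b deleted on main
    stored = {"obj-001", "obj-002"}  # deleting b did not shrink the bucket
    print(sweep(stored, [c2], keep_depth=2))  # set(): c1 is retained, nothing to free
    print(sweep(stored, [c2], keep_depth=1))  # {'obj-002'}: now reclaimable
```

Objects in the sweep result keep costing money until a job actually deletes them. That is the scheduling burden the GC job adds.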
