
// the find

projectnessie/nessie

★ 1,467 · Java · Apache-2.0 · updated Jun 2026

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Nessie is a metadata catalog for data lakes that brings Git-style branching and versioning to Iceberg tables. You create branches, commit table changes, merge, and tag — same mental model as source control, but for your data. It's aimed at teams running Spark, Flink, or Trino who want to experiment on data without affecting production reads.

The Git semantics are genuinely well-implemented: you get real branch isolation, atomic commits across multiple tables in a single transaction, and merge/diff operations that make sense for catalog operations. The compatibility matrix is thorough: Spark 3.3/3.4/3.5, Flink 1.16-1.18, Trino, and Presto are all tested and documented. The OAuth2 client implementation covers every grant type, including device code flow and token exchange, which is more than most catalog projects bother with. The Quarkus-based server starts fast, which matters when you're spinning it up in CI for integration tests.
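
To make "branch isolation" and "atomic commits across multiple tables" concrete, here is a toy in-memory model of the idea. It is a sketch only, not Nessie's API or storage design: `ToyCatalog`, `Commit`, and the metadata file names are all hypothetical, and the merge is fast-forward only, which is far simpler than what a real catalog does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Immutable catalog snapshot: table name -> metadata pointer (hypothetical)."""
    tables: dict
    parent: "Commit | None" = None
    message: str = ""


class ToyCatalog:
    """Toy model of a Git-like catalog; not how Nessie is implemented."""

    def __init__(self):
        self.branches = {"main": Commit(tables={})}

    def create_branch(self, name, source="main"):
        # A branch is only a named pointer to an existing commit; nothing is copied.
        self.branches[name] = self.branches[source]

    def commit(self, branch, changes, message=""):
        # Every table change lands in one new commit, so readers see all or none.
        head = self.branches[branch]
        self.branches[branch] = Commit({**head.tables, **changes}, head, message)

    def read(self, branch, table):
        return self.branches[branch].tables.get(table)

    def merge(self, source, target):
        # Fast-forward only: the target head must be an ancestor of the source head.
        src, dst = self.branches[source], self.branches[target]
        node = src
        while node is not None and node is not dst:
            node = node.parent
        if node is None:
            raise ValueError("target has diverged from source")
        self.branches[target] = src


catalog = ToyCatalog()
catalog.commit("main", {"orders": "orders-v1.json", "customers": "customers-v1.json"})
catalog.create_branch("etl")

# Two tables change together in a single commit on the "etl" branch.
catalog.commit("etl", {"orders": "orders-v2.json", "customers": "customers-v2.json"})

# Readers on "main" are unaffected until the merge.
before_merge = catalog.read("main", "orders")
catalog.merge("etl", "main")
after_merge = catalog.read("main", "orders")
```

In this sketch `before_merge` is still the v1 pointer and `after_merge` is the v2 pointer, and both tables flip at the same moment because the branch head moves in one step.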

Hive support is listed as n/a in the compatibility table with no explanation, so if you're in a Hive-heavy shop, you're on your own. The README points you to external docs for almost everything real (Spark integration, configuration reference, CLI usage), so getting started means bouncing between multiple sites rather than having a working example in front of you. The project's own output lists the Docker image at 555MB, which is not small for what is essentially a catalog service. The v1 API still exists and is presumably maintained alongside v2, which means two surfaces to support and an open question of when v1 actually gets dropped.

