// the find

apache/kyuubi

★ 2,347 · Scala · Apache-2.0 · updated Jun 2026

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Kyuubi is a Thrift JDBC/ODBC gateway that sits in front of Spark (and experimentally Flink/Trino) and adds proper multi-tenancy on top — something Spark's own Thrift Server never got right. The target audience is data platform teams running Hadoop/Kubernetes clusters where multiple business units need isolated Spark engines without each spinning up their own cluster.

The core architectural insight is solid: decouple the gateway process from the engine process so a server crash doesn't kill running queries, and spin up per-user or per-group Spark engines rather than sharing one. The HiveServer2-compatible interface means existing BI tools (DBeaver, Tableau, Power BI) connect without driver changes. The Kubernetes engine-on-demand support is genuinely useful: Spark pods spin up per session and die when idle, which controls cost in a way the old always-running YARN model doesn't. The playground Docker Compose environment, preconfigured with Hadoop, HMS, Spark, Prometheus, and Grafana, actually works for local evaluation, a courtesy most projects of this complexity don't bother with.
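To make the per-user versus per-group trade-off concrete, here is a toy model of how a share level decides which sessions land on the same engine. The four level names match Kyuubi's documented engine share levels; the `Session` class, `engine_key`, and `engines_needed` are hypothetical names invented for this sketch and are not Kyuubi code.

```python
from dataclasses import dataclass

# Level names follow Kyuubi's documented engine share levels.
# Everything else in this block is a hypothetical illustration.
SHARE_LEVELS = ("CONNECTION", "USER", "GROUP", "SERVER")


@dataclass(frozen=True)
class Session:
    session_id: str
    user: str
    group: str


def engine_key(session: Session, share_level: str) -> str:
    """Return the identity of the engine this session would be routed to."""
    if share_level == "CONNECTION":
        return f"conn:{session.session_id}"  # one engine per connection
    if share_level == "USER":
        return f"user:{session.user}"        # one engine per user
    if share_level == "GROUP":
        return f"group:{session.group}"      # one engine per group
    if share_level == "SERVER":
        return "server"                      # one engine for everyone
    raise ValueError(f"unknown share level: {share_level}")


def engines_needed(sessions, share_level: str) -> int:
    """Count the distinct engines a set of sessions would require."""
    return len({engine_key(s, share_level) for s in sessions})


sessions = [
    Session("s1", "alice", "finance"),
    Session("s2", "alice", "finance"),
    Session("s3", "bob", "finance"),
    Session("s4", "carol", "marketing"),
]

for level in SHARE_LEVELS:
    print(level, engines_needed(sessions, level))
```

For these four sessions the model needs 4, 3, 2, and 1 engines respectively, which is the isolation-versus-cost dial in miniature: tighter isolation means more engines to start and pay for.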

The Spark coupling is a real constraint: everything interesting in Kyuubi assumes Spark, and while Flink/Trino/Hive support exists, it's clearly secondary. If your stack doesn't include Spark, you're not the audience. The configuration surface is enormous: kyuubi-defaults.conf has hundreds of properties and there's no opinionated 'start here' profile, and getting multi-tenancy right with Kerberos, YARN ACLs, and Ranger policies together requires expertise that the docs assume you already have. The engine-per-user model means cold-start latency whenever a session arrives and no engine is already running for that user, unless you pre-warm engines, and the docs underplay how painful this is in interactive BI scenarios where users expect sub-second response. Star count (2.3k) is modest for a graduated Apache project, suggesting the adoption ceiling is the Spark-centric enterprise data platform world, not a general audience.
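The cold-start complaint is easy to quantify with a back-of-the-envelope model. The sketch below assumes one engine per user that is torn down once it has been idle longer than a timeout, and treats each query as instantaneous; `count_cold_starts`, the arrival times, and the timeout values are all made up for illustration and do not reflect Kyuubi's defaults or internals.

```python
def count_cold_starts(arrivals, idle_timeout):
    """Count queries that would hit a cold engine.

    arrivals: iterable of (time_in_seconds, user) pairs.
    idle_timeout: seconds an idle engine survives before being torn down.
    Simplification: queries are instantaneous, so an engine's last-used
    time is just the time of that user's previous query.
    """
    last_used = {}
    cold = 0
    for t, user in sorted(arrivals):
        prev = last_used.get(user)
        # First query for this user, or the engine expired while idle.
        if prev is None or t - prev > idle_timeout:
            cold += 1
        last_used[user] = t
    return cold


# Hypothetical morning of dashboard activity (seconds since 9:00).
arrivals = [
    (0, "alice"),
    (600, "alice"),
    (3000, "alice"),
    (3100, "bob"),
    (3200, "alice"),
]

for timeout in (0, 1800, 3600):
    print(timeout, count_cold_starts(arrivals, timeout))
```

With these arrivals, a 30-minute timeout gives 3 cold starts out of 5 queries and a 60-minute timeout gives 2. Lengthening the timeout trades idle cluster cost for fewer slow first queries, and that trade is what you end up tuning for BI users.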
