// the find

apache/doris

★ 15,460 · Java · Apache-2.0 · updated Jun 2026

Apache Doris is an easy-to-use, high performance and unified analytics database.

Apache Doris is a distributed OLAP database built on an MPP architecture, designed for real-time analytics at scale. It sits in the same space as ClickHouse and StarRocks (StarRocks began as a fork of Doris), targeting data warehouse workloads where you need sub-second query response on billions of rows. Production deployments at Baidu, ByteDance, and Tencent suggest it handles serious scale.

The two-process architecture (FE + BE only) is genuinely good: no ZooKeeper and no Kafka dependency just to run the thing. The vectorized query engine with SIMD acceleration and the Pipeline execution engine, which avoids thread explosion, are real engineering, not marketing. MySQL protocol compatibility means near-zero friction with existing tooling: connect with any MySQL client or any BI tool that speaks MySQL. Federated query support across Hive, Iceberg, Hudi, and Delta Lake without data movement is actually useful for teams stuck managing a lake and a warehouse separately.
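Because the FE speaks the MySQL wire protocol, a stock JDBC driver should be enough to query Doris. The sketch below is a minimal Java example under stated assumptions that do not come from the card: MySQL Connector/J is on the classpath, the FE is reachable on its default query port 9030, and the fresh-cluster `root` user still has an empty password. The class name `DorisJdbcSketch` and the `jdbcUrl` helper are hypothetical, for illustration only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch of querying Doris through its MySQL-protocol front end with plain JDBC.
// Assumptions (not from the source): MySQL Connector/J is on the classpath, the FE
// listens on its default query port 9030, and "root" still has an empty password.
final class DorisJdbcSketch {

    // Default FE query port in a stock fe.conf (query_port); adjust for your cluster.
    static final int DEFAULT_QUERY_PORT = 9030;

    // Builds an ordinary MySQL JDBC URL; an empty database name yields a URL with no schema.
    static String jdbcUrl(String feHost, int queryPort, String database) {
        return "jdbc:mysql://" + feHost + ":" + queryPort + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical FE host; pass a real one as the first argument.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        String url = jdbcUrl(host, DEFAULT_QUERY_PORT, "");

        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            // Print one database name per line.
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Missing driver, unreachable FE, or bad credentials all surface here.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

Run it with Connector/J on the classpath; without the driver, `DriverManager` typically reports "No suitable driver", which the sketch prints rather than crashing. Nothing in the URL is Doris-specific, which is the point: the same connection string should work in any MySQL-aware tool.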

The codebase is split across two languages: FE is Java and BE is C++, which means two different build systems, two different debugging environments, and a non-trivial operational surface. Building from source in practice requires Docker and takes hours; this is not a weekend experiment. The coupled storage-compute architecture that makes it 'easy to maintain' also means that scaling storage and compute independently requires switching to the separate 'cloud' (compute-storage decoupled) mode, which is a meaningfully different codebase. Because the project originated at a Chinese company, a lot of docs and issues appear in Chinese first and English second, so you will hit walls if you go off the happy path.
