// the find

prestodb/presto

★ 16,711 · Java · Apache-2.0 · updated Jun 2026

The official home of the Presto distributed SQL query engine for big data

Presto is a distributed SQL query engine, originally built at Meta (then Facebook), designed to run analytical queries across heterogeneous data sources such as Hive, HDFS, S3, Kafka, and Accumulo without moving the data first. It's a serious piece of infrastructure used at petabyte scale, not a weekend project. The target audience is data platform teams who need federated SQL across a data lake.

The connector architecture is genuinely well-designed: each data source gets its own plugin implementing a clean SPI, so adding a new source doesn't touch the query engine core. The C++ native worker (presto-native-execution, built on Velox) is a real investment: offloading execution to Velox gets you SIMD-accelerated vectorized processing that is hard to match on the JVM. The CI pipeline is thorough, with separate workflow files per connector (Hive, Kudu, SingleStore, Arrow Flight) and product tests against real environments, not just unit mocks. The project is actively developed (last updated Jun 2026), has 5,500+ forks, and has a clear governance structure under the Linux Foundation.
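To make the plugin idea concrete, here is a minimal sketch of how a connector SPI can keep the engine core closed to modification. Every type below (`Connector`, `ConnectorFactory`, `Plugin`, `CatalogManager`, `InMemoryPlugin`) is a simplified stand-in declared for this example; the names loosely echo the shape of a plugin SPI but are assumptions, not Presto's actual interfaces or signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for a plugin SPI. Not Presto's real interfaces.
interface Connector {
    List<String> listTables();
    List<Long> scan(String table);
}

interface ConnectorFactory {
    String getName();
    Connector create(Map<String, String> config);
}

interface Plugin {
    List<ConnectorFactory> getConnectorFactories();
}

// Stand-in for the engine core: it depends only on the interfaces above.
final class CatalogManager {
    private final Map<String, ConnectorFactory> factories = new HashMap<>();
    private final Map<String, Connector> catalogs = new HashMap<>();

    void installPlugin(Plugin plugin) {
        for (ConnectorFactory factory : plugin.getConnectorFactories()) {
            factories.put(factory.getName(), factory);
        }
    }

    void createCatalog(String catalogName, String connectorName, Map<String, String> config) {
        ConnectorFactory factory = factories.get(connectorName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown connector: " + connectorName);
        }
        catalogs.put(catalogName, factory.create(config));
    }

    List<Long> scan(String catalogName, String table) {
        Connector connector = catalogs.get(catalogName);
        if (connector == null) {
            throw new IllegalArgumentException("Unknown catalog: " + catalogName);
        }
        return connector.scan(table);
    }
}

// A toy data source. Adding it requires no change to CatalogManager.
final class InMemoryPlugin implements Plugin {
    @Override
    public List<ConnectorFactory> getConnectorFactories() {
        return List.of(new ConnectorFactory() {
            @Override
            public String getName() {
                return "memory";
            }

            @Override
            public Connector create(Map<String, String> config) {
                int rows = Integer.parseInt(config.getOrDefault("rows", "0"));
                return new InMemoryConnector(rows);
            }
        });
    }
}

final class InMemoryConnector implements Connector {
    private final int rows;

    InMemoryConnector(int rows) {
        this.rows = rows;
    }

    @Override
    public List<String> listTables() {
        return List.of("numbers");
    }

    @Override
    public List<Long> scan(String table) {
        if (!"numbers".equals(table)) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        List<Long> out = new ArrayList<>();
        for (long i = 0; i < rows; i++) {
            out.add(i);
        }
        return out;
    }
}

final class ConnectorSketch {
    public static void main(String[] args) {
        CatalogManager manager = new CatalogManager();
        manager.installPlugin(new InMemoryPlugin());
        manager.createCatalog("scratch", "memory", Map.of("rows", "3"));
        System.out.println(manager.scan("scratch", "numbers")); // prints [0, 1, 2]
    }
}
```

The point of the pattern is that `CatalogManager` never names a concrete source: a new source ships as another `Plugin` implementation and is wired in by name at catalog creation time.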

The build is a nightmare to get running locally: it needs a Hive metastore just to do development, the full Maven build is slow, and the JDK 17 setup requires a pile of --add-opens flags, a sign that the codebase is still fighting the module system rather than embracing it. The Java/C++ split (coordinator in Java, optional native workers in C++) means you're running two completely different runtimes and debugging across that boundary when things go wrong. The coordinator is a single point of failure: there is no coordinator HA out of the box, so a coordinator restart kills all running queries. Documentation quality is uneven: connector docs vary wildly, and operational guidance (tuning memory configs, resource groups, OOM behavior) is scattered and often outdated.
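For readers who haven't hit the --add-opens problem before, the sketch below shows the general JDK mechanism behind it: since JDK 16, deep reflection into JDK-internal members is refused unless the package is explicitly opened. The class `AddOpensProbe` and its `secret` field are made up for this illustration, and the example says nothing about which specific flags Presto itself needs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

// Hypothetical probe class, written for this illustration only.
final class AddOpensProbe {
    // Control case: a private field in our own class is always reachable.
    private int secret = 42;

    /** Returns true if setAccessible(true) succeeds on the named declared field. */
    static boolean canDeepReflect(Class<?> type, String fieldName) {
        Field field;
        try {
            field = type.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
        try {
            field.setAccessible(true);
            return true;
        } catch (InaccessibleObjectException e) {
            // The module system refused: the owning package is not open to this code.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("own class:        " + canDeepReflect(AddOpensProbe.class, "secret"));
        System.out.println("java.lang.String: " + canDeepReflect(String.class, "value"));
    }
}
```

On a stock JDK 17 the second probe should report false; launching the JVM with `--add-opens java.base/java.lang=ALL-UNNAMED` should flip it to true. A codebase that relies on this kind of reflection across many packages ends up needing one such flag per package, which is how the list grows.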
