// the find
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Trino is a distributed SQL query engine that runs queries across heterogeneous data sources (S3, Hive, Iceberg, Delta Lake, RDBMS) without moving data. It's the engine you reach for when you need ad-hoc analytics at petabyte scale and can't afford to ETL everything into one place first. It is the former PrestoSQL under a new name, with a large, active contributor base and production deployments at major tech companies.
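The federation idea is that the engine core talks to every source through one connector interface and reads data at query time, instead of requiring it to be copied into a warehouse first. The sketch below is a toy model of that shape only. `ToyConnector`, `ToyEngine`, the `catalog.table` naming and the in-memory hash join are hypothetical illustrations, not Trino's SPI, which is far richer (metadata, splits, pushdown, transactions).

```java
import java.util.*;

public class FederationSketch {
    // Hypothetical, heavily simplified stand-in for a connector. NOT Trino's SPI.
    interface ToyConnector {
        List<Map<String, Object>> scan(String table);
    }

    // Engine core: knows catalogs only by name, never by implementation class.
    static final class ToyEngine {
        private final Map<String, ToyConnector> catalogs = new HashMap<>();

        void register(String catalog, ToyConnector connector) {
            catalogs.put(catalog, connector);
        }

        // Resolve "catalog.table" to the registered connector and scan it at query time.
        List<Map<String, Object>> scan(String qualifiedTable) {
            int dot = qualifiedTable.indexOf('.');
            if (dot < 0) throw new IllegalArgumentException("expected catalog.table: " + qualifiedTable);
            ToyConnector connector = catalogs.get(qualifiedTable.substring(0, dot));
            if (connector == null) throw new IllegalArgumentException("unknown catalog: " + qualifiedTable);
            return connector.scan(qualifiedTable.substring(dot + 1));
        }

        // Naive in-memory hash join across two catalogs on a shared key column.
        List<Map<String, Object>> join(String left, String right, String key) {
            Map<Object, List<Map<String, Object>>> index = new HashMap<>();
            for (Map<String, Object> row : scan(right)) {
                index.computeIfAbsent(row.get(key), k -> new ArrayList<>()).add(row);
            }
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> l : scan(left)) {
                for (Map<String, Object> r : index.getOrDefault(l.get(key), List.of())) {
                    Map<String, Object> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        // Two hypothetical sources, standing in for e.g. an RDBMS and a lake table.
        engine.register("pg", table -> List.of(
                Map.of("user_id", 1, "name", "ada"),
                Map.of("user_id", 2, "name", "linus")));
        engine.register("lake", table -> List.of(
                Map.of("user_id", 1, "clicks", 42),
                Map.of("user_id", 1, "clicks", 7)));
        List<Map<String, Object>> rows = engine.join("lake.events", "pg.users", "user_id");
        System.out.println(rows.size()); // 2: both lake rows match user 1
    }
}
```

Adding a third source in this model means registering one more `ToyConnector`; nothing in `ToyEngine` changes, which is the property the connector design is praised for below.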
The connector architecture is genuinely well-designed: adding a new data source means implementing a clean SPI, not forking the core. The spooling protocol (see client/trino-client/spooling/) is a smart addition: large result sets are written to object storage as segments that clients can fetch directly, rather than every row being funneled through the coordinator. Active release cadence (roughly weekly), reproducible builds since v449, and a CI pipeline that runs connector-specific test matrices rather than one monolithic suite. The CLAUDE.md in the repo root is an unusual touch: they've embedded AI-assisted contribution rules directly in the tree.
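The point of spooling is that the query response carries pointers to result segments instead of the rows themselves. The following is a toy model of that idea under stated assumptions: `ObjectStore`, `spool`, `fetchAll`, the fixed segment size and the `s3://spool/...` URIs are all hypothetical, an in-memory map stands in for S3, and the real Trino protocol (encodings, retrieval modes, acknowledgement rules) is not reproduced here.

```java
import java.util.*;

public class SpoolingSketch {
    // Hypothetical stand-in for object storage: segment URI -> rows.
    static final class ObjectStore {
        private final Map<String, List<String>> objects = new LinkedHashMap<>();
        private int nextId = 0;

        String put(List<String> rows) {
            String uri = "s3://spool/segment-" + (nextId++);
            objects.put(uri, List.copyOf(rows)); // copy so the segment is independent of the source list
            return uri;
        }

        List<String> get(String uri) {
            List<String> rows = objects.get(uri);
            if (rows == null) throw new NoSuchElementException(uri);
            return rows;
        }

        void delete(String uri) { objects.remove(uri); }

        int size() { return objects.size(); }
    }

    // Producer side: write results in fixed-size segments and return only their URIs.
    static List<String> spool(List<String> results, int segmentSize, ObjectStore store) {
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < results.size(); i += segmentSize) {
            uris.add(store.put(results.subList(i, Math.min(i + segmentSize, results.size()))));
        }
        return uris;
    }

    // Client side: fetch each segment straight from storage, then delete it (the "ack" in this sketch).
    static List<String> fetchAll(List<String> uris, ObjectStore store) {
        List<String> rows = new ArrayList<>();
        for (String uri : uris) {
            rows.addAll(store.get(uri));
            store.delete(uri);
        }
        return rows;
    }

    public static void main(String[] args) {
        ObjectStore store = new ObjectStore();
        List<String> results = new ArrayList<>();
        for (int i = 0; i < 10; i++) results.add("row-" + i);

        // The "coordinator response" in this model is just the URI list, not the rows.
        List<String> uris = spool(results, 4, store);
        System.out.println(uris.size() + " segments"); // 10 rows at 4 per segment -> "3 segments"
        List<String> fetched = fetchAll(uris, store);
        System.out.println(fetched.equals(results) + ", store empty: " + (store.size() == 0)); // "true, store empty: true"
    }
}
```

The coordinator's payload in this model grows with the number of segments rather than the number of rows, which is why the approach helps with very large result sets.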
Operationally heavy: a realistic deployment needs a coordinator, several workers, and, for Hive, Iceberg, or Delta Lake tables, a metastore or catalog service before it does useful work. Not a weekend project to self-host. The Java heap tuning story is still painful; OOM errors from spill misconfiguration are a common production failure mode. Building requires Docker and JDK 25 with incubator vector modules, which breaks on Apple Silicon without Rosetta; that's a friction point for new contributors that the README acknowledges but doesn't fully resolve. Documentation for writing custom connectors lags behind the SPI itself: the SPI changes, the docs don't.