// the find
andygrove/datafusion-archive
DataFusion has now been donated to the Apache Arrow project
This is the original pre-Apache DataFusion repo — a Rust query engine built on Apache Arrow that was donated to the Arrow project in February 2019. It's a historical artifact, not an active project. If you're looking for the actual DataFusion, it lives under apache/arrow-datafusion (now apache/datafusion).
- The split between logical planning and execution is clean for a project this early; src/logicalplan.rs and src/sqlplanner.rs are good reference material for understanding how a SQL query engine is bootstrapped from scratch
- Using Arrow as the in-memory columnar format was the right call in 2018, and the code shows what a minimal viable Arrow integration looked like before all the ergonomic wrappers existed
- The test suite uses expected CSV output files for query results — low-tech but readable and easy to debug when something breaks
- Dead project: the last push was in February 2019, and the README itself tells you to go elsewhere. There is nothing to adopt here.
- Single-threaded only, with no distributed execution at all; the README's 'modern distributed compute platform' framing never materialized before the donation
- Requires Rust nightly because of parquet-rs, which was painful even in 2019; the pinned dependency chain is unlikely to build on a current toolchain without significant work
- No join support, no subqueries, no window functions — the SQL surface area is too thin to be useful for anything beyond toy queries against CSV files
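To make the logical/physical split concrete, here is a minimal sketch of the general pattern in Rust. It is not DataFusion's actual code or API: a LogicalPlan enum describes what to compute, an ExecutionPlan trait's operators describe how, and a planner function is the only place that bridges the two. Every name here (LogicalPlan, ExecutionPlan, ScanExec, FilterExec, ProjectionExec, create_physical_plan, example_query) is hypothetical, and the Batch struct is a plain Vec<i64>-per-column stand-in for Arrow's columnar arrays, with no Arrow crate involved.

```rust
use std::collections::HashMap;

// Hypothetical columnar batch: one Vec<i64> per column, a stand-in for Arrow arrays.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

// Logical plan: describes what to compute, with no execution details.
#[derive(Debug)]
enum LogicalPlan {
    Scan { table: String },
    Filter { input: Box<LogicalPlan>, column: String, greater_than: i64 },
    Projection { input: Box<LogicalPlan>, columns: Vec<String> },
}

// Physical plan: each operator knows how to produce a batch.
trait ExecutionPlan {
    fn execute(&self) -> Batch;
}

struct ScanExec { batch: Batch }
struct FilterExec { input: Box<dyn ExecutionPlan>, column: String, greater_than: i64 }
struct ProjectionExec { input: Box<dyn ExecutionPlan>, columns: Vec<String> }

fn column_index(batch: &Batch, name: &str) -> usize {
    batch.names.iter().position(|n| n == name).expect("unknown column")
}

impl ExecutionPlan for ScanExec {
    fn execute(&self) -> Batch {
        self.batch.clone()
    }
}

impl ExecutionPlan for FilterExec {
    fn execute(&self) -> Batch {
        let input = self.input.execute();
        let idx = column_index(&input, &self.column);
        // Evaluate the predicate once into a selection mask...
        let keep: Vec<bool> = input.columns[idx].iter().map(|v| *v > self.greater_than).collect();
        // ...then apply the same mask to every column so rows stay aligned.
        let columns: Vec<Vec<i64>> = input
            .columns
            .iter()
            .map(|col| col.iter().zip(&keep).filter(|(_, k)| **k).map(|(v, _)| *v).collect::<Vec<i64>>())
            .collect();
        Batch { names: input.names, columns }
    }
}

impl ExecutionPlan for ProjectionExec {
    fn execute(&self) -> Batch {
        let input = self.input.execute();
        let columns: Vec<Vec<i64>> = self
            .columns
            .iter()
            .map(|name| input.columns[column_index(&input, name)].clone())
            .collect();
        Batch { names: self.columns.clone(), columns }
    }
}

// The planner walks the logical tree and builds the matching physical operators.
fn create_physical_plan(plan: &LogicalPlan, catalog: &HashMap<String, Batch>) -> Box<dyn ExecutionPlan> {
    match plan {
        LogicalPlan::Scan { table } => Box::new(ScanExec { batch: catalog[table].clone() }),
        LogicalPlan::Filter { input, column, greater_than } => Box::new(FilterExec {
            input: create_physical_plan(input, catalog),
            column: column.clone(),
            greater_than: *greater_than,
        }),
        LogicalPlan::Projection { input, columns } => Box::new(ProjectionExec {
            input: create_physical_plan(input, catalog),
            columns: columns.clone(),
        }),
    }
}

fn example_query() -> Batch {
    let mut catalog = HashMap::new();
    catalog.insert(
        "t".to_string(),
        Batch { names: vec!["id".into(), "score".into()], columns: vec![vec![1, 2, 3], vec![10, 50, 90]] },
    );
    // Roughly: SELECT id FROM t WHERE score > 20
    let logical = LogicalPlan::Projection {
        input: Box::new(LogicalPlan::Filter {
            input: Box::new(LogicalPlan::Scan { table: "t".into() }),
            column: "score".into(),
            greater_than: 20,
        }),
        columns: vec!["id".into()],
    };
    create_physical_plan(&logical, &catalog).execute()
}

fn main() {
    // prints Batch { names: ["id"], columns: [[2, 3]] }
    println!("{:?}", example_query());
}
```

The payoff of keeping the layers apart is that a rewrite such as reordering a filter and a projection only touches LogicalPlan values, while a faster filter operator only touches the physical side.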