
// the find

andygrove/datafusion-archive

★ 628 · Rust · Apache-2.0 · updated Feb 2019

DataFusion has now been donated to the Apache Arrow project

This is the original pre-Apache DataFusion repo — a Rust query engine built on Apache Arrow that was donated to the Arrow project in February 2019. It's a historical artifact, not an active project. If you're looking for the actual DataFusion, it lives under apache/arrow-datafusion (now apache/datafusion).

- The logical plan, physical plan, and execution model are cleanly separated for a project this early; src/logicalplan.rs and src/sqlplanner.rs are good reference material for how a query engine is bootstrapped from scratch

- Arrow as the in-memory columnar format was the right call in 2018 and the code shows what minimal-viable integration with Arrow looks like before all the ergonomic wrappers existed

- The test suite uses expected CSV output files for query results — low-tech but readable and easy to debug when something breaks

- Dead project: the last push was in February 2019, and the README itself tells you to go elsewhere. There is nothing to adopt here.

- Single-threaded only, with no distributed execution; the 'modern distributed compute platform' framing in the README never materialized before donation

- Requires Rust nightly due to parquet-rs, which was painful even in 2019; the dependency chain would almost certainly no longer build today

- No join support, no subqueries, no window functions — the SQL surface area is too thin to be useful for anything beyond toy queries against CSV files
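For readers who have not seen the expected-output-file style of testing mentioned above, here is a minimal sketch of the idea in Rust. It is not code from the repo: `rows_to_csv` and `check_against_expected` are hypothetical helpers, and the CSV rendering assumes plain values with no quoting or escaping.

```rust
use std::fs;
use std::path::Path;

/// Render query result rows as CSV text.
/// Hypothetical helper: assumes values contain no commas, quotes, or newlines.
fn rows_to_csv(rows: &[Vec<String>]) -> String {
    let mut out = String::new();
    for row in rows {
        out.push_str(&row.join(","));
        out.push('\n');
    }
    out
}

/// Compare actual query output against an expected CSV file.
/// Returns a message naming the first differing line, which is what makes
/// this style of test easy to debug.
fn check_against_expected(actual: &str, expected_path: &Path) -> Result<(), String> {
    let expected = fs::read_to_string(expected_path)
        .map_err(|e| format!("cannot read {}: {}", expected_path.display(), e))?;

    // Report the first line that differs between the two outputs.
    for (i, (a, e)) in actual.lines().zip(expected.lines()).enumerate() {
        if a != e {
            return Err(format!("line {}: expected `{}`, got `{}`", i + 1, e, a));
        }
    }

    // zip stops at the shorter input, so check the row counts separately.
    let (actual_rows, expected_rows) = (actual.lines().count(), expected.lines().count());
    if actual_rows != expected_rows {
        return Err(format!(
            "row count differs: expected {}, got {}",
            expected_rows, actual_rows
        ));
    }
    Ok(())
}
```

A test would run a query, pass the rendered rows to `check_against_expected` along with the path of the checked-in expected file, and fail with the returned message. The trade-off is that expected files have to be regenerated by hand whenever output formatting changes.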
