// the find
fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Fugue is an abstraction layer that lets you write data transformation logic once in plain Python/Pandas and run it on Spark, Dask, or Ray by swapping an engine parameter. It also ships FugueSQL, an extended SQL dialect that can call Python functions inline. It is primarily useful for data engineers and ML practitioners who want to test locally on Pandas and deploy to Spark without maintaining two codebases.
The transform() function is genuinely well-designed: it absorbs the partitioning, schema-declaration, and iterator-wrapping boilerplate that makes raw PySpark UDFs painful to write. The engine_context pattern is clean: one code path, multiple runtimes, no conditionals scattered through your logic. FugueSQL's TRANSFORM...USING syntax is a good idea. Mixing declarative SQL with procedural Python functions, without an ORM, is useful for ETL pipelines. The test infrastructure (fugue_test, a set of backend-agnostic test suites) lets third-party backends prove compliance against the same test cases.
The requirement to declare an output schema (schema='*' or a variant such as '*,col:type') leaks distributed-computing concerns into functions that are supposed to be engine-agnostic. You're not really writing plain Pandas code if you have to add schema annotations for Fugue's benefit. Polars support is listed as 'DataFrames only'. That is a significant gap, given that Polars is increasingly the reason people move away from Pandas in the first place. 2,165 stars and 100 forks for a project this mature suggest that adoption has stalled; Ibis has eaten much of this problem space with better SQL backend support and stronger momentum. The repo activity is real, but the community is thin: Slack is the primary support channel, and the conference talks date from 2021-2022.