// the find
mrpowers-io/quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Quinn is a utility library for PySpark that fills gaps in the standard API — schema validation, column renaming, null-safe comparisons, and a handful of missing string/date functions. It targets data engineers who find themselves copy-pasting the same PySpark boilerplate across projects. At 686 stars it's a small but actively maintained niche library.
The schema validation functions (`validate_presence_of_columns`, `validate_schema`) are genuinely useful and something the core PySpark API makes you write yourself every time. `null_between` handles the three-valued null logic that trips up most PySpark developers and is easy to get wrong. `print_schema_as_code` is a real time-saver when you need to snapshot a schema from a DataFrame and paste it into code. The benchmark suite, with stored JSON results and visualizations, shows someone actually cared about the performance characteristics of `column_to_list`.
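To make the null-handling point concrete, here is a minimal plain-Python sketch of why a range check with a missing bound is easy to get wrong. It models SQL NULL as `None` and does not use Spark or quinn. The helper names (`sql_between`, `open_ended_between`) are hypothetical, and the "missing bound means unbounded" policy is an assumption about one reasonable null-aware behavior, not a statement of what `null_between` actually does.

```python
def sql_and(a, b):
    """Three-valued AND: False dominates, otherwise None (NULL) propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_ge(x, y):
    """x >= y under SQL rules: any NULL operand makes the result NULL."""
    return None if x is None or y is None else x >= y


def sql_le(x, y):
    """x <= y under SQL rules: any NULL operand makes the result NULL."""
    return None if x is None or y is None else x <= y


def sql_between(value, lower, upper):
    """Plain SQL-style BETWEEN: value >= lower AND value <= upper."""
    return sql_and(sql_ge(value, lower), sql_le(value, upper))


def open_ended_between(value, lower, upper):
    """Hypothetical null-aware policy: a missing bound is unbounded on that side."""
    if value is None or (lower is None and upper is None):
        return False
    if lower is None:
        return value <= upper
    if upper is None:
        return value >= lower
    return lower <= value <= upper


# With a missing lower bound, the plain check yields NULL, and a filter
# on NULL drops the row even though 5 is clearly under the upper bound.
print(sql_between(5, None, 10))         # None
print(open_ended_between(5, None, 10))  # True
```

The surprise is the first result: the row is neither matched nor rejected, it just disappears from a filter. Wrapping the policy in one named function is the kind of thing a utility library is good for.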
The function surface is shallow: most of these are one-liners that wrap `F.regexp_replace` or `F.trim`, and any senior PySpark developer will have already written them. The monkey-patching extension model (`quinn/extensions/`) is a code smell in a shared codebase; it mutates global Spark classes and will cause confusion when two libraries patch the same class. `show_output_to_df`, which parses log-format output strings into DataFrames, is an antipattern that belongs in a test helper, not a production library. No Spark Connect support is mentioned anywhere, which matters as Databricks pushes users toward serverless.