// the find
dathere/qsv
Blazing-fast Data-Wrangling toolkit
qsv is a Rust-based CLI toolkit for wrangling tabular data — CSV, Excel, Parquet, JSON, Arrow — with 70+ composable commands covering everything from basic slicing to statistical profiling, schema inference, SQL queries via Polars, and LLM-assisted metadata generation. It targets data engineers and open data practitioners who process large files on the command line and need real speed without spinning up Python or Spark. The CKAN integration and FAIR metadata tooling make it particularly relevant to the open government data world.
The index system is the architectural standout: one `qsv index` command on your CSV unlocks constant-time random access, enables multithreading in stats/frequency/split/schema, and makes slice/count/sample instantaneous. This is a real design win, not a bolt-on. The external sort and dedup commands (`extsort`, `extdedup`) handle arbitrarily large files via external merge sort and an on-disk hash table — you're not trapped by RAM. Embedding Luau (Roblox's typed Lua) as the scripting DSL is a better call than Python for this use case: it's sandboxable, predictable, and the BEGIN/MAIN/END section model maps cleanly to per-row pipelines with lookup tables. The Polars-backed commands (`sqlp`, `joinp`, `pivotp`) process larger-than-memory files with real LazyFrame query planning — this isn't just a wrapper, it's actually using Polars correctly.
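To see why an index makes random access cheap, here is a minimal Python sketch of the general technique: record the byte offset of every row once, then jump straight to any row with a single seek. This is not qsv's actual index format or code. The helpers `build_index`, `read_record`, and `demo` are hypothetical names, and the sketch assumes no quoted field contains a newline, which a real index has to handle.

```python
import csv
import os
import tempfile


def build_index(csv_path):
    """Return a list of byte offsets, one per data record (header excluded).

    Assumption: no quoted field contains a newline, so one line == one record.
    """
    offsets = []
    with open(csv_path, "rb") as f:
        f.readline()  # skip the header row
        while True:
            pos = f.tell()  # byte position where the next record starts
            line = f.readline()
            if not line:
                break
            if line.strip():  # ignore blank trailing lines
                offsets.append(pos)
    return offsets


def read_record(csv_path, offsets, i):
    """Fetch record i with one seek instead of scanning from the top."""
    with open(csv_path, "rb") as f:
        f.seek(offsets[i])
        line = f.readline().decode("utf-8")
    return next(csv.reader([line]))


def demo():
    """Write a 1000-row CSV, index it, and fetch row 742 directly."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as tmp:
        writer = csv.writer(tmp)
        writer.writerow(["id", "city"])
        for n in range(1000):
            writer.writerow([n, f"city-{n}"])
        path = tmp.name
    try:
        offsets = build_index(path)
        return len(offsets), read_record(path, offsets, 742)
    finally:
        os.remove(path)
```

With the offsets in hand, a row count is just `len(offsets)`, a slice is a seek plus a bounded read, and the offset list can be cut into chunks so that separate workers each start partway through the file. That is the general reason an index can speed up counting, slicing, sampling, and parallel scans.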
The binary variant system is a trap for newcomers: qsv, qsvlite, qsvdp, qsvmcp, plus feature flags. If you download the wrong prebuilt you silently lose commands marked ✨, and the README buries the variant matrix. At 70+ commands with LLM integration, geocoding, CKAN catalog support, an MCP server, a Claude Code plugin, and a GUI (qsv pro), this has grown well past the point where one README can orient a new user; the documentation sprawl is real and growing. MSRV is Rust 1.96, which is very recent and matters if you want to embed qsv as a library in a project pinned to an older toolchain. The star-to-fork ratio (3675:103) suggests that almost nobody outside the core is contributing, which combined with what looks like a single primary maintainer is a meaningful bus factor risk for anything you'd depend on in production.