// the find

vortex-data/vortex

★ 3,005 · Rust · Apache-2.0 · updated Jun 2026

An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.

Vortex is a columnar file format and in-memory array library written in Rust, positioned as a faster replacement for Parquet. It separates logical types from physical encodings, so you can plug in different compression schemes (RLE, dictionary, ALP for floats, FSST for strings) and still query across them without decoding first. The target audience is teams building data systems on object storage who are hitting Parquet's random-access or scan performance ceiling.
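To give a feel for what a float-specific scheme buys you, here is a minimal Python sketch of the idea ALP builds on: many real-world floats are really decimals, so they can be stored as small integers plus one shared exponent, and integers compress far better than raw doubles. This is not Vortex's implementation or API. The function names are hypothetical, and the sketch assumes finite values and skips the per-value exception handling a real encoder needs.

```python
from typing import List, Optional, Tuple


def encode_decimal_floats(
    values: List[float], max_exponent: int = 10
) -> Optional[Tuple[int, List[int]]]:
    """Find the smallest exponent e where every value survives a round trip
    through round(v * 10**e). Returns (e, ints), or None if no exponent up to
    max_exponent is lossless. Hypothetical helper, assumes finite inputs."""
    for e in range(max_exponent + 1):
        scale = 10 ** e
        ints = [round(v * scale) for v in values]
        # Lossless only if dividing back reproduces each float bit-for-bit.
        if all(i / scale == v for i, v in zip(ints, values)):
            return e, ints
    return None


def decode_decimal_floats(e: int, ints: List[int]) -> List[float]:
    """Invert encode_decimal_floats."""
    scale = 10 ** e
    return [i / scale for i in ints]


prices = [19.99, 5.5, 120.0, 0.25]
encoded = encode_decimal_floats(prices)
if encoded is not None:
    exponent, ints = encoded
    print(exponent, ints)
    print(decode_decimal_floats(exponent, ints) == prices)
```

The integers that come out are what a downstream integer encoding (bitpacking, for instance) would then shrink further. A column that does not fit the pattern, such as arbitrary binary fractions, returns `None` here and would need a different scheme.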

The logical/physical separation is the real idea here. You can nest encodings (e.g., run-length over dictionary over bitpacked), and compute kernels operate directly on the encoded representation, so you're not paying a decode-then-compute cost on every predicate. The float compression (ALP/G-ALP) and string compression (FSST) are research-backed and a real step up from what Parquet offers. Zero-copy Arrow interop means you don't need to rewrite your existing Arrow-based pipeline to adopt it. The benchmarking infrastructure is unusually honest: they ship a bench orchestrator that runs DuckDB and DataFusion against Parquet side-by-side, so you can reproduce the claims yourself.
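The "compute on the encoded representation" point is easier to see with a toy example. The Python sketch below nests run-length encoding over dictionary encoding and answers an equality predicate by touching only the dictionary and the runs, never the decoded rows. It illustrates the general technique only. None of these function names come from Vortex, and the bitpacking layer is left out for brevity.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with an integer code into a list of distinct values."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def rle_encode(codes: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs


def count_equal(dictionary: List[str], runs: List[Tuple[int, int]], needle: str) -> int:
    """Count rows equal to needle without materializing the decoded column."""
    if needle not in dictionary:
        return 0  # value absent from the dictionary, so no row can match
    target = dictionary.index(needle)
    # One comparison per run instead of one per row.
    return sum(length for code, length in runs if code == target)


column = ["us", "us", "us", "de", "de", "us", "fr", "fr", "fr", "fr"]
dictionary, codes = dict_encode(column)
runs = rle_encode(codes)
print(runs)
print(count_equal(dictionary, runs, "us"))
```

The predicate does one string lookup in the dictionary and then one integer comparison per run, so its cost scales with the number of runs rather than the number of rows. That is the saving the decode-then-compute path gives up.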

The file format only just hit 'stable' at v0.36.0, so files written by earlier versions are a migration risk. The library APIs are still explicitly versioned as unstable, which means adopting this in a production pipeline today involves accepting churn. Iceberg integration is listed as 'coming soon.' Iceberg is the main on-ramp for most serious data lake users, and without it you're writing custom ingestion. The Spark integration exists but requires a JVM bridge to a Rust native library, which is operationally painful in managed environments (Databricks, EMR) where you can't control the executor classpath. Documentation covers the concepts well, but the getting-started path for someone coming from pure Python (not Rust) is rough: the Python bindings are real, but the examples assume you already understand the encoding model.
