// the find
pola-rs/polars
Extremely fast Query Engine for DataFrames, written in Rust
Polars is a DataFrame query engine written in Rust with first-class Python, R, Node.js, and SQL frontends. It's aimed at data engineers and analysts who've hit pandas' performance ceiling or need to process datasets larger than RAM. It's a genuine pandas alternative, not just a wrapper.
- Performance is real and measurable: SIMD, multi-threaded execution, and a lazy query optimizer that rewrites plans before execution. The PDS-H benchmarks back up the claims, and the 70ms import time vs pandas' 520ms is a day-to-day quality-of-life win.
- The expression API is well-designed and composable: operations like `col('a').str.split(',').explode().over('group')` chain naturally, without the index-alignment footguns that plague pandas.
- Streaming execution for out-of-core data is built into the engine, not bolted on. In many cases, passing `engine='streaming'` to `collect()` on a lazy query is enough to process datasets that don't fit in RAM.
- The codebase is well-structured with clean separation between the Arrow layer (polars-arrow), the plan/optimizer (polars-plan), and execution (polars-ops). Custom Rust UDFs via pyo3-polars are a first-class path, not a hack.
- The Rust API is considerably less ergonomic and less documented than the Python API. If you're using Polars from Rust directly, expect to spend time reading source code for things that are obvious in Python.
- Streaming mode has meaningful feature gaps: not all operations are supported, and hitting an unsupported op silently falls back to in-memory execution. That can catch you off guard partway through a 250GB dataset, when memory use suddenly climbs.
- Breaking changes between minor versions have been frequent. Projects that pinned to 0.19 and then updated to 0.20+ had to rewrite significant amounts of pipeline code because of API renames and behavioral changes.
- The ecosystem around Polars (connectors, integrations, ML pipeline compatibility) is still catching up to pandas. Libraries that accept DataFrames often expect pandas specifically, which forces `.to_pandas()` conversions that can erase much of the performance gain.