finds.dev

// the find

chdb-io/chdb

★ 2,693 · Python · Apache-2.0 · updated Jun 2026

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

chDB embeds the ClickHouse OLAP engine directly into a Python process — no server, no daemon, just `pip install chdb` and you get columnar SQL on Parquet/CSV/Arrow files with ClickHouse's full query engine. It also ships a DataStore layer that translates pandas-style method chains into SQL, positioning it as a drop-in replacement for pandas on large datasets. Target audience is data scientists and analysts who hit pandas memory walls but don't want to spin up infrastructure.

Because chDB runs ClickHouse's actual query engine in-process, the benchmark numbers are credible: this is not a toy SQL layer but the full columnar engine, with proper aggregations, JOINs, and 60+ file formats. The memoryview approach to Python↔C++ data transfer is genuinely clever and avoids the copy overhead that kills most Python/native interop. The DataStore's lazy evaluation, which records operations and compiles them to SQL only at materialization, is the right architecture for this problem rather than a bolted-on afterthought. The `Python(df)` table function, which lets you mix arbitrary pandas DataFrames with SQL mid-query, is useful for the hybrid workflows data scientists actually have.
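To make the lazy-evaluation idea concrete, here is a minimal sketch of the general pattern: method calls only record intent, and SQL is produced when the result is requested. The `LazyFrame` class, its method names, and the `events.parquet` path are hypothetical and are not chDB's DataStore API; the only real-world detail borrowed is ClickHouse's `file()` table function in the generated SQL.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Toy lazily evaluated table (hypothetical; not chDB's DataStore)."""

    source: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    row_limit: Optional[int] = None

    def select(self, *cols: str) -> "LazyFrame":
        # Record the projection; nothing is executed here.
        return replace(self, columns=cols)

    def filter(self, predicate: str) -> "LazyFrame":
        # Accumulate predicates; they are AND-ed together at compile time.
        return replace(self, predicates=self.predicates + (predicate,))

    def head(self, n: int) -> "LazyFrame":
        # Record the row limit.
        return replace(self, row_limit=n)

    def to_sql(self) -> str:
        # "Materialization" step: compile the recorded operations to one query.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.source}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


query = (
    LazyFrame("file('events.parquet')")  # hypothetical file path
    .filter("status = 'ok'")
    .filter("latency_ms > 100")
    .select("user_id", "latency_ms")
    .head(10)
)
print(query.to_sql())
```

Because the whole chain collapses into a single statement, the engine can apply the filters and the limit before any rows reach Python, which is the advantage the review credits to this design.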

The DataStore 'drop-in pandas replacement' claim is optimistic: 209 DataFrame methods sounds comprehensive until you hit the missing ones in a real notebook, and the fallback-to-pandas behavior means you can't trust that your code stays fast without auditing every operation. Windows is unsupported (macOS and Linux only), which will surprise anyone on a Windows data team. The UDF story is genuinely weak: UDFs are stateless only, all arguments arrive as strings, and there are no UDAFs. ClickHouse's native UDF system is more capable, and if you relied on pandas' `apply`, this is a real regression. The DataStore test suite has 100+ `test_exploratory_batchN` files that read like auto-generated coverage padding rather than specifications, which makes it hard to tell guaranteed behavior from accidental behavior.
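The "all arguments arrive as strings" limitation is easy to underestimate, so here is a small sketch of what it implies for the function body. These are plain Python functions written in the shape such a UDF would take; chDB's own registration mechanism is deliberately left out, and the function names and sample rows are hypothetical.

```python
def naive_add(lhs, rhs):
    """Looks like addition, but with string inputs it concatenates."""
    return lhs + rhs


def weighted_score(clicks: str, weight: str) -> float:
    """Stateless scalar function that parses its string arguments explicitly."""
    # Each call sees only its own arguments: no state survives between rows.
    return int(clicks) * float(weight)


# Simulated rows as a string-only UDF interface would deliver them.
rows = [("3", "2.5"), ("10", "0.5")]

print(naive_add("3", "2"))  # string concatenation, not arithmetic
print([weighted_score(c, w) for c, w in rows])
```

Every function ported from a pandas `apply` call needs this kind of explicit parsing, and anything that aggregates across rows has no equivalent here because UDAFs are not available.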
