// the find

paradedb/pg_analytics

★ 537 · Rust · PostgreSQL · updated Mar 2025

DuckDB-powered data lake analytics from Postgres

pg_analytics embeds DuckDB inside Postgres as an extension, letting you query Parquet, Delta Lake, and Iceberg data in S3, GCS, or Azure Blob Storage directly from SQL without moving it. It uses Postgres's foreign data wrapper (FDW) and executor hook APIs to push queries down to DuckDB. The repo is archived: ParadeDB has folded this work into its main pg_search extension.

The executor hook approach is architecturally clever: rather than a naive FDW that pulls data into Postgres row by row, it intercepts the query at the executor stage and hands the whole thing to DuckDB, which is why it stays fast over millions of rows. Format coverage is broad: Parquet, Delta, Iceberg, CSV, JSON, GeoJSON, and XLSX, all from the same extension. Auto schema detection from Parquet metadata means you don't have to define column types by hand. The test suite uses testcontainers with LocalStack for real S3 semantics rather than mocking, which is the right call.
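To make the hook idea concrete, here is a minimal sketch of the routing decision in plain Rust. It is a toy model, not the pgrx or Postgres API: `RelationKind`, `Route`, `Query`, and `route` are all hypothetical names, and the rule that a query is pushed down only when every relation is a foreign lake table is an assumption for illustration, not something confirmed from the pg_analytics source.

```rust
// Toy model of executor-hook routing. Every name here is hypothetical;
// none of this is the real pgrx or Postgres hook API.

#[derive(Debug, PartialEq)]
enum RelationKind {
    Heap,        // ordinary Postgres table
    ForeignLake, // foreign table backed by Parquet/Delta/Iceberg files
}

#[derive(Debug, PartialEq)]
enum Route {
    DuckDb,   // whole query handed to the embedded engine
    Postgres, // fall through to the standard executor
}

struct Query<'a> {
    sql: &'a str,
    relations: Vec<RelationKind>,
}

/// Decide where a query runs.
/// Assumption: push down only if the query touches at least one relation
/// and every relation it touches is a foreign lake table.
fn route(query: &Query) -> Route {
    let all_foreign = !query.relations.is_empty()
        && query
            .relations
            .iter()
            .all(|r| *r == RelationKind::ForeignLake);
    if all_foreign {
        Route::DuckDb
    } else {
        Route::Postgres
    }
}

fn main() {
    let analytics = Query {
        sql: "SELECT count(*) FROM trips",
        relations: vec![RelationKind::ForeignLake],
    };
    let oltp = Query {
        sql: "UPDATE accounts SET balance = balance - 10 WHERE id = 1",
        relations: vec![RelationKind::Heap],
    };
    let mixed = Query {
        sql: "SELECT * FROM trips JOIN accounts USING (id)",
        relations: vec![RelationKind::ForeignLake, RelationKind::Heap],
    };

    // Every query passes through route(), including the ones that
    // end up on the normal Postgres path.
    for q in [&analytics, &oltp, &mixed] {
        println!("{:?} <- {}", route(q), q.sql);
    }
}
```

Because the check sits in front of every query, even the plain `UPDATE` above goes through `route` before falling back to Postgres.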

The project is discontinued, and the README says so in the first paragraph, which makes recommending it for new work a non-starter. Write support was never shipped; it's read-only, so you can't use this to replace a lakehouse write path. The executor hook is a global Postgres hook, meaning it runs on every query and adds overhead even to OLTP traffic that never touches a foreign table, which is not something you'd want on a shared production instance. Windows is unsupported due to pgrx limitations, which rules out a nontrivial portion of dev environments.
