// the find
apache/iceberg-python
PyIceberg
PyIceberg is the official Python client for Apache Iceberg, the open table format for large analytic datasets. It lets you read and write Iceberg tables directly from Python without going through Spark or Trino — useful for data engineering pipelines that don't want a JVM in the critical path.
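A minimal sketch of what that read/write path looks like, assuming an Iceberg table already exists in a configured catalog. The helper name `append_rows`, the catalog name, and the table identifier are hypothetical; `load_catalog`, `load_table`, `append`, and `scan().to_arrow()` are the PyIceberg calls being illustrated.

```python
def append_rows(catalog_name: str, identifier: str, rows: list[dict], **catalog_properties):
    """Append rows to an existing Iceberg table, then read the table back as Arrow.

    Hypothetical helper for illustration; it is not part of PyIceberg.
    """
    # Imported inside the function so that defining the helper does not
    # require PyIceberg and PyArrow to be installed.
    import pyarrow as pa
    from pyiceberg.catalog import load_catalog

    # Catalog properties (for example type="rest", uri="...") can also come
    # from a .pyiceberg.yaml file instead of keyword arguments.
    catalog = load_catalog(catalog_name, **catalog_properties)
    table = catalog.load_table(identifier)

    # Arrow infers column types from the dicts; the inferred schema has to be
    # compatible with the Iceberg table's schema or the append is rejected.
    table.append(pa.Table.from_pylist(rows))

    # A scan with no arguments reads every row and column. No JVM involved.
    return table.scan().to_arrow()
```

A call might look like `append_rows("default", "analytics.events", [{"id": 1}], type="rest", uri="http://localhost:8181")`, where every argument value is a placeholder.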
Catalog support is broad: REST, Glue, Hive, DynamoDB, BigQuery Metastore, and SQL all ship as first-class options, not afterthoughts. The Avro reader has a Cython fast path (`decoder_fast.pyx`), so manifest scanning isn't purely Python-slow. The expression DSL and row-filter syntax are properly implemented with pushdown visitors, rather than filtering rows in memory after the read. Development is active, with 509 forks and a push two days ago, so this isn't abandoned Apache incubator shelf-ware.
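To make the expression DSL concrete, here is a hedged sketch of a filtered, column-pruned scan. The column names (`trip_distance`, `vendor_id`) and both helper names are made up for illustration; `And`, `GreaterThanOrEqual`, and `NotNull` come from `pyiceberg.expressions`, and `scan` accepts either an expression object or an equivalent filter string.

```python
def long_trips_filter(min_distance: float):
    """Build the predicate: trip_distance >= min_distance AND vendor_id IS NOT NULL."""
    # Imported inside the function so that defining the helper does not
    # require PyIceberg to be installed.
    from pyiceberg.expressions import And, GreaterThanOrEqual, NotNull

    return And(
        GreaterThanOrEqual("trip_distance", min_distance),
        NotNull("vendor_id"),
    )


def scan_long_trips(table, min_distance: float = 10.0):
    """Scan an already-loaded PyIceberg table with the predicate pushed into planning."""
    scan = table.scan(
        row_filter=long_trips_filter(min_distance),
        # Only these columns are read from the data files.
        selected_fields=("vendor_id", "trip_distance"),
    )
    return scan.to_arrow()
```

The same predicate can be passed as a string, for example `row_filter="trip_distance >= 10.0"`, which is handy for quick exploration; the expression objects are easier to compose programmatically.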
Write support is still catching up to the Java implementation; complex merge-on-read delete handling and full upsert semantics are incomplete enough that the repo ships an `upsert_util.py` as a separate utility rather than a first-class table operation. The dependency footprint is heavy: you're pulling in PyArrow, fsspec, and optionally Cython just to touch a catalog. There is no way to read into pandas without going through Arrow first. Integration test setup requires Docker Compose with Spark and Hive images, making local contribution setup a half-day project.
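The pandas point is easiest to see in code. This sketch assumes `table` is an already-loaded PyIceberg table; the helper name `scan_to_pandas` is hypothetical. The scan is materialized as an Arrow table first, and only then converted to a DataFrame.

```python
def scan_to_pandas(table, row_filter: str):
    """Read matching rows into a pandas DataFrame via an intermediate Arrow table."""
    # Step 1: the scan materializes as a pyarrow.Table.
    arrow_table = table.scan(row_filter=row_filter).to_arrow()
    # Step 2: Arrow performs the conversion to pandas.
    return arrow_table.to_pandas()
```

PyIceberg's own `scan().to_pandas()` shortcut saves a line, but it takes the same Arrow route, so PyArrow remains a hard requirement for any DataFrame workflow.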