// the find
opengeospatial/geoparquet
Specification for storing geospatial vector data (point, line, polygon) in Parquet
GeoParquet is an OGC specification for encoding geospatial vector data (points, lines, polygons) in Apache Parquet files. It's aimed at data engineers and GIS developers who want to move spatial data through columnar analytics pipelines — BigQuery, DuckDB, GeoPandas, Arrow — without converting to Shapefile or GeoJSON first. Version 1.1.0 is stable and community-approved; 2.0 is in development, aligning with native geometry types now baked into the Parquet format spec itself.
Parquet is already the lingua franca of analytical data; adding a standard geometry encoding on top means spatial data plugs directly into DuckDB, Spark, and cloud warehouses without a conversion step. The spec is narrow and well-scoped: it defines metadata layout and encoding options (WKB by default, native Parquet geometry types in 2.0) without trying to reinvent a full spatial format. Ecosystem traction is real: 20+ implementations across 6 languages and backing from GeoPandas, GDAL, Microsoft, and Planet mean this isn't vaporware. The alignment with GeoArrow for in-memory representation and Parquet for persistence is a genuinely clean separation of concerns.
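To make "metadata layout plus WKB" concrete, here is a minimal sketch using only the Python standard library. It assumes the 1.x layout as I understand it: a JSON document stored under a `geo` key in the Parquet file metadata, naming a primary geometry column and its encoding. The helper names `point_to_wkb` and `build_geo_metadata` are hypothetical, and the field names should be checked against the spec before relying on them. In practice a library such as GeoPandas or GDAL writes all of this for you.

```python
import json
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB.

    Layout: 1 byte for byte order (1 = little-endian), a uint32
    geometry type (1 = Point), then x and y as float64.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def build_geo_metadata(column: str = "geometry") -> str:
    """Build the JSON string assumed to live under the "geo" metadata key."""
    meta = {
        "version": "1.1.0",
        "primary_column": column,
        "columns": {
            column: {
                "encoding": "WKB",
                "geometry_types": ["Point"],
            }
        },
    }
    return json.dumps(meta)


wkb = point_to_wkb(-122.4, 37.8)
geo = json.loads(build_geo_metadata())
print(len(wkb), geo["columns"]["geometry"]["encoding"])  # prints: 21 WKB
```

The point is how little the spec adds: the geometry column is an ordinary binary column, and the only thing that makes the file "geo" is a small JSON blob that readers know to look for.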
The spec is still pending formal OGC approval despite 1.1.0 being 'community agreed', which matters if your organization needs a standards checkbox for procurement. The 2.0 transition to native Parquet geometry types will create a flag day for implementations: tools that handle 1.x WKB may not handle 2.0 geometry columns without updates, and the migration story isn't spelled out yet. Geospatial partitioning (spatial indexing within Parquet files) is listed as a goal but hasn't fully landed, so range-filtered spatial queries still read more data than you'd want. Write-heavy workloads are explicitly out of scope, which is fine but means you'll still need PostGIS or a row store alongside it for anything transactional.