// the find
torchgeo/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
TorchGeo is a PyTorch domain library for geospatial and remote sensing ML. It covers dataset loading, geographic samplers, transforms, and pre-trained models across a large catalog of real-world datasets. It fills a genuine gap: working with satellite imagery in PyTorch used to mean writing CRS reprojection and patch-sampling boilerplate by hand. It is aimed at researchers and practitioners doing earth observation, crop mapping, change detection, or any ML task where the input is georeferenced raster data.
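Much of the patch-sampling boilerplate comes down to drawing fixed-size windows, in map units, that fall inside a dataset's spatial extent. The plain-Python sketch below shows that logic. It is not TorchGeo's `RandomGeoSampler` implementation. The function name `random_patches`, the `(minx, maxx, miny, maxy)` tuple layout, and the assumption that the bounds and `size` share the same CRS units are all illustrative choices.

```python
import random
from typing import List, Tuple

# Hypothetical box layout: (minx, maxx, miny, maxy) in the dataset's CRS units.
Box = Tuple[float, float, float, float]


def random_patches(bounds: Box, size: float, length: int, seed: int = 0) -> List[Box]:
    """Draw `length` square patches of side `size` that lie fully inside `bounds`."""
    minx, maxx, miny, maxy = bounds
    if size > maxx - minx or size > maxy - miny:
        raise ValueError("patch size exceeds the dataset extent")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    patches = []
    for _ in range(length):
        # Choose the lower-left corner so the whole patch stays within bounds.
        x0 = rng.uniform(minx, maxx - size)
        y0 = rng.uniform(miny, maxy - size)
        patches.append((x0, x0 + size, y0, y0 + size))
    return patches


# Example: 4 patches of 2,560 m (256 px at 10 m resolution) inside a 100 km x 100 km extent.
extent = (500_000.0, 600_000.0, 4_000_000.0, 4_100_000.0)
for box in random_patches(extent, size=2_560.0, length=4):
    print(box)
```

A production sampler also has to handle gaps between individual files and time ranges, which this sketch ignores by treating the data as one continuous extent.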
The geospatial dataset composition model (union/intersection operators with automatic CRS alignment) is well designed. It handles the messy reality of multi-sensor data without requiring the user to preprocess everything to a common grid. Pre-trained weights for multispectral sensors (Sentinel-2, Landsat), rather than just ImageNet RGB, are a concrete practical advantage for transfer learning on satellite imagery. The dataset catalog is enormous: 100+ datasets, with download, checksum verification, and extraction handled consistently. Lightning integration with pre-defined train/val/test splits means you can get a reproducible baseline experiment running in under 20 lines of code.
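In TorchGeo, this composition is written with the `&` (intersection) and `|` (union) operators on datasets. Conceptually, an intersection only yields samples where both inputs have coverage, while a union can draw from either. The plain-Python sketch below shows the extent arithmetic behind that idea. It assumes both extents are already in the same CRS, which is the step TorchGeo automates. The `Extent` class and its methods are hypothetical and are not TorchGeo's `BoundingBox` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extent:
    """Hypothetical axis-aligned extent; assumes both operands share one CRS."""
    minx: float
    maxx: float
    miny: float
    maxy: float

    def __and__(self, other: "Extent") -> Optional["Extent"]:
        # Overlap region: the only area where an intersection of two datasets can sample.
        minx, maxx = max(self.minx, other.minx), min(self.maxx, other.maxx)
        miny, maxy = max(self.miny, other.miny), min(self.maxy, other.maxy)
        if minx >= maxx or miny >= maxy:
            return None  # no shared coverage, so no valid samples
        return Extent(minx, maxx, miny, maxy)

    def __or__(self, other: "Extent") -> "Extent":
        # Envelope of both inputs: a union can sample wherever either one has data.
        # Note that the envelope may also contain gaps that neither input covers.
        return Extent(min(self.minx, other.minx), max(self.maxx, other.maxx),
                      min(self.miny, other.miny), max(self.maxy, other.maxy))


imagery = Extent(0, 100, 0, 100)
labels = Extent(50, 150, 25, 75)
print(imagery & labels)  # Extent(minx=50, maxx=100, miny=25, maxy=75)
print(imagery | labels)  # Extent(minx=0, maxx=150, miny=0, maxy=100)
```

Returning `None` for disjoint inputs reflects the practical failure mode of multi-sensor composition. If two sources never overlap, an intersection has nothing to sample.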
The 'download everything automatically' pattern is fine for benchmarks but scales badly in practice. Satellite datasets are often hundreds of GB, and the library gives you little control over partial or streaming downloads. The geospatial sampler API assumes all your data is on local disk. There is no first-class support for cloud-native formats like Cloud-Optimized GeoTIFF (COG) or for STAC catalogs, which most production geospatial pipelines are built around. The pre-trained weight selection is still sparse relative to what is available across the broader landscape of geospatial foundation models: many newer models are wrapped, but weight coverage across them is uneven. Finally, the dependency footprint is heavy. GDAL, rasterio, and shapely make for a non-trivial environment setup that conda or pip doesn't always resolve cleanly, especially on Windows.
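The COG point is about partial reads. A Cloud-Optimized GeoTIFF stores pixels in fixed-size internal tiles, so a reader that wants one window only needs to fetch the tiles that window overlaps, for example via HTTP range requests. The sketch below computes that tile set in plain Python. The 512-pixel tile size and the function name `tiles_for_window` are illustrative assumptions. Real readers such as GDAL also consult the file's tile offset table and overviews, which this sketch ignores.

```python
from typing import List, Tuple


def tiles_for_window(col_off: int, row_off: int, width: int, height: int,
                     tile_size: int = 512) -> List[Tuple[int, int]]:
    """Return (tile_row, tile_col) indices of the internal tiles a pixel window overlaps."""
    if width <= 0 or height <= 0:
        return []
    # Integer division maps the first and last pixel of the window to tile indices.
    first_col, last_col = col_off // tile_size, (col_off + width - 1) // tile_size
    first_row, last_row = row_off // tile_size, (row_off + height - 1) // tile_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A 256 x 256 patch starting at pixel (col=1000, row=500) in a large scene:
print(tiles_for_window(1000, 500, 256, 256))  # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

For that 256 x 256 patch, only four tiles need to be fetched, regardless of how large the full scene is. A local-disk-only pipeline cannot take advantage of this.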