// the find
drivendataorg/cloudpathlib
Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
cloudpathlib wraps S3, GCS, and Azure Blob Storage behind an interface modeled on the standard pathlib.Path, so code that already uses Path can move to cloud storage largely by changing one import and the path strings it builds. It's aimed at data scientists and ML engineers who want to read and write cloud files without learning three different SDKs. The local mock implementations make it genuinely useful for testing, not just production.
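Because the interface mirrors pathlib, a function written only against the methods both share should not care which kind of path it receives. The sketch below runs against a local `pathlib.Path` and uses only `.exists()`, `.read_text()`, and `.name`. The function name `summarize` and the sample file are illustrative, not part of the library.

```python
import tempfile
from pathlib import Path


def summarize(path):
    """Return (file name, count of non-empty lines), or None if missing.

    Only uses .exists(), .read_text(), and .name, which pathlib.Path and
    cloudpathlib's CloudPath both provide.
    """
    if not path.exists():
        return None
    lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
    return path.name, len(lines)


with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "data.txt"
    local.write_text("a\n\nb\nc\n")
    result = summarize(local)

print(result)  # ('data.txt', 3)
```

With cloudpathlib installed and credentials configured, the same function could be handed `CloudPath("s3://your-bucket/data.txt")` (bucket and key are placeholders). The difference is that `.exists()` then becomes a network request rather than a local stat.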
The local filesystem mocks (LocalS3Path and friends) are the best part: you can write tests that run entirely on disk, with no cloud credentials and no monkey-patching. The caching layer is transparent and configurable; you can point it at a persistent directory and avoid re-downloading files across runs. Extensibility is real: two classes (MyPath and MyClient) are all you need for a new backend, and the base class handles the rest. The HttpsPath implementation is a nice bonus that makes the same interface work for public HTTP URLs.
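The persistent-cache idea can be modeled in a few lines: map each remote URL to a file under a local directory and download only when that file is missing. This is a toy model under stated assumptions, not cloudpathlib's code. `REMOTE`, `fake_download`, and `cached_read` are hypothetical. The real library is configured through a client's `local_cache_dir` argument and has its own cache layout and invalidation rules.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical in-memory "bucket" standing in for a remote object store.
REMOTE = {"s3://example-bucket/model.bin": b"\x00" * 1024}
download_count = 0


def fake_download(url: str, dest: Path) -> None:
    """Hypothetical network fetch: copies the *whole* object to dest."""
    global download_count
    download_count += 1
    dest.write_bytes(REMOTE[url])


def cached_read(url: str, cache_dir: Path) -> bytes:
    """Read a remote object through a local cache directory."""
    # Hash the URL so every key maps to a flat, filesystem-safe name.
    local = cache_dir / hashlib.sha256(url.encode()).hexdigest()
    if not local.exists():
        fake_download(url, local)  # full download before the first read
    return local.read_bytes()


with tempfile.TemporaryDirectory() as cache:
    # Reusing the same directory plays the role of a persistent cache
    # shared across runs: only the first read hits the "network".
    first = cached_read("s3://example-bucket/model.bin", Path(cache))
    second = cached_read("s3://example-bucket/model.bin", Path(cache))

print(len(first), download_count)  # 1024 1
```

`fake_download` fetches the entire object before a single byte is readable. That whole-file behavior is what makes this model painful for very large files.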
621 stars is modest for a library solving a problem this common; fsspec has eaten most of this space and integrates natively with pandas, pyarrow, and dask, which cloudpathlib does not. The caching model downloads the entire file locally before you can read it, so it falls apart for large files where you want range reads or streaming. There's no async support: every operation is blocking, which is a real problem inside a web server or async pipeline. The library also inherits pathlib's assumption that paths are cheap to stat, which cloud storage punishes with a network round trip on every .exists() call.
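One common workaround for blocking I/O in async code is to push each call onto a worker thread with `asyncio.to_thread` (Python 3.9+). cloudpathlib does not provide this. In the sketch, `slow_exists` is a hypothetical stand-in that sleeps to imitate the latency of a cloud `.exists()` call, so five checks run concurrently instead of back to back.

```python
import asyncio
import time


def slow_exists(key: str) -> bool:
    """Hypothetical blocking, network-bound check (e.g. a cloud .exists()).

    The sleep simulates one round trip; the return value is arbitrary.
    """
    time.sleep(0.2)
    return key.endswith(".csv")


async def check_all(keys):
    # Each blocking call runs in the default thread pool, so the event
    # loop stays free while the calls wait; gather preserves input order.
    return await asyncio.gather(*(asyncio.to_thread(slow_exists, k) for k in keys))


keys = ["a.csv", "b.txt", "c.csv", "d.json", "e.csv"]
start = time.perf_counter()
results = asyncio.run(check_all(keys))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Threads keep the event loop responsive, but they do not remove the per-call round trip. Each `.exists()` still pays full network latency.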