// the find

aws/aws-sdk-pandas

★ 4,107 · Python · Apache-2.0 · updated Jun 2026

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

awswrangler (now branded 'AWS SDK for pandas', though the PyPI package and import name are still `awswrangler`) is a Python library that wraps boto3 and pandas so that reading and writing data across AWS services feels like local DataFrame operations. It's aimed primarily at data engineers and analysts who live in the AWS ecosystem and want to skip the boilerplate S3/Athena/Redshift plumbing. It's actively maintained by AWS Professional Services rather than run as a community project.
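A minimal sketch of what that looks like in practice, assuming awswrangler 3.x, configured AWS credentials, and hypothetical bucket, Glue database and table names. It reads a CSV object from S3 into a DataFrame and writes it back as a partitioned Parquet dataset that `wr.s3.to_parquet` also registers in the Glue Catalog. The argument-building helper is split out only so the call can be inspected without AWS access.

```python
def parquet_dataset_kwargs(path: str, database: str, table: str,
                           partition_cols: list[str]) -> dict:
    """Keyword arguments for wr.s3.to_parquet (all names passed in are hypothetical)."""
    return {
        "path": path,                      # S3 prefix for the dataset, e.g. "s3://my-bucket/sales/"
        "dataset": True,                   # write a (partitioned) dataset rather than a single file
        "mode": "append",                  # add new files; "overwrite" would replace what's there
        "database": database,              # Glue database to register the table in
        "table": table,                    # Glue table name
        "partition_cols": partition_cols,  # Hive-style key=value/ sub-prefixes
    }


def csv_to_catalogued_parquet(src: str, **to_parquet_kwargs) -> list[str]:
    """Read one CSV object from S3 and write it out as a Glue-registered Parquet dataset."""
    import awswrangler as wr  # deferred so this module imports without the SDK installed

    df = wr.s3.read_csv(path=src)  # S3 object -> pandas DataFrame
    result = wr.s3.to_parquet(df=df, **to_parquet_kwargs)
    return result["paths"]  # S3 paths of the Parquet files written
```

Usage would be `csv_to_catalogued_parquet("s3://my-bucket/raw/sales.csv", **parquet_dataset_kwargs("s3://my-bucket/sales/", "analytics", "sales", ["dt"]))`, after which the table can be queried from Athena like any other catalogued table.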

The Athena integration is genuinely well done: query-result caching, partition projection support, and schema evolution handling are things you'd otherwise spend weeks building yourself. The Redshift COPY/UNLOAD path stages data through S3 automatically, which is the correct way to bulk-load Redshift and easy to get wrong by hand. Database drivers ship as optional pip extras (e.g., `awswrangler[redshift]`), so Lambda deployment packages don't bloat with dependencies you don't use. The Ray/Modin scale-out path is a real escape hatch: largely the same API, distributed execution, and no wholesale rewrite of your pipelines.
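A sketch of the Athena and Redshift paths, assuming awswrangler 3.x, an existing Glue connection for Redshift, and an IAM role that Redshift can use to read the staging prefix; every name passed in is hypothetical. `athena_cache_settings` asks awswrangler to reuse the result of an identical query that succeeded within the last `max_cache_seconds` instead of running it again, and `wr.redshift.copy` writes the DataFrame to the S3 prefix as Parquet and then issues a Redshift `COPY` from it.

```python
def athena_query_kwargs(sql: str, database: str, cache_seconds: int = 900) -> dict:
    """Keyword arguments for wr.athena.read_sql_query with result caching enabled."""
    return {
        "sql": sql,
        "database": database,   # Glue database the query runs against
        "ctas_approach": True,  # the default: results are materialized via a temporary CTAS table
        # Reuse a matching query's results if it finished within cache_seconds.
        "athena_cache_settings": {"max_cache_seconds": cache_seconds},
    }


def run_cached_query(sql: str, database: str):
    import awswrangler as wr  # deferred: only needed when talking to AWS

    return wr.athena.read_sql_query(**athena_query_kwargs(sql, database))


def bulk_load_redshift(df, staging_prefix: str, schema: str, table: str,
                       iam_role: str, glue_connection: str) -> None:
    """Load a pandas DataFrame into Redshift through an S3 staging prefix."""
    import awswrangler as wr

    con = wr.redshift.connect(connection=glue_connection)  # Glue connection name
    try:
        # Stages df as Parquet under staging_prefix, then runs COPY into schema.table.
        wr.redshift.copy(
            df=df,
            path=staging_prefix,
            con=con,
            table=table,
            schema=schema,
            iam_role=iam_role,
            mode="append",
        )
    finally:
        con.close()
```

The point of routing the load through `wr.redshift.copy` rather than a hand-written `COPY` is that the staging write and the load statement stay inside the library instead of in your own code.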

The breadth is also the liability. Covering 15+ services in one package means the dependency tree stays large even before extras (the core install already pulls in boto3, pandas, numpy and pyarrow), and the surface area for breaking changes is huge. DynamoDB support is shallow: it'll scan a table into a DataFrame, but if you need complex query patterns or consistent reads at scale, you'll hit its limits fast. The distributed Ray path supports its own set of APIs that only partly overlaps with the single-node path, which can surprise you mid-migration. The Glue Catalog is also baked in fairly deeply as the metadata layer: if you're not using Glue, several features (partition management, schema evolution) become significantly less useful.
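When you do hit those DynamoDB limits, one common fallback is to query through boto3 directly and build the DataFrame yourself. A minimal sketch, assuming boto3 and pandas are installed and using hypothetical table and partition-key names: it pages through a strongly consistent `Query` on one string-typed partition key and converts DynamoDB's typed attribute values with boto3's `TypeDeserializer`. This is plain boto3, not an awswrangler API.

```python
def consistent_query_kwargs(table: str, pk_name: str, pk_value: str) -> dict:
    """Low-level DynamoDB Query arguments for one (string-typed) partition key."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name},  # avoids clashes with reserved words
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        "ConsistentRead": True,                         # strongly consistent reads
    }


def query_partition_to_df(table: str, pk_name: str, pk_value: str):
    """Read every item for one partition key into a pandas DataFrame."""
    import boto3
    import pandas as pd
    from boto3.dynamodb.types import TypeDeserializer

    deserializer = TypeDeserializer()
    paginator = boto3.client("dynamodb").get_paginator("query")
    rows = []
    # The paginator follows LastEvaluatedKey across the 1 MB result pages.
    for page in paginator.paginate(**consistent_query_kwargs(table, pk_name, pk_value)):
        for item in page["Items"]:
            # {"N": "42"} -> Decimal("42"), {"S": "x"} -> "x", etc.
            rows.append({k: deserializer.deserialize(v) for k, v in item.items()})
    return pd.DataFrame(rows)
```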
