
// the find

blockchain-etl/ethereum-etl

★ 3,134 · Python · MIT · updated Jan 2026

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

ethereum-etl is a Python CLI for extracting Ethereum blockchain data (blocks, transactions, logs, traces, token transfers, contracts) into CSVs or streaming to Kafka/Pub-Sub/Postgres. It's the de facto standard for analysts who want to query Ethereum data in BigQuery or build their own pipeline without writing Web3 code. Google uses it to power the public BigQuery Ethereum dataset.

1. Breadth of coverage is genuinely good: blocks, receipts, logs, internal traces (both Geth and Parity/OpenEthereum styles), ERC20/ERC721 transfers, and contract bytecode, all with consistent CSV schemas documented in one place.
2. The exporter abstraction (composite_item_exporter, multi_item_exporter) makes it easy to fan out to multiple sinks simultaneously: CSV + Pub/Sub in one run, no custom glue code.
3. Streaming mode with last-synced-block checkpointing means you can run it as a daemon and survive restarts without re-scanning.
4. Test fixtures use recorded Web3 RPC responses, so the test suite runs without a node, which is good for CI.
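The checkpointing idea in point 3 is simple enough to sketch. The snippet below is a minimal illustration, not ethereum-etl's actual code: `read_last_synced`, `write_last_synced`, `sync_once`, and the `export_range` callback are hypothetical names, and the checkpoint is assumed to be a plain text file holding one block number.

```python
import os
import tempfile
from typing import Callable


def read_last_synced(path: str, default: int) -> int:
    """Return the checkpointed block number, or `default` if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return int(f.read().strip())


def write_last_synced(path: str, block: int) -> None:
    """Write the checkpoint via a temp file + rename so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(str(block))
    os.replace(tmp, path)


def sync_once(
    path: str,
    chain_head: int,
    export_range: Callable[[int, int], None],  # hypothetical sink: exports blocks [start, end]
    start_block: int = 0,
    batch_size: int = 100,
) -> int:
    """Export everything between the checkpoint and `chain_head`, one batch at a time."""
    last = read_last_synced(path, start_block - 1)
    while last < chain_head:
        end = min(last + batch_size, chain_head)
        export_range(last + 1, end)
        # Checkpoint only after the batch succeeded, so a restart resumes here.
        write_last_synced(path, end)
        last = end
    return last
```

Because the checkpoint is written only after a batch finishes, a crash mid-batch re-exports that batch on restart. That makes delivery at-least-once, so downstream sinks need to tolerate duplicates.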

1. Still uses Travis CI (badge in README, .travis.yml in tree) alongside GitHub Actions workflows. The CI setup is split, and the Travis badge is probably broken since travis-ci.org was shut down.
2. The Rust rewrite is advertised as only ~1.4x faster, which points at the real bottleneck: you're rate-limited by your RPC provider, not by the Python parsing. For serious historical backfills you need a local archive node regardless of language.
3. No built-in handling for reorgs. If the chain reorganizes after you've exported a range, your CSVs silently contain stale data, and there's no re-export or invalidation mechanism.
4. Internal traces depend on which trace API your node exposes: `export_traces` needs a Parity/OpenEthereum-style node, while the Geth variant needs a Geth archive node with `debug_traceBlockByNumber`. This is barely documented and Parity has been dead since 2022, so new users will hit a wall figuring out which trace method their node supports.
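For the reorg gap in point 3, one workaround is to keep the block hashes you exported and compare them against fresh headers before trusting old CSVs. This is a rough sketch under stated assumptions, not something ethereum-etl ships: `find_stale_block` is a hypothetical helper, `exported` is assumed to map block number to the hash recorded at export time, and `fresh` is assumed to be a list of `(number, hash, parent_hash)` tuples pulled from the node in ascending order.

```python
from typing import Dict, List, Optional, Tuple


def find_stale_block(
    exported: Dict[int, str],
    fresh: List[Tuple[int, str, str]],
) -> Optional[int]:
    """Return the first block number found to be stale, or None if none is detected.

    A block is treated as stale when the hash recorded at export time differs
    from the node's current hash, or when a fresh block's parent_hash does not
    match the exported hash of the block below it.
    """
    for number, block_hash, parent_hash in fresh:
        # Direct overlap: the same height now has a different hash.
        if number in exported and exported[number] != block_hash:
            return number
        # Linkage check: the new block does not build on what we exported.
        previous = exported.get(number - 1)
        if previous is not None and previous != parent_hash:
            return number - 1
    return None
```

The returned block is only the first mismatch this check can see. The actual fork point may be lower, so a real pipeline would walk backward from there, re-fetching headers until the hashes agree, and re-export everything above that height.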
