
// the find

scrapfly/scrapfly-scrapers

★ 1,003 · Python · NOASSERTION · updated Jun 2026

Scalable Python web scraping scripts for +40 popular domains

A collection of 40+ Python web scrapers for popular sites (Amazon, LinkedIn, Zillow, TikTok, etc.), all built on top of ScrapFly's paid API. Each scraper is a self-contained module using async Python with parsel for HTML parsing and JMESPath for JSON extraction. The target audience is developers who want working scraper code they can study or adapt without starting from scratch.

Sample JSON output is committed alongside each scraper, so you can see exactly what data you'll get before writing a line of code. Every scraper uses async/await consistently, with no sync code mixed in. Each scraper lives in its own directory with its own pyproject.toml and test file, so you can pull one out without dragging along the whole repo. Maintenance is active: the most recent push came the day before this review, and many guides carry 2025 update tags.

Everything requires a ScrapFly API subscription. This is essentially a marketing repo for a paid service, and the 'educational' framing in the disclaimer is doing a lot of legal heavy lifting. The scrapers themselves contain no examples of rate limiting, retry logic, or error handling; all of that is outsourced to ScrapFly's SDK, so you learn nothing about handling those problems on your own. The scraper-per-site structure means zero code sharing for common patterns (pagination, cookie handling, JSON extraction), leading to 40+ near-identical boilerplate files. Several high-value targets like LinkedIn and Instagram have ToS that explicitly prohibit scraping, and the repo doesn't surface that risk prominently.
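
For a sense of what the SDK is hiding, here is a minimal sketch of the retry and throttling logic you would otherwise write yourself. It is not taken from the repo or from ScrapFly's SDK. The names `TransientError`, `fetch_with_retry`, and `flaky_fetch` are hypothetical, the fetch function is a stand-in for a real HTTP call, and a simple concurrency cap stands in for proper rate limiting.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/503)."""


async def fetch_with_retry(fetch, url, *, limiter, retries=3, base_delay=0.5):
    """Call `fetch(url)` under a concurrency cap, retrying transient failures."""
    for attempt in range(retries + 1):
        try:
            async with limiter:  # caps how many requests are in flight at once
                return await fetch(url)
        except TransientError:
            if attempt == retries:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with a little jitter; the semaphore is
            # released while we sleep, so other requests can proceed.
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            await asyncio.sleep(delay)


async def demo():
    calls = {"n": 0}

    async def flaky_fetch(url):
        # Stand-in for a real HTTP request: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("simulated 429")
        return f"<html>{url}</html>"

    limiter = asyncio.Semaphore(2)  # assumed cap of 2 concurrent requests
    body = await fetch_with_retry(
        flaky_fetch, "https://example.com", limiter=limiter, base_delay=0.01
    )
    return body, calls["n"]
```

Running `asyncio.run(demo())` exercises the retry path against the simulated failures. A real scraper would also need to decide which status codes count as transient and whether to honor a server's Retry-After header, which is the kind of judgment the repo never shows.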
