// the find

jordanpotti/CloudScraper

★ 536 · Python · MIT · updated Mar 2022

CloudScraper: Tool to enumerate targets in search of cloud resources. S3 Buckets, Azure Blobs, Digital Ocean Storage Space.

CloudScraper spiders a target website and regex-scrapes each page's source for cloud resource strings: S3 bucket URLs, Azure blob endpoints, DigitalOcean Spaces. It's a recon tool for bug bounty hunters and pentesters doing cloud exposure discovery. One Python file, no frills.

Regex over full page source is smarter than parsing only href attributes: it catches bucket refs buried in JS or inline styles, which a link-only crawler would never see. Multiprocessing with a configurable process count is a practical touch for wide spidering. A Dockerfile is included so you don't have to deal with Python environment nonsense. Simple enough to read and modify in 10 minutes.
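To make the regex-over-source point concrete, here is a minimal sketch of the idea. The patterns and the `find_cloud_refs` name are assumptions written for illustration, not CloudScraper's actual regexes, and they only cover the common hostname forms.

```python
import re

# Hypothetical patterns for illustration; NOT the regexes CloudScraper ships.
# Each one matches a provider hostname anywhere in raw page source.
CLOUD_PATTERNS = {
    "s3": re.compile(r"[\w.-]+\.s3[\w.-]*\.amazonaws\.com"),
    "azure_blob": re.compile(r"[a-z0-9]+\.blob\.core\.windows\.net"),
    "do_spaces": re.compile(r"[\w.-]+\.digitaloceanspaces\.com"),
}


def find_cloud_refs(page_source: str) -> dict[str, list[str]]:
    """Scan the whole page source, not just href attributes."""
    return {name: pat.findall(page_source) for name, pat in CLOUD_PATTERNS.items()}


# Neither reference below lives in an href, so a link-only parser misses both.
SAMPLE_PAGE = """
<script>var cdn = "https://assets-prod.s3.amazonaws.com/app.js";</script>
<div style="background:url(https://media01.blob.core.windows.net/img/bg.png)"></div>
<a href="/about">About</a>
"""

refs = find_cloud_refs(SAMPLE_PAGE)
print(refs)
```

The only real link on the sample page is `/about`; the S3 and Azure hosts are picked up from the script tag and the inline style.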

Last commit was in 2022; cloud provider URL patterns have evolved since, and GCP and Cloudflare R2 aren't covered at all, so the tool is increasingly incomplete for modern recon. There's no configurable keyword file, despite it being listed as a TODO since the beginning, so you have to edit the source directly. The regex-everything approach produces false positives, and there's no deduplication or output formatting beyond terminal colors. There's also no rate limiting or respect for robots.txt, which will get you blocked or flagged fast on anything behind a WAF.
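If you fork it, the dedup, robots.txt, and rate-limit gaps are small patches. A rough sketch using only the standard library follows; `dedupe`, `build_robots`, `Throttle`, and the `recon-bot` user agent are hypothetical names, none of them exist in CloudScraper, and lowercasing assumes the hits are hostnames.

```python
import time
from urllib.robotparser import RobotFileParser


def dedupe(hits: list[str]) -> list[str]:
    """Drop repeated hits, keeping first-seen order (hostnames are case-insensitive)."""
    return list(dict.fromkeys(h.lower() for h in hits))


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the caller has already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        if self._last is not None:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


robots = build_robots("User-agent: *\nDisallow: /admin/")
throttle = Throttle(min_interval=0.05)

for url in ["https://example.com/index.html", "https://example.com/admin/panel"]:
    if not robots.can_fetch("recon-bot", url):
        print("skipping (robots.txt):", url)
        continue
    throttle.wait()  # a real spider would fetch the page after this call
    print("would fetch:", url)

print(dedupe(["Assets.s3.amazonaws.com", "assets.s3.amazonaws.com"]))
```

Whether honoring robots.txt is right for a given engagement depends on its scope; the throttle alone already cuts down on the WAF noise.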

View on GitHub →