finds.dev

// the find

HariSekhon/DevOps-Python-tools

★ 822 · Python · MIT · updated Feb 2026

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

A large collection of standalone Python CLI scripts for DevOps and data engineering tasks — AWS IAM auditing, Hadoop/HBase/Hive/Impala operations, Spark format converters, Docker registry queries, and log anonymization. It's a personal toolbox by a Big Data contractor, not a library or framework. If you work with the Hadoop ecosystem specifically, several of these scripts would have saved you hours.

The anonymize.py tool is genuinely useful — it handles a surprising range of sensitive patterns (AWS keys, Kerberos principals, LDAP fields, Cisco configs) and the parallel variant can get a 30x speedup on large log files. The find_active_server.py pattern is clever: one multi-threaded script that detects the live node in any HA cluster, with pre-baked wrappers for each technology so you don't have to remember flags. The Spark format converters (Avro/Parquet/CSV/JSON in every combination) are straightforward and actually tested with real data files. CI coverage is unusually thorough — multiple Python versions, multiple Linux distros, Alpine, Mac, and per-tool Docker builds all run on every push.
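To make the anonymizer idea concrete, here is a minimal regex-based sketch. It is not anonymize.py's code. The rule list, the placeholder tokens and the anonymize_line() helper are hypothetical, and the patterns are deliberately simplified: AWS access key IDs, Kerberos principals, single-token LDAP attribute values and IPv4 addresses. What it does show is that rule order matters. If the broad IPv4 rule ran first, it would break up a principal like hdfs/10.0.0.5@EXAMPLE.COM, and the Kerberos rule would then no longer match, so the realm would leak.

```python
import re

# Hypothetical, simplified rules for illustration; NOT anonymize.py's patterns.
# Each rule is (compiled pattern, replacement). Specific rules run before broad ones.
RULES: list[tuple[re.Pattern[str], str]] = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "<aws_access_key>"),
    # Kerberos principals such as hdfs/nn1.example.com@EXAMPLE.COM (uppercase realm)
    (re.compile(r"\b[\w.-]+(?:/[\w.-]+)?@[A-Z0-9.-]+\.[A-Z]{2,}\b"), "<kerberos_principal>"),
    # LDAP attribute values up to the next comma or whitespace; keeps the attribute name
    (re.compile(r"\b(cn|uid|ou|dc)=[^,\s]+", re.IGNORECASE), r"\1=<ldap_value>"),
    # IPv4 addresses (broadest rule, so it runs last)
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip_address>"),
]


def anonymize_line(line: str) -> str:
    """Apply every rule in order and return the redacted line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    sample = "user=hdfs/nn1.example.com@EXAMPLE.COM key=AKIAABCDEFGHIJKLMNOP from 10.1.2.3"
    # prints: user=<kerberos_principal> key=<aws_access_key> from <ip_address>
    print(anonymize_line(sample))
```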
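The find_active_server.py pattern can be sketched in a few lines. The idea is to probe every candidate node concurrently and return whichever one first reports itself active. The code below is a hypothetical illustration, not the script's implementation or its flags. find_active() and the probe callable are made-up names. In practice, probe would be something like an HTTP request with a timeout to a technology-specific status endpoint.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence


def find_active(
    hosts: Sequence[str],
    probe: Callable[[str], bool],
    max_workers: int = 10,
) -> Optional[str]:
    """Return whichever host first reports itself active, or None if none do."""
    if not hosts:
        return None
    with ThreadPoolExecutor(max_workers=min(max_workers, len(hosts))) as pool:
        # Map each future back to the host it is probing.
        futures = {pool.submit(probe, host): host for host in hosts}
        for future in as_completed(futures):
            try:
                active = future.result()
            except Exception:
                # Connection errors, timeouts, etc. count as "not active".
                continue
            if active:
                # Cancel probes that have not started yet; probes already
                # running still finish before the with-block exits.
                for other in futures:
                    other.cancel()
                return futures[future]
    return None


if __name__ == "__main__":
    # Hypothetical HA trio: nn2 is active, nn3 is unreachable.
    def fake_probe(host: str) -> bool:
        if host == "nn3":
            raise ConnectionError("unreachable")
        return host == "nn2"

    print(find_active(["nn1", "nn2", "nn3"], fake_probe))  # prints nn2
```

Threads suit this job because each probe spends most of its time waiting on the network. Running the probes side by side means one dead or slow host doesn't hold up the whole check. The pool still waits for probes already in flight before returning, so each real probe needs its own timeout.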

The whole repo is essentially a flat directory of 100+ scripts with a shared Makefile: there's no installable package, no pyproject.toml and no entry points, so distribution is 'clone the repo and run make'. Most of the Hadoop tooling targets HDP 2.x and Ambari, which have been EOL since 2021; if you're on CDH/CDP or a modern managed service, half the scripts won't apply. The dependency footprint is massive: requirements.txt pulls in PySpark, HBase thrift bindings, Jython wrappers and cloud SDKs even if you only want the anonymizer. Last push was February 2026, but the Travis CI integration and several Jython scripts still reference patterns from 2017-era Hadoop.
