// the find
GreyDGL/PentestGPT
Automated Penetration Testing Agentic Framework Powered by Large Language Models
PentestGPT is an autonomous penetration testing agent that wraps the Claude Code CLI to work through CTF challenges and HackTheBox-style targets. The v1 agentic mode runs an iteration loop against a target IP and keeps session context across runs. The legacy interactive mode (from the USENIX 2024 paper) uses three cooperating LLM sessions (reasoning, generation, and parsing), with a human running the commands.
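To make the three-session idea concrete, here is a minimal sketch of how separate sessions can each keep their own history. Everything in it is hypothetical: the `Session` class, the `fake_llm` stand-in, and the parsing, then reasoning, then generation order in `step` are illustrative assumptions, not PentestGPT's actual code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: takes the session history, returns a reply.
LLM = Callable[[List[str]], str]


@dataclass
class Session:
    """One LLM conversation with its own private history."""
    role: str
    llm: LLM
    history: List[str] = field(default_factory=list)

    def ask(self, message: str) -> str:
        # Only this session's history is sent to the model.
        self.history.append(message)
        reply = self.llm(self.history)
        self.history.append(reply)
        return reply


def fake_llm(tag: str) -> LLM:
    # Stand-in for a real model: tags the latest message so the flow is visible.
    return lambda history: f"[{tag}] {history[-1]}"


def step(parsing: Session, reasoning: Session, generation: Session,
         raw_tool_output: str) -> str:
    """One round (assumed order): condense output, pick a task, draft a command."""
    summary = parsing.ask(raw_tool_output)   # raw output stays in the parsing session
    next_task = reasoning.ask(summary)       # reasoning only ever sees the summary
    return generation.ask(next_task)         # generation only ever sees the task
```

The point of the split is visible in the histories: the reasoning session never holds raw tool output, only the condensed summary, so long scanner dumps cannot crowd out its plan.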
The Pentesting Task Tree from the academic paper is a genuinely interesting architecture: splitting reasoning, generation, and parsing across separate LLM sessions reduces context pollution and lets each session stay focused. Multi-provider support in legacy mode is cleanly done via a registry pattern in a single file, so adding a provider does not require touching call sites. The benchmarks are specific and credible: 86.5% on XBOW, with per-difficulty breakdowns and actual cost and time figures rather than vague claims. Session persistence and the context-file approach in the agentic loop are practical, since you don't lose progress when you hit token limits.
The agentic mode supports only the Claude Code CLI, which makes it a wrapper around a CLI tool rather than a proper programmatic agent: you're at the mercy of Claude Code's own context management, and you can't easily instrument or extend the execution graph. The split between 'agentic' and 'legacy' modes is confusing and suggests a codebase that is mid-refactor rather than one with a coherent architecture; the README itself describes legacy as 'the real paper implementation.' Telemetry is enabled by default and has to be opted out of. The project claims no sensitive data is collected, but when you run this against actual targets, command patterns alone reveal a lot about what you're doing and where. The benchmarks are self-reported against XBOW, with no independent replication noted.