// the find
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
Giskard is a Python testing framework for LLM agents and pipelines, currently mid-rewrite from v2 to v3. The v3 architecture splits functionality into focused packages (giskard-checks, giskard-scan, giskard-rag) to cut dependency weight. It is aimed at teams that need repeatable, automated evals rather than vibe-checking their agent outputs.
The modular package split in v3 is the right call: you can pull in just giskard-checks without dragging in red-teaming dependencies you don't need. The Scenario API handles non-deterministic outputs sensibly, treating evals as probability assertions rather than hard string matches. The vulnerability scanner covers OWASP LLM Top-10 categories with built-in adversarial generators (crescendo, GOAT, prompt-injection datasets), which would take weeks to build yourself. Provider abstraction in giskard-llm supports Anthropic, OpenAI, Google, and Azure without forcing LiteLLM as a mandatory dependency.
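The "probability assertion" idea can be illustrated without any Giskard code. The sketch below is plain Python and does not use the Scenario API; `pass_rate`, `assert_pass_rate`, and `flaky_agent_check` are hypothetical names, and the agent's roughly 90% success rate is simulated with a seeded random generator rather than a real model call.

```python
import random


def pass_rate(check, trials, seed=0):
    """Run a possibly flaky check `trials` times; return the fraction that passed."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rate
    passed = sum(1 for _ in range(trials) if check(rng))
    return passed / trials


def flaky_agent_check(rng):
    # Hypothetical stand-in for "call the agent, then judge its output".
    # It simply passes about 90% of the time.
    return rng.random() < 0.9


def assert_pass_rate(check, trials=200, min_rate=0.8, seed=0):
    """Fail only if the observed pass rate drops below `min_rate`."""
    rate = pass_rate(check, trials, seed)
    assert rate >= min_rate, f"pass rate {rate:.2f} below {min_rate:.2f}"
    return rate
```

A single failing run does not fail the eval here; only a pass rate below the threshold does, which is the difference from a hard string match.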
v3 is still in beta and materially incomplete: giskard-scan and giskard-rag are listed as 'in progress' and 'planned' respectively, so the most compelling features (RAG eval, full vuln scanner) don't yet exist in the new architecture. v2 is abandoned with no migration path, so anyone who built on the old scan/RAGET APIs is stranded. The async-first design is correct for production, but the README buries the asyncio.run() requirement, which will frustrate anyone copy-pasting the quickstart into a Jupyter notebook. Telemetry is opt-out, not opt-in, which will be a blocker in any enterprise or regulated-data environment regardless of what it actually collects.
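The notebook friction comes from standard asyncio behaviour, not anything Giskard-specific: asyncio.run() raises RuntimeError when an event loop is already running, and Jupyter cells run inside one. A minimal sketch, where `run_eval` is a hypothetical stand-in for an async evaluation entry point and `in_running_loop` and `run_from_script` are illustrative helpers:

```python
import asyncio


async def run_eval():
    # Hypothetical stand-in for an async evaluation entry point.
    await asyncio.sleep(0)
    return "ok"


def in_running_loop():
    """Return True when called from inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_from_script():
    # Plain script: no loop is running yet, so asyncio.run() is the entry point.
    return asyncio.run(run_eval())
```

In a Jupyter cell, where a loop is already running, the equivalent call is `result = await run_eval()` at the top level of the cell instead of `asyncio.run(run_eval())`.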