finds.dev

// the find

Giskard-AI/giskard-oss

★ 5,429 · Python · Apache-2.0 · updated Jun 2026

🐢 Open-Source Evaluation & Testing library for LLM Agents

Giskard is a Python testing framework for LLM agents and pipelines, currently mid-rewrite from v2 to v3. The v3 architecture splits functionality into focused packages (giskard-checks, giskard-scan, giskard-rag) to cut dependency weight. It's aimed at teams that need repeatable, automated evals rather than vibe-checking their agent outputs.

The modular package split in v3 is the right call: you can pull in just giskard-checks without dragging in red-teaming dependencies you don't need. The Scenario API handles non-deterministic outputs sensibly, treating evals as probability assertions rather than hard string matches. The vulnerability scanner covers OWASP LLM Top-10 categories with built-in adversarial generators (crescendo, GOAT, prompt injection datasets), which would take weeks to build yourself. Provider abstraction in giskard-llm supports Anthropic, OpenAI, Google, and Azure without forcing LiteLLM as a mandatory dependency.
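To make "probability assertion" concrete, here is a library-agnostic sketch of the idea: sample a non-deterministic agent several times and assert that the pass rate clears a threshold, instead of demanding one exact string. This does not reproduce Giskard's Scenario API; `pass_rate`, `assert_pass_rate`, `fake_agent`, and `mentions_paris` are hypothetical names chosen for illustration.

```python
import random
from typing import Callable


def pass_rate(run_once: Callable[[], str], check: Callable[[str], bool], n: int = 20) -> float:
    """Call a (possibly non-deterministic) agent n times; return the fraction of outputs passing `check`."""
    passes = sum(1 for _ in range(n) if check(run_once()))
    return passes / n


def assert_pass_rate(run_once: Callable[[], str], check: Callable[[str], bool],
                     threshold: float = 0.8, n: int = 20) -> float:
    """Fail only if the observed pass rate drops below `threshold`, not on any single bad sample."""
    rate = pass_rate(run_once, check, n)
    assert rate >= threshold, f"pass rate {rate:.2f} below threshold {threshold:.2f}"
    return rate


# Hypothetical stand-in for an LLM agent that answers correctly about 90% of the time.
# A seeded RNG keeps this demo reproducible; a real agent would not be.
rng = random.Random(0)


def fake_agent() -> str:
    return "Paris" if rng.random() < 0.9 else "I am not sure"


def mentions_paris(output: str) -> bool:
    # A loose semantic check rather than an exact string match.
    return "paris" in output.lower()


rate = assert_pass_rate(fake_agent, mentions_paris, threshold=0.7, n=50)
print(f"pass rate: {rate:.2f}")
```

The threshold and sample count trade off cost against flakiness: more samples give a tighter estimate of the true pass rate but mean more LLM calls per test.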

v3 is still beta and materially incomplete: giskard-scan and giskard-rag are listed as 'in progress' and 'planned', meaning the most compelling features (RAG eval, full vuln scanner) don't actually exist yet in the new architecture. v2 is abandoned with no migration path, so anyone who built on the old scan/RAGET APIs is stranded. The async-first design is correct for production, but the README buries the asyncio.run() requirement, which will frustrate anyone copy-pasting the quickstart into a Jupyter notebook. Telemetry is opt-out, not opt-in, which will hit a wall in any enterprise or regulated-data environment regardless of what it actually collects.
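The asyncio.run() friction is generic Python behavior, not something specific to Giskard. The sketch below uses a hypothetical async eval function (`evaluate_agent` is an illustrative stand-in, not Giskard's API) to show why a quickstart that works as a script breaks in a notebook: asyncio.run() starts a new event loop, and Jupyter already has one running.

```python
import asyncio


async def evaluate_agent(prompt: str) -> str:
    """Hypothetical async eval step; stands in for any async-first test API."""
    await asyncio.sleep(0)  # placeholder for an awaited LLM/provider call
    return f"evaluated: {prompt}"


async def main() -> list[str]:
    # Run several evals concurrently: the payoff of an async-first design.
    return await asyncio.gather(*(evaluate_agent(p) for p in ["hello", "refund policy?"]))


def in_running_loop() -> bool:
    """True when called while an event loop is already running (as in Jupyter)."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    # In a plain script no loop exists yet, so asyncio.run() creates and closes one.
    print(asyncio.run(main()))

# In Jupyter/IPython a loop is already running, so asyncio.run(main()) raises
# RuntimeError ("asyncio.run() cannot be called from a running event loop").
# Use top-level `await main()` in a notebook cell instead.
```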

