// the find
Agenta-AI/agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Agenta is a self-hostable LLMOps platform covering the full prompt development lifecycle: a playground for comparing prompts across 50+ models, version-controlled prompt management, systematic evaluation with LLM-as-judge and human annotation, and OpenTelemetry-native tracing. It targets teams shipping production LLM apps who need more than a notebook but less than a full MLOps stack.
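The prompt-management model described here (immutable prompt versions, with environments such as staging and production each pointing at one specific version) can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: `PromptRegistry`, `commit`, `deploy`, `promote`, and `fetch` are hypothetical names, not Agenta's SDK or REST API, and a real system would persist versions in a database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PromptVersion:
    version: int
    template: str
    model: str


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store: append-only versions plus per-environment pointers."""
    versions: List[PromptVersion] = field(default_factory=list)
    environments: Dict[str, int] = field(default_factory=dict)

    def commit(self, template: str, model: str) -> PromptVersion:
        # Each edit appends a new immutable version; nothing is overwritten.
        pv = PromptVersion(len(self.versions) + 1, template, model)
        self.versions.append(pv)
        return pv

    def deploy(self, env: str, version: int) -> None:
        # Deploying only repoints the environment at an existing version.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.environments[env] = version

    def promote(self, src: str, dst: str) -> None:
        # Promotion copies src's pinned version (e.g. staging) into dst (e.g. production).
        self.deploy(dst, self.environments[src])

    def fetch(self, env: str) -> PromptVersion:
        # What the running app would call at request time.
        return self.versions[self.environments[env] - 1]


reg = PromptRegistry()
v1 = reg.commit("Summarize: {text}", model="model-a")
v2 = reg.commit("Summarize in three bullet points: {text}", model="model-a")
reg.deploy("production", v1.version)
reg.deploy("staging", v2.version)      # try v2 in staging first
reg.promote("staging", "production")   # then promote it without editing the prompt
```

Rollback is just another `deploy` call pointing production back at version 1, which is the practical advantage of environment pointers over tracking "the current prompt" by hand.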
OpenTelemetry-native observability is the right call — it means traces flow into whatever backend you already use rather than a proprietary sink. Prompt versioning with environments (staging/production promotion) solves a real pain point that most teams hack around with git comments. The evaluation framework is genuinely flexible: you can run LLM-as-judge, code-based evaluators, or human annotation through the same pipeline, not just one mode bolted on. Self-hosting via a single docker compose command with Traefik included is actually usable, not the usual 'left as an exercise to the reader' setup.
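To make the "same pipeline, different evaluators" point concrete, here is a minimal sketch assuming every evaluator exposes a shared `score(case) -> float` interface. `Case`, `ExactMatch`, `LLMJudge`, `HumanAnnotation`, and `run_evaluation` are hypothetical names, not Agenta's evaluator API, and the judge is a stub callable standing in for a real model client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol


@dataclass
class Case:
    inputs: dict
    output: str                     # what the app under test produced
    expected: Optional[str] = None  # reference answer, if one exists


class Evaluator(Protocol):
    name: str

    def score(self, case: Case) -> float: ...


class ExactMatch:
    """Code-based evaluator: deterministic comparison with the reference answer."""
    name = "exact_match"

    def score(self, case: Case) -> float:
        return 1.0 if case.output.strip() == (case.expected or "").strip() else 0.0


class LLMJudge:
    """LLM-as-judge: asks a model (any str -> str callable) for a 0-10 grade, normalized to 0-1."""
    name = "llm_judge"

    def __init__(self, judge: Callable[[str], str]):
        self.judge = judge

    def score(self, case: Case) -> float:
        prompt = (f"Input: {case.inputs}\nOutput: {case.output}\n"
                  "Rate the output from 0 to 10. Reply with the number only.")
        return float(self.judge(prompt)) / 10.0


class HumanAnnotation:
    """Human annotation: looks up a reviewer-supplied score (0-1) for each output."""
    name = "human"

    def __init__(self, labels: Dict[str, float]):
        self.labels = labels

    def score(self, case: Case) -> float:
        return self.labels[case.output]


def run_evaluation(cases: List[Case], evaluators: List[Evaluator]) -> Dict[str, float]:
    # One loop for every evaluator type: mean score per evaluator across all cases.
    return {ev.name: sum(ev.score(c) for c in cases) / len(cases) for ev in evaluators}


cases = [
    Case({"q": "2+2"}, "4", expected="4"),
    Case({"q": "capital of France"}, "Lyon", expected="Paris"),
]
results = run_evaluation(
    cases,
    [ExactMatch(), LLMJudge(lambda prompt: "8"), HumanAnnotation({"4": 1.0, "Lyon": 0.0})],
)
```

The aggregation loop never branches on evaluator type, which is what lets deterministic, model-graded, and human scores land in one results table.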
The repo spans a TypeScript frontend, a Python (FastAPI) backend, and a separate Python SDK: three surfaces to keep in sync. The migration history shows the data model has been reshaped repeatedly, which makes upgrades between self-hosted versions risky. The 'bring your own LLM app' model requires wrapping your app with their SDK, an adoption tax that breaks down if Agenta's abstractions don't fit your architecture. Anonymized telemetry is opt-out rather than opt-in, which is a problem for teams with data-governance requirements who might miss the env flag. There's no clear story for scaling the tracing/observability store beyond a single Postgres instance; at real production trace volumes that becomes a bottleneck fast.
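The cost of wrapping your app with an SDK is easier to judge with a picture of what instrumentation wrapping involves. The sketch below is a hypothetical, stdlib-only `traced` decorator (not Agenta's SDK) that records OpenTelemetry-style spans with parent/child links via `contextvars`. Every function you want visible in a trace has to be decorated, and undecorated code simply doesn't show up.

```python
import contextvars
import functools
import time
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    start: float
    end: float = 0.0


_current_span: contextvars.ContextVar = contextvars.ContextVar("current_span", default=None)
SPANS: List[Span] = []  # stand-in for an exporter; spans are appended as they finish


def traced(fn):
    """Hypothetical decorator: opens a span, makes it the current parent, closes it on exit."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        parent = _current_span.get()
        span = Span(fn.__name__, uuid.uuid4().hex[:16],
                    parent.span_id if parent else None, time.perf_counter())
        token = _current_span.set(span)
        try:
            return fn(*args, **kwargs)
        finally:
            span.end = time.perf_counter()
            _current_span.reset(token)  # restore the previous parent
            SPANS.append(span)
    return wrapper


@traced
def retrieve(query: str) -> List[str]:
    return ["doc-1", "doc-2"]


@traced
def generate(query: str, docs: List[str]) -> str:
    return f"answer to {query!r} using {len(docs)} docs"


@traced
def rag_app(query: str) -> str:
    # Both child calls run inside rag_app's span, so they become its children.
    return generate(query, retrieve(query))


answer = rag_app("what is agenta?")
```

A production setup would emit spans through the OpenTelemetry SDK and an exporter rather than appending to a module-level list, but the wrapping requirement is the same.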