// the find
AgentOps-AI/agentops
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and CamelAI
AgentOps is an observability SDK for Python AI agents — it wraps LLM provider clients and agent frameworks to record spans, track token costs, and replay sessions. It targets developers running multi-agent systems in production who want visibility into what their agents actually did and what it cost. The backend dashboard (ClickHouse + Supabase + FastAPI) is open source and self-hostable.
The move to OpenTelemetry as the instrumentation layer is the right call — spans, exporters, and processors map cleanly to OTel primitives, which means the data is portable and the mental model transfers. The decorator API (@session, @agent, @operation) handles async, generators, and exception recording without requiring you to restructure your code. Framework-level instrumentation for CrewAI, AG2, LangGraph, and OpenAI Agents SDK means you get call graphs for multi-agent handoffs without manually wrapping every LLM call. Self-hosting is genuinely possible — the app/ directory is a real FastAPI + ClickHouse backend, not a stub.
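To make the decorator point concrete, here is a minimal sketch of how one decorator can cover sync functions, coroutines, and generators while recording exceptions. It is not AgentOps's implementation: `operation`, `_record`, and the `SPANS` list are hypothetical stand-ins, and a real SDK would hand spans to an OTel exporter rather than append them to a list.

```python
import asyncio
import functools
import inspect
import time

SPANS = []  # hypothetical stand-in for an exporter


def _record(name, start, error):
    """Store one finished span: name, duration, and the exception (if any)."""
    SPANS.append({
        "name": name,
        "duration_s": time.perf_counter() - start,
        "error": repr(error) if error else None,
    })


def operation(fn):
    """Hypothetical decorator: one span per call, whatever kind of callable fn is."""
    name = fn.__name__

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            start, error = time.perf_counter(), None
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                error = exc
                raise
            finally:
                _record(name, start, error)
        return async_wrapper

    if inspect.isgeneratorfunction(fn):
        @functools.wraps(fn)
        def gen_wrapper(*args, **kwargs):
            start, error = time.perf_counter(), None
            try:
                # The span stays open until the generator is exhausted or closed.
                yield from fn(*args, **kwargs)
            except Exception as exc:
                error = exc
                raise
            finally:
                _record(name, start, error)
        return gen_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        start, error = time.perf_counter(), None
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            error = exc
            raise
        finally:
            _record(name, start, error)
    return sync_wrapper
```

The decorated function keeps its signature and calling convention, which is why this style of API does not force a restructure: callers still `await` the coroutine or iterate the generator exactly as before.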
The versioning story is a mess: v1, v2, v3, and v4 API routes coexist in the backend, legacy/ code sits alongside the new OTel-based SDK, and the README mixes v1 and v2 doc links without explaining what changed. The app/.todo/ directory has 10 open TODO files covering authentication middleware, trace endpoints, log endpoints, and testing, so the self-hosted dashboard is not production-ready. The eval and benchmarking features that justify 'benchmarking' in the description are almost entirely marked 🔜 on the roadmap. Monkey-patching provider SDKs (anthropic, openai) is inherently fragile: any internal refactor in those SDKs can silently break instrumentation, and there's no clear contract about which SDK versions are tested.
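The monkey-patching failure mode is easy to reproduce in miniature. The sketch below uses two hypothetical classes, `FakeClientV1` and `FakeClientV2`, as stand-ins for a provider client before and after an internal refactor; `instrument` and `CALLS` are likewise illustrative names, not anything from AgentOps or a real provider SDK.

```python
import functools

CALLS = []  # what the instrumentation actually observed


class FakeClientV1:
    """Hypothetical provider client exposing `create`."""

    def create(self, prompt):
        return f"echo: {prompt}"


class FakeClientV2:
    """Same behaviour after a refactor: the entry point is now `generate`."""

    def generate(self, prompt):
        return f"echo: {prompt}"


def instrument(cls, method_name="create"):
    """Wrap cls.<method_name> so each call is logged; return whether a patch was applied."""
    original = getattr(cls, method_name, None)
    if original is None:
        # Nothing to wrap. Unless the caller checks the return value,
        # the client keeps working and no calls are ever recorded.
        return False

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        CALLS.append((cls.__name__, method_name))
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    return True
```

Patching `FakeClientV1` records every `create` call. Patching `FakeClientV2` returns `False`, `generate` keeps working, and `CALLS` stays empty, which is the silent breakage described above. A tested-versions matrix, or a loud failure when the patch target is missing, would turn this into a visible error.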