// the find
Arize-ai/phoenix
AI Observability & Evaluation
Phoenix is Arize AI's open-source LLM observability platform — traces, evals, prompt management, datasets, and experiments in one self-hostable package. It targets teams actively shipping LLM apps who need to see what's happening inside their pipelines and measure whether changes actually help. The breadth of framework integrations (LangChain, LlamaIndex, CrewAI, OpenAI Agents, Claude Agent SDK, etc.) means you can instrument most existing stacks with a few lines.
1. The OpenTelemetry foundation is the right call: traces are standard OTLP, so you're not locked into Phoenix's query layer forever.
2. The separation between arize-phoenix (full platform), arize-phoenix-evals (just the eval library), and arize-phoenix-otel (just the instrumentation) is genuinely useful; you can adopt the pieces incrementally, and the eval library can run without a Phoenix server.
3. It is self-hostable via Docker or Kubernetes, with a cloud option at app.phoenix.arize.com, so the same tooling works locally in a notebook and in production.
4. LLM-as-judge evals (RAG relevance, answer relevance) are built in and composable, so you're not writing everything from scratch.
1. It is licensed under Elastic License 2.0, not Apache or MIT. You cannot offer it to third parties as a hosted or managed service, which matters if you're building a platform on top of it.
2. The 'PXI Built-in Agent' (an LLM agent inside your observability tool) is opt-in but still a notable attack surface. An observability platform that itself calls an LLM introduces a dependency chain worth thinking through carefully.
3. The repo is enormous (Python platform + TypeScript frontend + multiple JS packages + Java instrumentation + agent skills). Contributing or debugging something obscure means navigating a lot of surface area before you find the relevant code.
4. Evaluation quality depends on the judge model you configure. The pre-built evals are templates, not ground truth, and it is easy to ship with a miscalibrated rubric without realizing it.
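One way to catch a miscalibrated judge is to hand-label a small sample and compare it against the judge's labels before trusting the eval. The sketch below is not part of Phoenix; it is plain Python using Cohen's kappa as the agreement measure, and the `cohens_kappa` helper and the label lists are hypothetical, made up for illustration.

```python
from collections import Counter


def cohens_kappa(judge_labels, human_labels):
    """Chance-corrected agreement between two equally long label lists."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(judge_labels)

    # Observed agreement: fraction of examples where judge and human match.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n

    # Expected agreement if each rater labeled independently at their own base rates.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(judge_counts[c] * human_counts[c] for c in judge_counts) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sample: LLM-judge labels vs. a human reviewer's labels.
judge = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "relevant"]
human = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]

raw = sum(j == h for j, h in zip(judge, human)) / len(judge)
kappa = cohens_kappa(judge, human)
print(f"raw agreement: {raw:.2f}, kappa: {kappa:.2f}")
```

On this toy sample the judge matches the human on 4 of 6 examples, but kappa comes out at 0.40 once chance agreement is subtracted. Which threshold counts as "good enough" is a judgment call for your team; the point is to measure it on your own data rather than assume the template rubric transfers.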