finds.dev

// the find

Arize-ai/phoenix

★ 10,110 · Python · Elastic License 2.0 · updated Jun 2026

AI Observability & Evaluation

Phoenix is Arize AI's open-source LLM observability platform: traces, evals, prompt management, datasets, and experiments in one self-hostable package. It targets teams actively shipping LLM apps who need to see what's happening inside their pipelines and measure whether changes actually help. The breadth of framework integrations (LangChain, LlamaIndex, CrewAI, OpenAI Agents, Claude Agent SDK, etc.) means you can instrument most existing stacks with a few lines of code.

1. The OpenTelemetry foundation is the right call: traces are standard OTLP, so you're not locked into Phoenix's query layer forever.
2. The separation between arize-phoenix (full platform), arize-phoenix-evals (just the eval library), and arize-phoenix-otel (just the instrumentation) is genuinely useful; you can adopt incrementally without running a server.
3. Self-hostable via Docker or Kubernetes, with a cloud option at app.phoenix.arize.com, so the same tooling works locally in a notebook and in production.
4. LLM-as-judge evals (RAG relevance, answer relevance) are built in and composable, so you're not writing everything from scratch.
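To make the "standard OTLP" point concrete, here is a minimal sketch of what a single span looks like in the OTLP/JSON export encoding, built with only the Python standard library. The service name, span name, scope name, and attribute key are illustrative placeholders, not anything Phoenix requires, and the helper `build_otlp_span_payload` is hypothetical.

```python
import json
import secrets
import time


def build_otlp_span_payload(service_name: str, span_name: str, attributes: dict) -> dict:
    """Build a minimal OTLP/JSON trace export body containing one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 5_000_000  # pretend the operation took 5 ms

    def to_kv(key, value):
        # OTLP wraps attribute values in a typed envelope; only strings are handled here.
        return {"key": key, "value": {"stringValue": str(value)}}

    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 random bytes -> 16 hex characters
        "name": span_name,
        "kind": 1,  # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [to_kv(k, v) for k, v in attributes.items()],
    }
    return {
        "resourceSpans": [
            {
                "resource": {"attributes": [to_kv("service.name", service_name)]},
                "scopeSpans": [{"scope": {"name": "manual-sketch"}, "spans": [span]}],
            }
        ]
    }


# Placeholder values for illustration only.
payload = build_otlp_span_payload("demo-app", "llm.call", {"example.model": "example-model"})
body = json.dumps(payload)
```

Because the body follows the OTLP schema rather than a Phoenix-specific one, the same `body` could in principle be sent to any OTLP/HTTP collector on its `/v1/traces` path. Which encodings (JSON or protobuf) a given Phoenix deployment accepts is worth checking before relying on this.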

1. Licensed under Elastic License 2.0, not Apache or MIT. The license bars offering the software to third parties as a hosted or managed service, which matters if you're building a platform on top of it.
2. The 'PXI Built-in Agent' (an LLM agent inside your observability tool) is opt-in but still a notable attack surface: an observability platform that itself calls an LLM introduces a dependency chain worth thinking through carefully.
3. The repo is enormous (Python platform + TypeScript frontend + multiple JS packages + Java instrumentation + agent skills), so contributing or debugging something obscure means navigating a lot of surface area before finding the relevant code.
4. Evaluation quality depends on the judge model you configure; the pre-built evals are templates, not ground truth, and it is easy to ship with a miscalibrated rubric without realizing it.
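One way to catch a miscalibrated rubric, sketched here under the assumption that you have a small set of human-labeled examples, is to measure chance-corrected agreement between the judge's labels and the human ones. The snippet computes Cohen's kappa in plain Python; it does not use any Phoenix API, and the label lists are made-up sample data.

```python
from collections import Counter


def cohens_kappa(human: list[str], judge: list[str]) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Fraction of items where the two raters gave the same label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    human_freq = Counter(human)
    judge_freq = Counter(judge)
    # Agreement expected if both raters labeled at random with their own label rates.
    expected = sum(human_freq[label] * judge_freq[label] for label in human_freq) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up sample data: six items labeled by a person and by the LLM judge.
human_labels = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]
judge_labels = ["relevant", "relevant", "relevant", "relevant", "irrelevant", "irrelevant"]
kappa = cohens_kappa(human_labels, judge_labels)
```

On this sample the raters agree on five of six items and kappa comes out at about 0.67. What counts as an acceptable value is a judgment call for each team; the point is to look at the number before trusting the judge's scores.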

