// the find
onyx-dot-app/onyx
Open Source AI Platform - AI Chat with advanced features that works with every LLM
Onyx is a self-hosted AI chat platform that layers RAG, web search, code execution, and agent orchestration on top of any LLM — think a self-hosted Perplexity with enterprise access controls and 50+ data source connectors. It targets teams that want the capabilities of a GPT-4o wrapper but can't or won't send internal data to OpenAI. The feature surface is wide: SSO, RBAC, SCIM, whitelabeling, audit logs — this is clearly aimed at replacing internal knowledge-base tools.
The hybrid search approach (vector + keyword) is the right call — pure semantic search misfires on acronyms, product names, and exact-match queries, and most RAG projects skip this. The two-tier deployment (Lite vs Standard) is a thoughtful answer to the 'I just want a chat UI' use case without fragmenting the codebase. The Alembic migration history is deep and well-maintained — 200+ migrations without obvious gaps means they actually run these in production rather than nuking and recreating. The CI pipeline is genuinely thorough: separate workflows for Playwright, Jest, Python unit/integration/connector/model tests, Helm chart testing, and nightly LLM provider checks — rare to see this level of test coverage on an open-source platform.
The stack is heavy: Postgres, Redis, MinIO, a model server, background workers, and Vespa (used as the primary vector/keyword index in Standard mode) — you're looking at 8+ containers before you write a line of config. For a team that just wants internal document Q&A, the operational burden is steep and the failure surface is large. The enterprise/community edition split on an MIT-licensed repo is a classic bait-and-switch: core features are open, but anything an actual enterprise cares about (SSO, SCIM, RBAC, analytics) is behind the EE paywall, so the license headline is misleading. The multi-tenant migration path (`run_multitenant_migrations.py`) is a separate script that's easy to forget or run out of order — a footgun for anyone operating multiple tenants. Deep Research topping their own benchmark (`onyx_deep_research_bench`) is not a credible signal of quality.