// the find

agentset-ai/agentset

★ 2,020 · TypeScript · MIT · updated Apr 2026

The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.

Agentset is a self-hostable RAG platform that handles the full pipeline: document ingestion, chunking, embedding, vector indexing, retrieval, and chat. It targets teams who want to ship a document Q&A product without stitching together separate services for each layer. The hosted version is aimed at non-engineers; the self-hosted path is for developers who want control over the stack.
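For readers new to the pipeline named above, here is a minimal in-memory sketch of the chunk, embed, index, and retrieve stages. It is not Agentset's code or API: every name (`chunkText`, `embed`, `ingest`, `retrieve`) is hypothetical, and the "embedding" is a toy word-hashing stand-in for a real model.

```typescript
// Hypothetical sketch of the RAG stages; none of these names come from Agentset.
type Chunk = { id: number; text: string; vector: number[] };

const DIMS = 64;

// Chunking: split text into fixed-size word windows (real chunkers are format-aware).
function chunkText(text: string, wordsPerChunk: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(" "));
  }
  return chunks;
}

// Embedding: toy stand-in that hashes each lowercase word into one of DIMS buckets.
function embed(text: string): number[] {
  const vector = new Array<number>(DIMS).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  words.forEach((word) => {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) % DIMS;
    }
    vector[hash] = (vector[hash] ?? 0) + 1;
  });
  return vector;
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Indexing: keep each chunk next to its vector.
function ingest(text: string, wordsPerChunk = 8): Chunk[] {
  return chunkText(text, wordsPerChunk).map((t, id) => ({ id, text: t, vector: embed(t) }));
}

// Retrieval: score every chunk against the query and keep the best topK.
function retrieve(index: Chunk[], query: string, topK = 1): Chunk[] {
  const q = embed(query);
  return index
    .map((chunk) => ({ chunk, score: cosine(chunk.vector, q) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}
```

The chat stage would pass the retrieved chunks to a language model as context; a platform like this one replaces each toy stage with a production component.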

Multi-tenancy is built in from the start rather than bolted on: namespaces, per-namespace API keys, and hosted chat deployments with custom domains are all first-class. Ingestion jobs run through Trigger.dev, so long-running document processing doesn't block the web process, and failed jobs can be retried with real observability. A single ingest API covers 22+ file formats plus YouTube transcripts and web crawling, which is useful and not easy to build yourself. The project ships typed SDKs and an OpenAPI spec, which is the right call for something positioned as infrastructure.
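To illustrate what per-namespace API keys buy you, here is a small sketch of key-scoped isolation. It assumes nothing about Agentset's actual schema: `TenantStore` and its methods are hypothetical names, and the store is a plain in-memory array.

```typescript
// Hypothetical sketch of namespace isolation; not Agentset's schema or API.
type Doc = { namespaceId: string; text: string };

class TenantStore {
  private keys = new Map<string, string>(); // apiKey -> namespaceId
  private docs: Doc[] = [];

  registerKey(apiKey: string, namespaceId: string): void {
    this.keys.set(apiKey, namespaceId);
  }

  // The namespace is always derived from the key; callers never pass it directly.
  private resolve(apiKey: string): string {
    const namespaceId = this.keys.get(apiKey);
    if (namespaceId === undefined) throw new Error("unknown API key");
    return namespaceId;
  }

  ingest(apiKey: string, text: string): void {
    this.docs.push({ namespaceId: this.resolve(apiKey), text });
  }

  // Reads are filtered by the key's namespace, so tenants cannot see each other's data.
  list(apiKey: string): string[] {
    const namespaceId = this.resolve(apiKey);
    return this.docs.filter((d) => d.namespaceId === namespaceId).map((d) => d.text);
  }
}
```

Deriving the namespace from the key, rather than accepting it as a request parameter, is the design choice that makes cross-tenant reads hard to trigger by accident.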

Self-hosting requires Supabase, Trigger.dev, and Stripe all wired up before you can run it locally. The quick-start glosses over this, so expect the .env.example to contain many blank values that aren't optional. There's no vector database abstraction visible in the tree; it appears tightly coupled to Supabase's pgvector, so if you need Qdrant or Weaviate for scale you're forking. Commit activity appears to have dropped sharply after the initial push: the last push is dated April 2026, but the activity badge may overstate how actively the project is maintained. The 'deep research' and 'partitions' features mentioned in the description don't have obvious implementations in the directory tree, which suggests they may be cloud-only or still in progress.
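For context, the kind of vector database abstraction the review finds missing might look like the sketch below. This is an assumption about what such a seam could be, not something present in the repo: `VectorStore`, `InMemoryVectorStore`, and `nearestIds` are hypothetical, and the interface is synchronous only to keep the example short (a real adapter for pgvector, Qdrant, or Weaviate would return Promises).

```typescript
// Hypothetical interface; Agentset's actual storage layer may look different.
type Match = { id: string; score: number };

interface VectorStore {
  upsert(id: string, vector: number[]): void;
  query(vector: number[], topK: number): Match[];
}

// Reference implementation backed by a Map, scored by dot product.
class InMemoryVectorStore implements VectorStore {
  private rows = new Map<string, number[]>();

  upsert(id: string, vector: number[]): void {
    this.rows.set(id, vector);
  }

  query(vector: number[], topK: number): Match[] {
    const matches: Match[] = [];
    this.rows.forEach((row, id) => {
      const score = row.reduce((sum, value, i) => sum + value * (vector[i] ?? 0), 0);
      matches.push({ id, score });
    });
    return matches.sort((a, b) => b.score - a.score).slice(0, topK);
  }
}

// Application code depends only on the interface, so backends can be swapped.
function nearestIds(store: VectorStore, queryVector: number[], topK: number): string[] {
  return store.query(queryVector, topK).map((m) => m.id);
}
```

With a seam like this, moving to another backend means writing one new class rather than forking the retrieval code.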
