// the find
deepset-ai/haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
Haystack is a Python framework for building RAG pipelines and LLM agent workflows with an explicit, graph-based component model. It targets teams who want more control over retrieval and generation logic than LangChain offers, without writing everything from scratch. Used in production at Apple, Netflix, and others.
- The pipeline model is genuinely explicit: components declare typed inputs/outputs, connections are validated at build time, and the DAG can be serialized to YAML. This makes debugging and reproducibility much better than callback-chain approaches.
- Integration breadth is real and well-maintained—separate haystack-core-integrations repo covers OpenAI, Anthropic, Bedrock, Cohere, Ollama, Weaviate, Qdrant, Elasticsearch, pgvector, and many more, each as independent packages so you're not pulling in everything.
- Test infrastructure is solid: Mypy strict typing, Ruff linting, good coverage reporting, e2e tests in CI, and a nightly pre-release channel. The codebase is clearly production-grade, not a research prototype.
- The pipeline serialization story (YAML/JSON round-trips) plus Hayhooks for serving as REST or MCP endpoints means you can go from prototype to deployed API without rewriting the pipeline definition.
- The pipeline graph model, while explicit, becomes awkward for dynamic agent behaviors where the number of tool calls or loop iterations isn't known ahead of time. You end up fighting the static DAG when you need truly adaptive routing.
- Version 2 (current) is a substantial break from v1, and if you search for Haystack examples online a lot of results are v1 code that won't work. The migration guide exists but the ecosystem of tutorials and Stack Overflow answers is still catching up.
- Observability beyond basic tracing hooks is thin in the open-source version—anything serious pushes you toward their paid Enterprise Platform, which is a meaningful gap if you're self-hosting at scale.
- The `haystack-ai` package itself is fairly lean; most real-world integrations require additional packages that each have their own versioning. Dependency resolution headaches are common when mixing multiple document stores and embedders.