// the find
wassim249/fastapi-langgraph-agent-production-ready-template
A production-ready FastAPI template for building AI agent applications with LangGraph integration. This template provides a robust foundation for building scalable, secure, and maintainable AI agent services.
A FastAPI + LangGraph starter template for building AI agent backends, aimed at engineers who want auth, memory, observability, and deployment wiring already in place. It covers JWT sessions, mem0+pgvector long-term memory, Langfuse tracing, Prometheus/Grafana metrics, and a circular-fallback LLM service. It is a good starting point for anyone building a production chatbot service who doesn't want to assemble these pieces from scratch.
- The LLM service's two-layer resilience (tenacity exponential backoff + circular model rotation with a total timeout budget) is a concrete, well-thought-out pattern that most tutorials skip entirely.
- Observability stack is actually wired end-to-end: Langfuse for LLM traces, structured logging with per-request context injection via middleware, and Prometheus metrics with pre-built Grafana dashboard JSON — not just stubs.
- mem0+pgvector long-term memory is self-hosted in the existing Postgres instance with no separate cloud service dependency, which is the right default for a production template.
- Project structure is clean and navigable — schemas, services, models, and LangGraph graph code are properly separated, and the docs folder has actual content per component rather than a single README wall.
- OpenAI-only LLM support is a significant limitation. It is acknowledged in the FAQ and deferred to an open issue, so any team using Anthropic or Gemini has to wire up its own provider support before the template is actually useful.
- The eval framework in evals/ is LLM-as-judge, using prompt files for metrics such as hallucination and relevancy. Such judges are notoriously unreliable without calibration data, and no baseline or ground-truth dataset is included, so it reads more like scaffolding than a working eval pipeline.
- There is no WebSocket or SSE streaming support for agent responses, and the chatbot endpoint appears to be synchronous. That is a real UX problem for any agent doing multi-step reasoning with tool calls, since users will stare at a spinner for 10+ seconds.
- The circular fallback LLM logic requires pre-configuring multiple models in the registry, but the template ships with a single model default and no guidance on how many models to register or how to handle cost differences between fallback tiers.
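
For readers unfamiliar with the two-layer resilience pattern mentioned above (per-model retries with exponential backoff, then circular rotation through a model registry, all under one total timeout budget), here is a minimal sketch of the idea. It is not the template's code: the backoff is hand-rolled in place of tenacity so the example stays stdlib-only, and every name in it (`call_with_fallback`, `call_model`, `AllModelsFailed`, and all parameters and defaults) is hypothetical.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class AllModelsFailed(RuntimeError):
    """Raised when every model fails or the total time budget runs out."""


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> reply
    prompt: str,
    *,
    start_index: int = 0,
    retries_per_model: int = 3,
    base_delay: float = 0.5,
    total_budget: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Return (model_name, reply) from the first model that succeeds."""
    if not models:
        raise ValueError("at least one model must be registered")
    deadline = clock() + total_budget
    last_error: Optional[Exception] = None

    # Layer 2: walk the registry in a circle, starting at the preferred model.
    for offset in range(len(models)):
        model = models[(start_index + offset) % len(models)]

        # Layer 1: retry the same model with exponential backoff.
        for attempt in range(retries_per_model):
            if clock() >= deadline:
                raise AllModelsFailed("time budget exhausted") from last_error
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # assumption: real code would catch provider errors only
                last_error = exc
            if attempt < retries_per_model - 1:
                delay = base_delay * (2 ** attempt)
                remaining = deadline - clock()
                # Never sleep past the shared deadline.
                sleep(min(delay, max(remaining, 0.0)))

    raise AllModelsFailed("every registered model failed") from last_error
```

With a single registered model, the rotation layer does nothing and the function reduces to plain retry with backoff, which is why the number of registered models matters for this pattern. The shared deadline is what keeps worst-case latency bounded regardless of how many models or retries are configured.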