
danny-avila/rag_api

★ 838 · Python · MIT · updated Apr 2026

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector

A FastAPI service that wraps LangChain's pgvector integration to provide file-scoped RAG over a REST API. It was built primarily as the RAG backend for LibreChat but can be used independently by any system that needs per-file embedding storage and retrieval. It supports multiple embedding providers (OpenAI, Azure, HuggingFace, Ollama, Bedrock, VertexAI, among others) and can use MongoDB Atlas in place of pgvector.
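The file-scoped model is easy to picture in a few lines. This is a minimal in-memory illustration, not rag_api's storage layer: `FileScopedStore` and its methods are hypothetical, and a plain Python list stands in for the pgvector table. Each chunk is tagged with a `file_id`, so retrieval and deletion can both filter on it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    file_id: str
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class FileScopedStore:
    # Hypothetical in-memory stand-in for a vector table with a file_id column.
    chunks: list[Chunk] = field(default_factory=list)

    def add(self, file_id: str, text: str, embedding: list[float]) -> None:
        self.chunks.append(Chunk(file_id, text, embedding))

    def query(self, embedding: list[float], file_ids: set[str], k: int = 3) -> list[Chunk]:
        # Restrict to the requested files first, then rank by similarity.
        candidates = [c for c in self.chunks if c.file_id in file_ids]
        candidates.sort(key=lambda c: cosine_similarity(embedding, c.embedding), reverse=True)
        return candidates[:k]

    def delete_file(self, file_id: str) -> int:
        # Cleanup is a single filter on file_id; returns how many chunks were removed.
        before = len(self.chunks)
        self.chunks = [c for c in self.chunks if c.file_id != file_id]
        return before - len(self.chunks)
```

Because both retrieval and deletion key on `file_id`, removing one document never touches another file's vectors, and a query scoped to one file cannot surface chunks from unrelated uploads.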

- Scoping embeddings by file_id is a practical design choice: it avoids the common mistake of mixing all documents into one namespace, which makes targeted retrieval and cleanup straightforward.

- Good provider coverage - OpenAI, Azure, HuggingFace (sentence-transformers), HF TEI, Ollama, Bedrock, VertexAI, and Google GenAI are all selectable by changing a single env var, which is genuinely useful.

- Memory during ingestion is bounded by a batch embedding pipeline with a capped queue (EMBEDDING_BATCH_SIZE + EMBEDDING_MAX_QUEUE_SIZE), so a real operational concern is handled properly rather than as an afterthought.

- Test coverage is reasonable: unit, integration, memory-tracking with tracemalloc, and middleware tests are all present, which is above average for a microservice in this space.

- Auth is tightly coupled to LibreChat's JWT format - verification is described as 'basic' and assumes a token already signed elsewhere, so integrating outside LibreChat means reworking auth.

- Only one collection per deployment (COLLECTION_NAME env var) with no per-request collection routing, so multi-tenant isolation beyond file_id scoping isn't supported without running separate instances.

- The MongoDB Atlas backend is a second-class citizen - RAG_DISTANCE_THRESHOLD is silently ignored, index setup is entirely manual, and the README warns about inverted similarity semantics, all of which suggests this path hasn't been exercised as heavily as the pgvector one.

- No rate limiting or upload size controls in the API layer itself - guarding against large file uploads or runaway embedding costs is left entirely to the operator at the infrastructure level.
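The bounded-memory batching noted in the list above follows a familiar producer/consumer pattern. The sketch is an assumption about the general technique, not rag_api's implementation: `batch_size` and `max_queue_size` only mirror the roles the review attributes to EMBEDDING_BATCH_SIZE and EMBEDDING_MAX_QUEUE_SIZE, and `embed_batch` is a hypothetical async embedding call.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature for an async embedding call: texts in, vectors out.
EmbedFn = Callable[[list[str]], Awaitable[list[list[float]]]]


async def embed_in_batches(
    chunks: list[str],
    embed_batch: EmbedFn,
    batch_size: int = 4,
    max_queue_size: int = 2,
) -> list[list[float]]:
    # The queue holds at most max_queue_size pending batches;
    # the producer blocks on put() whenever it is full.
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue_size)
    results: list[list[float]] = []

    async def producer() -> None:
        for start in range(0, len(chunks), batch_size):
            await queue.put(chunks[start:start + batch_size])
        await queue.put(None)  # sentinel: no more batches

    async def consumer() -> None:
        while True:
            batch = await queue.get()
            if batch is None:
                break
            results.extend(await embed_batch(batch))

    await asyncio.gather(producer(), consumer())
    return results
```

Because the queue is bounded, a slow embedding provider makes the producer wait instead of letting prepared batches pile up in memory; with a single consumer, the output order matches the input order.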
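To show what "basic" verification of a token signed elsewhere typically involves, here is a stdlib-only sketch. It assumes HS256 with a shared secret; the review does not say which algorithm or claims LibreChat uses, and `sign_hs256`, `verify_hs256`, and the `id` claim are hypothetical. The point is the coupling: whoever issues tokens must share the secret and agree on the claim layout.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWT segments use unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    # Stands in for the token issuer, which must hold the same secret as the verifier.
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    # Basic checks only: algorithm, signature, and expiry; no issuer or audience validation.
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    exp = payload.get("exp")
    current = time.time() if now is None else now
    if exp is not None and current >= exp:
        raise ValueError("token expired")
    return payload
```

Swapping in a different identity provider would mean matching both the secret-sharing arrangement and whatever claims the service reads from the payload.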
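The inverted-semantics warning about the MongoDB Atlas backend comes down to score direction. A minimal sketch, assuming one backend reports a cosine distance (lower is closer, as with pgvector's `<=>` operator) and the other a similarity score (higher is closer); the function names and the `Convention` type are hypothetical, not rag_api code.

```python
from typing import Literal

Convention = Literal["distance", "similarity"]


def passes_threshold(score: float, threshold: float, convention: Convention) -> bool:
    # Distance: lower means closer, so keep scores at or below the threshold.
    if convention == "distance":
        return score <= threshold
    # Similarity: higher means closer, so keep scores at or above the threshold.
    if convention == "similarity":
        return score >= threshold
    raise ValueError(f"unknown convention: {convention!r}")


def cosine_similarity_to_distance(similarity: float) -> float:
    # Cosine distance is 1 - cosine similarity, so the two scales run in opposite directions.
    return 1.0 - similarity
```

A close match with cosine similarity 0.9 has a distance of 0.1, but feeding the raw 0.9 into a distance-style check with threshold 0.3 rejects it. Reusing one threshold setting across backends without converting first is one way such an inversion can quietly filter the wrong results.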
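On the missing upload size controls, an API-layer cap can be as simple as reading the body incrementally and aborting once a limit is crossed. This is a hypothetical sketch of what such a check could look like, not an existing rag_api feature; `read_capped` and `UploadTooLarge` are illustrative names.

```python
import io
from typing import BinaryIO


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured byte limit."""


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    # Read incrementally and stop as soon as the limit is crossed, so memory
    # use stays bounded by roughly max_bytes + chunk_size.
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
    return bytes(buf)


if __name__ == "__main__":
    # Usage: a 1,000-byte body fits under a 4,096-byte cap.
    print(len(read_capped(io.BytesIO(b"x" * 1000), max_bytes=4096)))
```

Rejecting oversized files before parsing also caps how many chunks reach the embedding provider, which addresses part of the cost concern, though request-rate limits would still need separate handling.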
