// the find
Marker-Inc-Korea/AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
AutoRAG is a hyperparameter search tool for RAG pipelines — you define a YAML config listing candidate modules (BM25, vector retrieval, rerankers, etc.), provide a QA dataset, and it runs all combinations to find the best-performing pipeline for your data. It's for teams building production RAG systems who want data-driven module selection instead of gut-feel choices.
The node-graph YAML config is well designed: you can express complex multi-stage pipelines (query expansion → retrieval → rerank → compress → generate) without writing glue code. The evaluation suite is genuinely broad: retrieval metrics (NDCG, MRR, F1, recall) and generation metrics (METEOR, ROUGE, semantic similarity) are tracked per node, not just end to end. The data creation pipeline (parse → chunk → QA generation) is included rather than punted to the user, which matters because evaluation quality lives or dies on QA dataset quality. The deployment path from trial folder to API server is a straight line, with no export/reimport ceremony.
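For readers less familiar with the retrieval metrics named above, here is a minimal sketch of three of them (recall, MRR's per-query reciprocal rank, and NDCG) in plain Python. It is not AutoRAG's implementation; the function names, the binary-relevance assumption, and the sample document ids are all made up for illustration.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that show up in the top-k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant id, or 0.0 if none was retrieved.

    Averaging this value over all queries gives MRR.
    """
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """NDCG with binary relevance; assumes retrieved ids are unique."""
    relevant = set(relevant)
    # Discounted gain of the hits actually found in the top k.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    # Best achievable gain: every relevant id ranked first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical single query: one of two relevant passages, found at rank 2.
retrieved_ids = ["d3", "d1", "d7"]
relevant_ids = ["d1", "d2"]
print(recall_at_k(retrieved_ids, relevant_ids, k=3))
print(reciprocal_rank(retrieved_ids, relevant_ids))
print(ndcg_at_k(retrieved_ids, relevant_ids, k=3))
```

Scoring a retrieval node this way, separately from the generator, is what makes per-node tracking useful: a weak final answer can be traced to the stage that caused it.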
The whole framework is tightly coupled to Parquet files and a local directory structure. This works fine for a laptop experiment but falls apart when you want to run optimization as part of a CI pipeline or against data in object storage. Optimization is exhaustive combinatorial search with no intelligent pruning (no Bayesian optimization, no early stopping on clearly losing arms), so on a large config the runtime and LLM API costs grow fast, and there's no built-in budget cap. The QA dataset generation uses LLM-generated questions as ground truth, which means the benchmark measures how well your RAG answers LLM-written questions, a meaningful but narrower signal than how well it handles real user queries. Documentation lives partly in Notion (module lists, hardware specs) rather than the repo, so links rot and offline access breaks.
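To put a number on the search-cost concern, here is a back-of-envelope sketch you could run before launching a trial. It takes the description above at face value and assumes a full cross product of candidates, with one LLM call per question per pipeline. The node names, module names, question count, and per-call price are hypothetical, and none of this is part of AutoRAG.

```python
import math


def count_pipelines(candidates_per_node):
    """Number of distinct pipelines in a full cross product of candidates."""
    return math.prod(len(modules) for modules in candidates_per_node.values())


def estimate_cost(candidates_per_node, n_questions, cost_per_call):
    """Rough spend, assuming one LLM call per question per pipeline."""
    return count_pipelines(candidates_per_node) * n_questions * cost_per_call


def within_budget(candidates_per_node, n_questions, cost_per_call, budget):
    """Manual stand-in for a budget cap: check before starting the run."""
    return estimate_cost(candidates_per_node, n_questions, cost_per_call) <= budget


# Hypothetical search space; every name below is a placeholder.
search_space = {
    "query_expansion": ["none", "hyde", "multi_query"],
    "retrieval": ["bm25", "vectordb", "hybrid"],
    "reranker": ["none", "reranker_a", "reranker_b", "reranker_c"],
    "generator": ["model_a", "model_b"],
}

print(count_pipelines(search_space))  # 3 * 3 * 4 * 2 = 72
print(estimate_cost(search_space, n_questions=100, cost_per_call=0.01))
print(within_budget(search_space, n_questions=100, cost_per_call=0.01, budget=50.0))
```

Adding one more candidate to any node multiplies the total rather than adding to it, which is why a guard like this is worth running before a large config.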