// the find

NovaSearch-Team/RAG-Retrieval

★ 1,126 · Python · MIT · updated May 2026

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.

RAG-Retrieval is a training and inference toolkit for fine-tuning the retrieval components of RAG pipelines — embeddings, ColBERT late-interaction models, and rerankers. It covers the full stack from BERT-based to LLM-based rankers, with distillation support to shrink larger models down. Aimed at ML engineers who want domain-adapted retrievers rather than off-the-shelf ones.

The unified inference API for rerankers is genuinely useful: one interface regardless of whether you're running a cross-encoder or an LLM-based ranker, with sensible long-document handling (truncate vs. split-and-max). Distillation from LLM-scale rerankers down to BERT-base is well supported, and the Jasper/Stella work gives it academic backing. The MyopicTrap positional bias study is a real contribution: benchmarks specifically designed to expose position sensitivity across BM25, dense, ColBERT, and rerankers. Multi-GPU training via DeepSpeed ZeRO (stages 0–3) and FSDP can shard parameters, gradients, and optimizer state across devices, so you can fine-tune LLM-based rerankers on several smaller GPUs instead of needing a single A100.
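
To make the two long-document options concrete, here is a minimal sketch of truncate versus split-and-max scoring, assuming a generic `scorer(query, passage)` callable. The names `scorer`, `chunk_words`, `score_truncate`, `score_split_max`, and `overlap_scorer` are hypothetical stand-ins, not RAG-Retrieval's API, and windows are counted in whitespace words rather than model tokens to keep it dependency-free.

```python
from typing import Callable, List

# Hypothetical scorer type: (query, passage) -> relevance score.
# In practice this would wrap a cross-encoder or LLM reranker.
Scorer = Callable[[str, str], float]


def chunk_words(text: str, max_words: int, stride: int) -> List[str]:
    """Split text into (possibly overlapping) windows of at most max_words words."""
    if max_words <= 0 or stride <= 0:
        raise ValueError("max_words and stride must be positive")
    words = text.split()
    if not words:
        return [""]
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the document
    return chunks


def score_truncate(scorer: Scorer, query: str, doc: str, max_words: int) -> float:
    """Score only the first max_words words; everything later is ignored."""
    return scorer(query, " ".join(doc.split()[:max_words]))


def score_split_max(scorer: Scorer, query: str, doc: str,
                    max_words: int, stride: int) -> float:
    """Score every window and keep the best one as the document score."""
    return max(scorer(query, chunk) for chunk in chunk_words(doc, max_words, stride))


def overlap_scorer(query: str, passage: str) -> float:
    """Toy stand-in for a reranker: counts passage words that appear in the query."""
    q = set(query.lower().split())
    return float(sum(1 for w in passage.lower().split() if w in q))


doc = "filler " * 50 + "colbert late interaction"  # relevant span sits at the very end
query = "colbert late interaction"
print(score_truncate(overlap_scorer, query, doc, max_words=20))              # 0.0
print(score_split_max(overlap_scorer, query, doc, max_words=20, stride=10))  # 3.0
```

Truncation never sees the relevant span at the end of the 53-word document, while split-and-max finds it in the last window. The trade-off is cost: split-and-max runs the reranker once per window.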

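For the distillation side, one common formulation (not necessarily the loss RAG-Retrieval ships) is a listwise KL divergence between the teacher's and the student's softmax distributions over the same candidate list for a query. A minimal pure-Python sketch, with hypothetical function names and made-up scores:

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kl_distill(teacher_scores: Sequence[float],
                        student_scores: Sequence[float],
                        temperature: float = 1.0) -> float:
    """KL(teacher || student) over one query's candidate list (hypothetical helper)."""
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must score the same candidates")
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    # Terms with p == 0 contribute nothing to KL, so skip them to avoid log(0).
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


teacher = [4.2, 1.0, -0.5]  # hypothetical LLM reranker scores for 3 candidates
student = [2.0, 1.5, 0.1]   # hypothetical BERT-base student scores
loss = listwise_kl_distill(teacher, student, temperature=2.0)
```

Because softmax ignores a constant shift, the student only has to match the teacher's relative scores, not their absolute offset. In a real training loop this would be computed on framework tensors so gradients flow into the student; plain floats just keep the sketch self-contained.
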
The benchmark numbers are modest: rag-retrieval-reranker matches bge-reranker-base at 0.41 GB vs. 1.11 GB (about 37% of the size), which is a good size story, but the absolute scores aren't compelling enough to justify switching if you already have bge working. The training data pipeline assumes you already have the right training set; there's no guidance on constructing hard negatives for your domain, which is usually the bottleneck. The conda setup command still targets Python 3.8, which reached end-of-life in October 2024; most current ML work has moved to 3.10 or later. Test coverage is thin: a handful of inference tests for specific model combinations, and nothing for the training code paths or distillation pipelines.
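
Since hard-negative construction is left to you, here is a sketch of one widely used heuristic: rank the corpus for each query with a first-stage retriever (BM25 or your current embedding model), skip the very top ranks, where unlabeled positives tend to cluster, and take the next non-positive hits as negatives. The function `mine_hard_negatives`, its `rank_start` cutoff, and the example ranking are all hypothetical; the right cutoff depends on your data.

```python
from typing import List, Sequence, Set


def mine_hard_negatives(ranked_ids: Sequence[str], positives: Set[str],
                        rank_start: int, num_negatives: int) -> List[str]:
    """Pick negatives from a first-stage ranking, starting at 0-based rank rank_start.

    Ranks above rank_start are skipped because the top hits are the most likely
    to be relevant documents that simply were never labeled (false negatives).
    """
    if rank_start < 0 or num_negatives < 0:
        raise ValueError("rank_start and num_negatives must be non-negative")
    negatives: List[str] = []
    for doc_id in ranked_ids[rank_start:]:
        if doc_id in positives:
            continue  # never use a labeled positive as a negative
        if len(negatives) == num_negatives:
            break
        negatives.append(doc_id)
    return negatives


first_stage = ["d1", "d7", "d3", "d9", "d4", "d2"]  # hypothetical ranking for one query
print(mine_hard_negatives(first_stage, positives={"d3"}, rank_start=1, num_negatives=2))
# ['d7', 'd9']
```

Skipping too few ranks risks training on false negatives; skipping too many yields easy negatives that teach the model little.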
