// the find

microsoft/RD-Agent

★ 13,425 · Python · MIT · updated Jun 2026

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through R&D-Agent, which lets AI drive data-driven AI. 🔗https://aka.ms/RD-Agent-Tech-Report

RD-Agent is Microsoft's framework for automating data science R&D loops — it proposes hypotheses, implements them as code, evaluates the results, and iterates. The primary use cases are quantitative finance (factor/model discovery via Qlib), Kaggle competition automation, and autonomous LLM fine-tuning. It's aimed at ML researchers and quant teams who want to run self-improving agent loops rather than writing one-off scripts.

The R/D split is architecturally sound — separating hypothesis proposal from implementation makes the loop composable and easier to debug than monolithic agents. The quant scenario (RD-Agent-Q) has published benchmark numbers showing 2× ARR improvement over baseline factor libraries at under $10 per run, which is a concrete and falsifiable claim. LiteLLM as the default backend is the right call — it means you're not locked to OpenAI and can swap in DeepSeek or local models without touching agent code. The CoSTEER evolving strategy (collaborative evolving with feedback-guided knowledge management) is well-documented in papers with reproducible traces, not just marketing claims.

Linux-only with a hard Docker dependency means Windows and macOS developers are immediately blocked — this isn't a minor caveat, it rules out a large share of the audience the README is trying to attract. The scenario surface area is enormous (quant, Kaggle, medical, LLM fine-tuning, paper reading) but the depth per scenario is uneven — the Kaggle agent is listed as 'demo coming soon' in several places while claiming to be a key feature. Cost control is unclear: the agent loops can run expensive LLM calls in tight iteration cycles with no obvious budget ceiling or early-stop mechanism documented in the README. The codebase mixes research prototype patterns (lots of YAML prompt files, heavy inheritance chains in core/) with production-framework ambitions, which makes it harder to extend without reading a lot of internal conventions first.

View on GitHub → Homepage ↗