// the find

Agent-RL/ReCall

★ 1,398 · Python · MIT · updated May 2025

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning

ReCall trains LLMs to use arbitrary tools (not just search) through reinforcement learning, without needing supervised examples of tool-use trajectories. It extends the earlier ReSearch work beyond web search to any Python-callable tool. The target audience is ML researchers working on agent training, not practitioners looking to deploy a tool-using agent today.

The core RL-without-demonstrations approach is genuinely interesting: the model learns when and how to call tools purely from outcome rewards, which sidesteps the expensive human-annotation bottleneck for tool-use data. Running arbitrary Python tool code inside a sandbox is the right call and shows the authors thought about the obvious security problem. Pre-trained models are available on HuggingFace, so you can reproduce the reported results without running the full training pipeline. Building on verl (a serious distributed RL framework) rather than rolling their own training loop means the training infrastructure isn't the bottleneck.
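To make "purely from outcome rewards" concrete, here is a minimal sketch of a reward that scores only the final answer and gives no credit for the tool calls themselves. It is an illustration, not the repo's reward function: the `<answer>` tag format, the `normalize` helper, and the exact-match scoring are all assumptions made for this example.

```python
import re

# Hypothetical convention: the rollout wraps its final answer in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(text.lower().split())


def outcome_reward(rollout: str, gold: str) -> float:
    """Return 1.0 if the last <answer> block matches the gold answer, else 0.0.

    Nothing in the rollout before the final answer (reasoning, tool calls,
    tool outputs) is scored, so no annotated tool-use trajectory is needed.
    """
    matches = ANSWER_RE.findall(rollout)
    if not matches:
        return 0.0  # no parseable final answer, no reward
    return 1.0 if normalize(matches[-1]) == normalize(gold) else 0.0
```

Because the reward only looks at the end of the rollout, the only supervision required is a question paired with a gold answer; whether a tool call helped is something the policy has to discover through the RL signal.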

The sandbox they ship is explicitly described as 'basic', and the authors warn against local hosting due to security risks. That's a significant gap for anyone who wants to train with custom tools in a controlled environment. The BFCL evaluation code is listed as 'coming soon', which means the headline claim about general tool-use can only be partially verified from the repo. The whole stack requires coordinating five separate pieces (model server, sandbox, retriever, Ray cluster, training script), and the multi-node setup docs are thin, so getting this running end-to-end on new hardware will likely burn a day. The repo was renamed from ReSearch to ReCall in April 2025 and hasn't been touched since May 2025, suggesting active development may have stalled.
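For a sense of why a 'basic' sandbox is a real limitation, here is a rough sketch of about the simplest thing that could be called one: running the tool code in a child Python process with a timeout. This is not the repo's sandbox; the function name `run_tool_code` and the result format are made up for illustration.

```python
import subprocess
import sys


def run_tool_code(code: str, timeout_s: float = 5.0) -> dict:
    """Run a snippet of Python in a separate interpreter with a wall-clock limit."""
    try:
        proc = subprocess.run(
            # -I is Python's isolated mode: ignores PYTHON* env vars and user site-packages.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs past this limit
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

A wrapper like this stops a runaway loop from hanging the trainer, but the child process can still read and write the host filesystem and open network connections. Closing that gap takes container- or VM-level isolation, which is the kind of hardening a 'basic' sandbox leaves to the user.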
