// the find

Ar9av/PaperOrchestra

★ 603 · Python · NOASSERTION · updated Jun 2026

An automated AI research-paper writer based on Google's PaperOrchestra paper, implemented as a skill pack with a benchmark and autoraters, usable from any coding agent (Claude Code, Cursor, Antigravity, Cline, Aider). No API keys, no LLM SDKs.

PaperOrchestra is a skill pack that lets coding agents (Claude Code, Cursor, etc.) run a five-stage multi-agent pipeline — outline, plotting, lit review, writing, refinement — to produce LaTeX research papers from experiment logs and idea documents. It's a faithful implementation of a Google Research paper that showed 50–68% win margins over single-agent baselines on literature review quality. The target user is an ML researcher who has run experiments through an AI coding agent and wants to turn scattered notes into a conference submission.

The no-API-keys architecture is genuinely thoughtful: all LLM reasoning is delegated to whatever host agent you're already using, so the skill pack itself is just structured instruction documents plus deterministic Python helpers for validation, dedup, and BibTeX formatting. The paper-fidelity commitment is unusually disciplined — every agent prompt is verbatim from Appendix F with a page-number citation, and out-of-paper additions are explicitly flagged. The agent-research-aggregator is the most practically useful piece: it can reconstruct a coherent experiment log from scattered `.claude/`, `.cursor/`, and `.antigravity/` caches, which covers the actual workflow of someone who ran a bunch of experiments but never wrote things up. The deterministic hardening scripts — orphan-citation gate, anti-leakage grep, worklog-based rollback — are the kind of boring infrastructure that actually matters when you're trusting an LLM to write your references section.
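
To make the orphan-citation idea concrete, here is a minimal sketch of what such a gate could look like. It is not the repository's script: the function names (`cited_keys`, `bib_keys`, `orphan_citations`) and the regexes are assumptions chosen for illustration, and it only handles the common `\cite`-family commands and plain `@type{key,` BibTeX entries.

```python
from __future__ import annotations

import re

# Matches \cite, \citep, \citet, \citep*, with optional [..] arguments.
# Illustrative pattern, not taken from PaperOrchestra.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# Matches the key in entries such as "@article{smith2020,".
BIB_ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(tex: str) -> set[str]:
    """Collect every citation key used in the LaTeX source."""
    keys: set[str] = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def bib_keys(bib: str) -> set[str]:
    """Collect every entry key defined in the BibTeX source."""
    return set(BIB_ENTRY_RE.findall(bib))


def orphan_citations(tex: str, bib: str) -> list[str]:
    """Return cited keys that have no matching BibTeX entry, sorted."""
    return sorted(cited_keys(tex) - bib_keys(bib))


if __name__ == "__main__":
    tex = r"See \citep{smith2020, lee2021} and \cite[p.~3]{ghost2024}."
    bib = "@article{smith2020,\n  title={A}\n}\n@inproceedings{lee2021,\n  title={B}\n}\n"
    print(orphan_citations(tex, bib))  # ['ghost2024']
```

A gate like this is useful precisely because it is deterministic: a non-empty result can fail the run outright instead of relying on the agent to notice a hallucinated reference.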

The skill pack has no real error recovery: if one stage produces bad JSON or a miscounted citation pool, the next agent gets garbage and the pipeline silently degrades. The plotting-agent relies on PaperBanana as its diagram backbone, which requires a separate clone, separate install, and either Gemini or OpenRouter keys — the 'no API keys' claim in the description is only true if your host agent has native web search and you skip figures. There's no explicit handling for the case where the same paper appears in both Semantic Scholar and CrossRef with different metadata, which will cause citation dedup to miss duplicates in practice. The skills are instruction documents, not code, so debugging a bad run means reading your agent's transcript and reasoning about what it probably did — there's no pipeline state you can inspect or rerun from a checkpoint.
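
One way the cross-source duplicate problem could be handled is to compare records on a normalized DOI first and fall back to a normalized title. The sketch below is an assumption-laden illustration, not the repository's dedup logic: the record shape (`title`, `doi`, `source` keys), the function names, and the sample entries are all hypothetical.

```python
from __future__ import annotations

import re
import unicodedata


def normalize_doi(doi: str | None) -> str | None:
    """Lowercase a DOI and strip resolver or 'doi:' prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None


def normalize_title(title: str) -> str:
    """Strip accents, case, and punctuation so near-identical titles match."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_records(records: list[dict]) -> list[dict]:
    """Keep the first record for each paper, matching on DOI or title."""
    seen: set[tuple[str, str]] = set()
    kept: list[dict] = []
    for rec in records:
        keys: set[tuple[str, str]] = set()
        doi = normalize_doi(rec.get("doi"))
        if doi:
            keys.add(("doi", doi))
        title = normalize_title(rec.get("title") or "")
        if title:
            keys.add(("title", title))
        if keys & seen:
            continue  # same paper already kept under either key
        seen |= keys
        kept.append(rec)
    return kept


if __name__ == "__main__":
    # Made-up records with placeholder DOIs, for illustration only.
    records = [
        {"title": "A Study of Widgets", "doi": "10.1234/example.001", "source": "semantic_scholar"},
        {"title": "A study of widgets.", "doi": "https://doi.org/10.1234/EXAMPLE.001", "source": "crossref"},
        {"title": "Gadgets Revisited", "source": "crossref"},
    ]
    for rec in dedup_records(records):
        print(rec["source"], rec["title"])
```

Title-only matching trades precision for recall: it will merge a preprint with its published version, which is usually what a citation pool wants, but it can also merge two distinct papers that share a generic title, so a stricter variant might also compare year or first author.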
