// the find
stanford-oval/storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
STORM is a Stanford research system that generates Wikipedia-style long-form articles by running multi-perspective simulated conversations before writing — the idea being that good questions precede good writing. Co-STORM extends this with a human-in-the-loop mode where you can steer the research discourse in real time. It's for researchers and builders who need structured, cited reports on a topic, not quick summaries.
The perspective-guided question generation is the genuinely clever part — instead of asking an LLM 'what do you know about X', it simulates expert personas asking each other questions, which surfaces coverage gaps that a single-pass prompt misses. The modular architecture (separate LM configs per pipeline stage) lets you use a cheap model for conversation simulation and a capable one for final writing, which keeps costs manageable. LiteLLM integration means you're not locked to OpenAI — Claude, Gemini, local Ollama all work. The FreshWiki and WildSeek datasets are a real contribution if you want to evaluate or fine-tune anything in this space.
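To make the two ideas above concrete, here is a minimal sketch of persona-driven question gathering with a separate model slot per stage. It is not STORM's actual API: `StageModels`, `gather_questions`, `write_report`, and the stub models are all hypothetical names invented for illustration, and the real pipeline runs multi-turn conversations rather than one question per persona.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt to a completion.
LM = Callable[[str], str]


@dataclass
class StageModels:
    """One model slot per pipeline stage (illustrative, not STORM's names)."""
    conversation: LM  # cheap model for simulated persona questions
    writing: LM       # stronger model for the final article


def gather_questions(topic: str, personas: List[str], models: StageModels) -> Dict[str, str]:
    """Ask one question per persona so each viewpoint adds coverage."""
    questions = {}
    for persona in personas:
        prompt = f"You are a {persona}. Ask one question about {topic}."
        questions[persona] = models.conversation(prompt)
    return questions


def write_report(topic: str, questions: Dict[str, str], models: StageModels) -> str:
    """Hand the collected questions to the writing model as an outline."""
    outline = "\n".join(f"- {q}" for q in questions.values())
    return models.writing(f"Write a cited report on {topic} covering:\n{outline}")


# Stub models so the sketch runs offline; swap in real API calls.
calls = {"cheap": 0, "strong": 0}


def cheap_lm(prompt: str) -> str:
    calls["cheap"] += 1
    return f"question #{calls['cheap']}"


def strong_lm(prompt: str) -> str:
    calls["strong"] += 1
    return "REPORT\n" + prompt


models = StageModels(conversation=cheap_lm, writing=strong_lm)
personas = ["economist", "historian", "engineer"]
questions = gather_questions("solar power", personas, models)
report = write_report("solar power", questions, models)
```

The cheap model is called once per persona and the expensive one only once at the end, which is where the cost savings come from.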
Output quality is capped by the search results you feed it: if your retriever returns shallow or SEO-garbage pages, the article will cite that garbage confidently. There's no built-in deduplication or quality filter on retrieved sources. The multi-LM configuration is powerful, but the setup friction is high: you're configuring 5-6 separate model slots before running anything, which is a lot of boilerplate for a tool most people will use once or twice. Co-STORM's interactive loop is interesting in the demo, but the command-line workflow (step-by-step in a Python REPL) is awkward for actual use. There's no persistent session or proper UI outside of the Streamlit demo, which is explicitly labeled as not production-ready.
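If you want to patch the missing source filter yourself, something like the following could sit between your retriever and the pipeline. This is a rough sketch under assumptions: the result shape (dicts with `url` and `snippet` keys), the `filter_sources` and `normalize_url` helpers, the blocklist, and the 200-character threshold are all made up for illustration, and wiring it into STORM's retriever is not shown.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal.

    Drops scheme, a leading "www.", query string, and trailing slash.
    Caveat: sites that route pages by query string will be over-merged.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def filter_sources(results, blocked_domains=frozenset(), min_chars=200):
    """Keep the first copy of each page; skip blocked or thin results.

    `results` is assumed to be a list of dicts with "url" and "snippet".
    """
    seen = set()
    kept = []
    for item in results:
        key = normalize_url(item["url"])
        domain = key.split("/", 1)[0]
        if domain in blocked_domains:
            continue  # known low-quality domain
        if len(item.get("snippet", "")) < min_chars:
            continue  # too little text to be worth citing
        if key in seen:
            continue  # duplicate of a page already kept
        seen.add(key)
        kept.append(item)
    return kept
```

A length threshold and a blocklist are crude proxies for quality, but they are cheap and at least keep the most obvious junk out of the citation list.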