// the find

stanford-oval/storm

★ 28,358 · Python · MIT · updated Sep 2025

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

STORM is a Stanford research system that generates Wikipedia-style long-form articles by running multi-perspective simulated conversations before writing, on the premise that good questions precede good writing. Co-STORM extends this with a human-in-the-loop mode where you can steer the research discourse in real time. It's for researchers and builders who need structured, cited reports on a topic, not quick summaries.

The perspective-guided question generation is the genuinely clever part: instead of asking an LLM 'what do you know about X', it simulates expert personas asking each other questions, which surfaces coverage gaps that a single-pass prompt misses. The modular architecture (a separate LM config per pipeline stage) lets you use a cheap model for conversation simulation and a capable one for final writing, which keeps costs manageable. LiteLLM integration means you're not locked to OpenAI; Claude, Gemini, and local models served through Ollama all work. The FreshWiki and WildSeek datasets are a real contribution if you want to evaluate or fine-tune anything in this space.
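To make the two ideas above concrete, here is a minimal sketch of persona-driven question gathering with separate model slots per stage. It is not STORM's code: `StageModels`, `gather_questions`, `write_outline`, and the prompt wording are hypothetical names invented for illustration, and an "LM" here is just any function from prompt to text. It also simplifies the conversation to each persona asking its own questions in turn.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a completion.
LM = Callable[[str], str]


@dataclass
class StageModels:
    """Illustrative per-stage model slots (names are assumptions, not STORM's)."""
    question_asker: LM  # cheap model: called many times with short prompts
    writer: LM          # capable model: called once with a long prompt


def gather_questions(topic: str, personas: List[str], models: StageModels,
                     rounds: int = 2) -> Dict[str, List[str]]:
    """Collect `rounds` questions per persona.

    Each prompt includes that persona's earlier questions so the model is
    nudged toward new ground instead of repeating itself.
    """
    questions: Dict[str, List[str]] = {}
    for persona in personas:
        asked: List[str] = []
        for _ in range(rounds):
            prompt = (
                f"You are a {persona} researching '{topic}'. "
                f"Already asked: {asked}. Ask one new question."
            )
            asked.append(models.question_asker(prompt))
        questions[persona] = asked
    return questions


def write_outline(topic: str, questions: Dict[str, List[str]],
                  models: StageModels) -> str:
    """Hand every collected question to the (more capable) writer model."""
    bullets = "\n".join(
        f"- [{persona}] {q}" for persona, qs in questions.items() for q in qs
    )
    return models.writer(f"Draft an outline for '{topic}' covering:\n{bullets}")
```

The point of the split is cost: the question loop makes `len(personas) * rounds` calls, so it gets the cheap slot, while the single outline call gets the expensive one. In practice each slot would wrap a real model client rather than a plain function.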

Output quality is capped by the search results you feed it: if your retriever returns shallow or SEO-garbage pages, the article will cite that garbage confidently. There's no built-in deduplication or quality filter on retrieved sources. The multi-LM configuration is powerful, but the setup friction is high: you configure five or six separate model slots before running anything, which is a lot of boilerplate for a tool most people will use once or twice. Co-STORM's interactive loop is interesting in the demo, but the command-line workflow (stepping through turns in a Python REPL) is awkward for actual use. There's no persistent session and no proper UI outside the Streamlit demo, which is explicitly labeled as not production-ready.
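If you want a stopgap for the missing source filter, something like the following could sit between your retriever and the pipeline. This is a rough sketch under assumptions: the result shape (dicts with `url` and `snippet` keys), the function names, the `min_chars` threshold, and the blocklist are all hypothetical and not part of STORM.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal.

    Drops the scheme, a leading 'www.', the query string, the fragment,
    and any trailing slash.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return f"{host}{parts.path.rstrip('/')}"


def filter_sources(results, min_chars=200, blocked_hosts=frozenset()):
    """Keep the first usable result per normalized URL.

    `results` is assumed to be a list of dicts with 'url' and 'snippet' keys.
    A result is dropped if its host is blocked, its snippet is shorter than
    `min_chars`, or an equivalent URL was already kept.
    """
    seen = set()
    kept = []
    for result in results:
        key = normalize_url(result["url"])
        host = key.split("/", 1)[0]
        if key in seen or host in blocked_hosts:
            continue
        if len(result.get("snippet", "")) < min_chars:
            continue
        seen.add(key)
        kept.append(result)
    return kept
```

Snippet length is a crude proxy for quality, and a hand-maintained blocklist only catches domains you already know are bad, so treat this as a floor rather than a real quality filter.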
