// the find

potsawee/selfcheckgpt

★ 620 · Python · MIT · updated Jun 2024

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

SelfCheckGPT detects hallucinations in LLM outputs without access to model internals or ground-truth data: it draws multiple samples from the same prompt and checks whether each statement in the original response is consistent with them. It comes out of an EMNLP 2023 paper and ships as a clean pip-installable package. Useful for anyone building pipelines on top of black-box LLMs (GPT-4, Claude via API) who needs a hallucination signal without fine-tuning.
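To make the sampling idea concrete, here is a minimal sketch in plain Python. It is not the package's API or any of its scoring backends: the token-overlap measure is a crude stand-in chosen so the example has no dependencies, and `tokens`, `inconsistency_score`, and the sample passages are all hypothetical names and data made up for illustration.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def inconsistency_score(sentence, samples):
    """Score in [0, 1]; higher means the sentence is less supported.

    For each sampled passage, compute the fraction of the sentence's
    tokens that the passage lacks, then average over all samples.
    This is a toy stand-in for a real scorer such as an NLI model.
    """
    sent_tokens = tokens(sentence)
    if not sent_tokens or not samples:
        raise ValueError("need a non-empty sentence and at least one sample")
    missing = [len(sent_tokens - tokens(s)) / len(sent_tokens) for s in samples]
    return sum(missing) / len(missing)


# Hypothetical resampled passages for the same prompt.
samples = [
    "Marie Curie was born in Warsaw in 1867.",
    "Curie was born in 1867 in Warsaw, Poland.",
    "Born in Warsaw in 1867, Marie Curie became a physicist.",
]

supported = inconsistency_score("Marie Curie was born in Warsaw.", samples)
unsupported = inconsistency_score("Marie Curie was born in Vienna.", samples)
```

The sentence the samples agree with gets the lower score. A real backend swaps the token overlap for something semantic, but the shape stays the same: one sentence, N samples, one aggregated score.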

Five distinct scoring backends (BERTScore, NLI, n-gram, QA, LLM-prompt) with well-documented tradeoffs between cost and accuracy; NLI hits 92.5 AUC-PR on their benchmark at reasonable compute cost. The zero-resource framing is the real selling point: no labeled data needed, and it works on any generative model you can sample from. SelfCheck-NLI using DeBERTa-v3-large is self-contained and doesn't require calling another LLM. The benchmark table is honest: they show their own method losing to probability baselines in some configurations.

The core assumption, that consistent hallucinations are rare, breaks down badly for systematic errors like date arithmetic or domain-specific facts that the model confidently and consistently gets wrong. Sampling the LLM 3-5 more times per evaluation adds that many extra inference calls per response, which is fine for offline auditing but rules it out for latency-sensitive paths. The dataset is 238 passages of GPT-3 WikiBio outputs, a narrow benchmark that may not transfer to code, math, or structured outputs. The last commit was in June 2024, and the API prompt variant supports only OpenAI and Groq; there is no Anthropic or Gemini support out of the box.
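A rough sketch of the cost arithmetic, assuming each sample costs about as much as the original generation and ignoring the scorer's own compute. The function name `selfcheck_calls` is hypothetical, and the passage count simply reuses the 238 from the benchmark as a convenient number.

```python
def selfcheck_calls(n_passages, n_samples):
    """Return (extra_calls, total_calls) for sampling-based checking.

    Each passage needs its original generation plus n_samples resamples,
    so the total is (1 + n_samples) calls per passage.
    """
    if n_passages < 0 or n_samples < 1:
        raise ValueError("n_passages must be >= 0 and n_samples >= 1")
    extra = n_passages * n_samples
    total = n_passages + extra
    return extra, total


# 238 passages checked with 5 samples each.
extra_calls, total_calls = selfcheck_calls(238, 5)
overhead_factor = total_calls / 238
```

Counting the original generation, the overhead factor is 1 + N rather than N, which is tolerable for a batch audit and hard to justify on a request path.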
