// the find

refuel-ai/autolabel

★ 2,322 · Python · MIT · updated Mar 2025

Label, clean and enrich text datasets with LLMs.

Autolabel is a Python library for programmatic dataset labeling with LLMs. You define a task config (classification, NER, QA, or entity matching), point it at a CSV, and it runs the labeling, with a cost estimate up front. It's aimed at ML engineers who need labeled training data without paying for human annotation at scale.

The pre-flight cost estimate shows what a run will cost before you commit to it, which prevents surprise API bills. SQLite-backed caching means interrupted runs resume rather than restart from scratch, which matters when labeling millions of rows. Few-shot example selection via vector similarity is built in, not bolted on. The benchmark suite, covering 20+ datasets with reproducible configs, is honest about where the approach works and where it doesn't.
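To make the resume-on-interrupt point concrete, here is a minimal sketch of a SQLite-backed label cache using only the standard library. It is not Autolabel's implementation: the `label_with_cache` function, the `label_cache` table, and the stand-in `fake_labeler` are all hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Callable, Dict, Iterable


def label_with_cache(
    rows: Iterable[str],
    label_fn: Callable[[str], str],
    conn: sqlite3.Connection,
) -> Dict[str, str]:
    """Label each row, reusing any label already stored in SQLite."""
    # Hypothetical schema: one row per input text, keyed on the text itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS label_cache (input TEXT PRIMARY KEY, label TEXT)"
    )
    results: Dict[str, str] = {}
    for text in rows:
        hit = conn.execute(
            "SELECT label FROM label_cache WHERE input = ?", (text,)
        ).fetchone()
        if hit is not None:
            # Already labeled in an earlier (possibly interrupted) run.
            results[text] = hit[0]
            continue
        label = label_fn(text)  # the expensive LLM call in a real run
        conn.execute(
            "INSERT INTO label_cache (input, label) VALUES (?, ?)", (text, label)
        )
        conn.commit()  # commit per row so a crash loses at most one label
        results[text] = label
    return results


# Stand-in for an LLM call; records every invocation so reuse is visible.
calls = []


def fake_labeler(text: str) -> str:
    calls.append(text)
    return "positive" if "good" in text else "negative"


conn = sqlite3.connect(":memory:")  # use a file path to survive restarts
first = label_with_cache(["good film", "bad film"], fake_labeler, conn)
second = label_with_cache(["good film", "bad film", "good book"], fake_labeler, conn)
print(len(calls))  # 3: only "good book" triggers a new call on the second pass
```

Committing after every row trades some write throughput for durability, which is the right trade when each row costs an API call.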

The last commit was in March 2025 and activity has slowed noticeably; the roadmap project board is stale, so don't assume open issues will get fixed. The confidence scores are derived from token logprobs and are only meaningful for models that expose them; with Claude or Gemini you get a number that's less trustworthy than it looks. Streaming and async support are limited, so a large labeling job blocks on a single thread and throughput is lower than it should be. Access to the Refuel-hosted LLM integration (the main differentiator over just calling OpenAI directly) has to be requested through a Typeform, which is a dead end if the company behind it has gone quiet.
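For readers unfamiliar with logprob-derived confidence, the sketch below shows one common way to turn per-token log probabilities into a single score: the geometric mean of the token probabilities. This formula is an assumption for illustration, not necessarily the one Autolabel uses, and `sequence_confidence` is a hypothetical helper.

```python
import math
from typing import Optional, Sequence


def sequence_confidence(
    token_logprobs: Optional[Sequence[float]],
) -> Optional[float]:
    """Return exp(mean token logprob), or None when no logprobs are available.

    exp(mean(logprobs)) is the geometric mean of the per-token probabilities,
    so the score stays in (0, 1] and does not shrink just because the
    generated label is longer.
    """
    if not token_logprobs:
        # The model did not expose logprobs; report "unknown" rather than
        # inventing a number.
        return None
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Two tokens with probabilities 0.9 and 0.8.
score = sequence_confidence([math.log(0.9), math.log(0.8)])
print(round(score, 3))  # 0.849, i.e. sqrt(0.9 * 0.8)

# No logprobs available: the sketch returns None instead of a fake score.
print(sequence_confidence(None))  # None
```

Returning `None` instead of a fallback number keeps downstream thresholds from silently treating an unknown score as a real one.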
