
// the find

MigoXLab/dingo

★ 717 · Python · Apache-2.0 · updated Jun 2026

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

Dingo is a Python library for evaluating AI training data quality: it runs heuristic rules, LLM prompts, and agent workflows against datasets from local files, HuggingFace, SQL, or S3. It is aimed at ML teams who want to validate pre-training corpora or fine-tuning datasets before burning compute on bad data. It is an active Chinese-origin project whose English docs read like translations.

The hybrid rule/LLM strategy is the right architecture: run cheap deterministic rules on 100% of the data, then sample with an LLM where depth matters. The framework supports this cleanly, with configurable concurrency and a separate result stream for each kind of check. SQL streaming via SQLAlchemy server-side cursors means you can point it at a production database without exporting to JSONL first, a detail that matters for teams whose data already lives in Postgres. The plugin registration system (`@Model.rule_register`, `@Model.llm_register`) is clean enough that you can add your own checks without touching core code. Spark executor integration for distributed processing is a real differentiator over single-machine tools when you're dealing with billion-row pre-training datasets.
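To make the hybrid idea concrete, here is a minimal sketch in plain Python. It is not Dingo's API: the rule functions, `fake_llm_judge`, and `evaluate` are hypothetical names invented for illustration, and the choice to sample only records that pass every rule is an assumption about how you might spend LLM budget, not something the project documents.

```python
import random
from typing import Callable, Iterable

# Hypothetical rule: flags records that are empty or very short.
def rule_too_short(text: str, min_chars: int = 20) -> bool:
    return len(text.strip()) < min_chars

# Hypothetical rule: flags records dominated by a single repeated character.
def rule_repetitive(text: str, max_ratio: float = 0.5) -> bool:
    stripped = text.strip()
    if not stripped:
        return False
    most_common = max(stripped.count(ch) for ch in set(stripped))
    return most_common / len(stripped) > max_ratio

# Illustrative registry; a real framework would populate this via decorators.
RULES = {"too_short": rule_too_short, "repetitive": rule_repetitive}

def fake_llm_judge(text: str) -> dict:
    """Stand-in for an LLM call; a real judge would send a prompt to a model."""
    return {"score": min(len(text.split()) / 10, 1.0)}

def evaluate(
    records: Iterable[str],
    llm_judge: Callable[[str], dict],
    sample_rate: float = 0.1,
    seed: int = 0,
):
    """Run every rule on every record, then LLM-judge a random sample.

    Returns two separate result lists, one per kind of check.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    rule_results, llm_results = [], []
    for idx, text in enumerate(records):
        # Cheap deterministic pass: applied to 100% of records.
        failed = [name for name, rule in RULES.items() if rule(text)]
        rule_results.append({"id": idx, "failed_rules": failed})
        # Expensive pass: only a sample of records that passed all rules
        # (an assumption made for this sketch).
        if not failed and rng.random() < sample_rate:
            llm_results.append({"id": idx, **llm_judge(text)})
    return rule_results, llm_results
```

Calling `evaluate(records, fake_llm_judge, sample_rate=0.05)` would rule-check everything and send roughly 5% of the clean records to the judge. Keeping the two result lists separate mirrors the separate result streams mentioned above and lets you report rule failures without waiting on LLM latency.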

The `LLMTextQualityV2` through `LLMTextQualityV5` naming in the codebase signals interface churn with no visible deprecation policy: you could adopt V4 and later find that V5 changed output shapes without notice. The interactive web UI is locked behind a SaaS version that requires submitting a form and waiting 1–5 business days for approval; an evaluation tool with no way to browse results visually is a real gap in the OSS tier. The '100+ metrics' count is a feature list, not a quality guarantee, since there's no published benchmark showing that any of the LLM-based metrics agree with human judgment at meaningful rates. The scope has sprawled into resume optimization, document parsing, audio quality, and VLM rendering, which makes it hard to trust that any one domain is well maintained.

