// the find
mattpocock/evalite
Evaluate your LLM-powered apps with TypeScript
Evalite is a Vitest-based eval runner for TypeScript LLM apps. You write `.eval.ts` files with test cases and scorer functions, run them via CLI, and view score history in a local React UI backed by SQLite. It's for TypeScript shops who want eval infrastructure that feels like their existing test setup rather than a separate SaaS platform.
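Here is a dependency-free TypeScript sketch of the three ingredients an `.eval.ts` file describes: a list of cases, a task under test, and scorer functions averaged into one number. It avoids relying on Evalite's exact signatures. Every name below (`EvalCase`, `Scorer`, `runEval`, `exactMatch`, `containsExpected`) is hypothetical and is not Evalite's API. The 0-to-1 score range is an assumption of the sketch, and the task is a synchronous stand-in for what would normally be an async LLM call.

```typescript
// Hypothetical, dependency-free sketch of what an eval describes:
// test cases, the task under test, and scorers. These names are illustrative
// and are NOT Evalite's API.

interface EvalCase {
  input: string;
  expected: string;
}

// A scorer rates one output against its expected value, from 0 (bad) to 1 (good).
type Scorer = (args: { input: string; output: string; expected: string }) => number;

// 1 if the output equals the expected string exactly, otherwise 0.
const exactMatch: Scorer = ({ output, expected }) => (output === expected ? 1 : 0);

// 1 if the expected string appears anywhere in the output, otherwise 0.
const containsExpected: Scorer = ({ output, expected }) =>
  output.indexOf(expected) !== -1 ? 1 : 0;

interface CaseResult {
  input: string;
  output: string;
  scores: number[]; // one entry per scorer, in the order the scorers were given
}

// Runs the task on every case, applies every scorer, and averages all scores.
function runEval(
  cases: EvalCase[],
  task: (input: string) => string,
  scorers: Scorer[],
): { results: CaseResult[]; mean: number } {
  const results: CaseResult[] = [];
  let total = 0;
  let count = 0;
  for (const { input, expected } of cases) {
    const output = task(input);
    const scores = scorers.map((scorer) => scorer({ input, output, expected }));
    for (const s of scores) {
      total += s;
      count += 1;
    }
    results.push({ input, output, scores });
  }
  return { results, mean: count === 0 ? 0 : total / count };
}

// Usage with a stand-in task; a real task would await an LLM call.
const greetingCases: EvalCase[] = [
  { input: "Hello", expected: "Hello World!" },
  { input: "Goodbye", expected: "Goodbye World!" },
];
const greetingReport = runEval(greetingCases, (input) => `${input} World!`, [
  exactMatch,
  containsExpected,
]);
console.log(greetingReport.mean); // 1
```

Scorers written as plain functions of `(input, output, expected)` are easy to stack. Each case gets one score per scorer, and the mean gives a single number to track between runs.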
Plugs into Vitest rather than inventing its own test runner, so you get watch mode, filtering, and CI integration for free. The AI SDK traces integration captures token counts and intermediate LLM calls without manual instrumentation. Variant comparison lets you run the same eval suite against multiple prompt versions side by side, which is the core workflow when iterating on prompts. Local SQLite storage means results persist across runs and you can track score regressions over time without sending data anywhere.
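Variant comparison and regression tracking come down to a small loop. Score every prompt variant on the same cases, rank the variants, then compare the winner's mean against the last stored run. The sketch below makes several assumptions: exact-match scoring, stand-in tasks in place of model calls, a hypothetical `isRegression` helper with an arbitrary 0.05 tolerance, and a previous score passed in by hand. In the real tool, that history lives in local SQLite. None of these names are Evalite's API.

```typescript
// Hypothetical sketch of variant comparison plus a regression check.
// Not Evalite's API: names, stand-in tasks, and the 0.05 tolerance are assumptions.

interface PromptCase {
  input: string;
  expected: string;
}

interface Variant {
  label: string;
  task: (input: string) => string; // stands in for "call the model with this prompt"
}

// Mean exact-match score of one task over all cases (0 when there are no cases).
function meanExactMatch(cases: PromptCase[], task: (input: string) => string): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    if (task(c.input) === c.expected) hits += 1;
  }
  return hits / cases.length;
}

// Scores every variant on the same cases and returns them best-first.
function compareVariants(
  cases: PromptCase[],
  variants: Variant[],
): { label: string; mean: number }[] {
  return variants
    .map((v) => ({ label: v.label, mean: meanExactMatch(cases, v.task) }))
    .sort((a, b) => b.mean - a.mean);
}

// Flags a regression when the new mean drops more than `tolerance` below the old one.
function isRegression(current: number, previous: number, tolerance = 0.05): boolean {
  return current < previous - tolerance;
}

const cityCases: PromptCase[] = [
  { input: "paris", expected: "Paris" },
  { input: "rome", expected: "Rome" },
];
const ranking = compareVariants(cityCases, [
  { label: "raw", task: (input) => input },
  { label: "capitalized", task: (input) => input.charAt(0).toUpperCase() + input.slice(1) },
]);
console.log(ranking[0].label); // capitalized
```

The tolerance keeps run-to-run noise from nondeterministic model output from tripping the check. Pick it from the spread you actually observe across repeated runs.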
The contributing docs include an `npm link` workaround to get the global `evalite` command working, a packaging rough edge you'll hit immediately in a new project. SQLite-local results don't aggregate across team members or CI runners without extra plumbing, so score history breaks down in any multi-contributor setup. Several features are marked `experimental_` in the fixture names, meaning the API surface is still moving; adopting those early means migration work. The scorer ecosystem is thin: you're writing most domain-specific scorers yourself, and the built-in ones are minimal compared to frameworks like Braintrust or PromptFoo.
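Because the built-in scorers are minimal, a typical first custom scorer checks structure rather than wording. This hypothetical `requiredKeysScorer` is not part of Evalite. It gives partial credit for each required key found in a JSON object response and scores zero for anything that does not parse as an object. The partial-credit rule is an assumption you would tune per domain.

```typescript
// Hypothetical domain-specific scorer: the model must return a JSON object
// containing certain keys. The name and the partial-credit rule are assumptions.

// Builds a scorer that returns the fraction of required keys present in the
// output's top-level JSON object, or 0 when the output is not a JSON object.
function requiredKeysScorer(requiredKeys: string[]): (output: string) => number {
  return (output: string): number => {
    let parsed: unknown;
    try {
      parsed = JSON.parse(output);
    } catch {
      return 0; // not valid JSON at all
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return 0; // valid JSON, but not an object
    }
    if (requiredKeys.length === 0) return 1;
    const present = requiredKeys.filter((key) =>
      Object.prototype.hasOwnProperty.call(parsed, key),
    ).length;
    return present / requiredKeys.length;
  };
}

const scoreInvoice = requiredKeysScorer(["total", "currency"]);
console.log(scoreInvoice('{"total": 42, "currency": "EUR"}')); // 1
console.log(scoreInvoice('{"total": 42}')); // 0.5
console.log(scoreInvoice("Sure! Here is the JSON you asked for.")); // 0
```

A factory like this lets one implementation cover several response schemas: pass a different key list per eval.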