// the find

hegelai/prompttools

★ 3,038 · Python · Apache-2.0 · updated Feb 2026

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

PromptTools is a Python library for running grid-search-style experiments across LLMs and vector databases: vary your prompts, models, and parameters, then visualize the results in a notebook or a Streamlit playground. It's aimed at developers who want to compare GPT-4, LLaMA, and Mistral on the same prompt set without writing the loop themselves. Think of it as pytest for prompt engineering.
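The grid-search idea is easy to see in plain Python. The sketch below is not PromptTools' API. It uses `itertools.product` and a hypothetical `fake_complete` stub in place of a real model client, just to show how three parameter lists expand into every combination.

```python
from itertools import product

# Hypothetical stand-in for a real LLM call; swap in your provider's client.
def fake_complete(model: str, prompt: str, temperature: float) -> str:
    return f"[{model} @ T={temperature}] {prompt.upper()}"

def run_grid(models, prompts, temperatures):
    """Run every (model, prompt, temperature) combination and collect one row each."""
    rows = []
    for model, prompt, temperature in product(models, prompts, temperatures):
        rows.append({
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
            "response": fake_complete(model, prompt, temperature),
        })
    return rows

results = run_grid(
    models=["gpt-4", "llama-2-7b", "mistral-7b"],
    prompts=["Summarize this ticket.", "Translate to French."],
    temperatures=[0.0, 0.7],
)
print(len(results))  # 12 runs: 3 models x 2 prompts x 2 temperatures
```

Adding one more value to any list multiplies the run count, which is why a grid over real API calls gets expensive quickly.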

The experiment abstraction is genuinely useful: you define parameter lists and it runs their cartesian product, which is exactly what you want when tuning temperature or comparing models. Vector DB support (Chroma, Qdrant, Weaviate, Pinecone, LanceDB) is broad and covers retrieval accuracy evaluation, not just generation. The `prompttest` module lets you write assertion-style tests against LLM outputs and wire them into CI, which is the right way to catch regressions. The notebook-first design and the Streamlit playground keep the barrier to a first experiment low.
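An assertion-style regression test can be as plain as a pytest function. This is a generic sketch, not the `prompttest` interface: `generate` is a hypothetical stub standing in for a real model call, and the checks are deterministic, so the verdict doesn't depend on another model's judgment.

```python
import json

# Hypothetical model wrapper; in CI you would call your real LLM here.
def generate(prompt: str) -> str:
    return json.dumps({"sentiment": "positive", "confidence": 0.92})

def check_json_sentiment(output: str) -> bool:
    """Deterministic checks: valid JSON object, allowed label, confidence in [0, 1]."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    confidence = data.get("confidence")
    return (
        data.get("sentiment") in {"positive", "negative", "neutral"}
        and isinstance(confidence, (int, float))
        and 0.0 <= confidence <= 1.0
    )

# pytest collects functions named test_*; a failing assert fails the CI job.
def test_sentiment_output_is_well_formed():
    output = generate("Classify: 'The release fixed every bug I reported.'")
    assert check_json_sentiment(output)

if __name__ == "__main__":
    test_sentiment_output_is_well_formed()
    print("ok")
```

Checks like these catch format and label regressions cheaply; anything subjective still needs human review or a separately validated grader.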

Meaningful development looks stalled: Ollama and several other integrations are still marked 'In Progress' and have been for a while, which is a warning sign for a dependency you'd build workflows on. The auto-eval utilities use OpenAI models to judge OpenAI outputs, a circularity that makes scores unreliable for anything beyond obvious pass/fail checks. There's no built-in result persistence beyond manual CSV/JSON export, so if you want a history of experiment runs over time, you're rolling your own. Sentry error tracking is opt-out rather than opt-in, which is a bad default for a library that handles API keys and prompt content.
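Rolling your own history can be a few lines on top of whatever rows you already export. A minimal sketch, assuming each run's results are a list of dicts (the same rows you'd otherwise write to CSV); `append_run` and `load_runs` are hypothetical helpers, not part of PromptTools. Each run gets a UUID and a UTC timestamp and is appended to a JSON Lines file.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

def append_run(path: Path, rows: list, label: str) -> str:
    """Append one experiment run to a JSON Lines file and return its run id."""
    run_id = uuid.uuid4().hex
    timestamp = datetime.now(timezone.utc).isoformat()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for row in rows:
            record = {"run_id": run_id, "label": label, "timestamp": timestamp, **row}
            f.write(json.dumps(record) + "\n")
    return run_id

def load_runs(path: Path) -> dict:
    """Group every stored record by run id so runs can be compared over time."""
    runs = {}
    if not path.exists():
        return runs
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                runs.setdefault(record["run_id"], []).append(record)
    return runs

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "experiments" / "runs.jsonl"
        append_run(log, [{"model": "gpt-4", "score": 0.80}], label="baseline")
        append_run(log, [{"model": "gpt-4", "score": 0.85}], label="new-prompt")
        print(len(load_runs(log)))  # 2 runs stored
```

JSON Lines keeps appends cheap and the file easy to grep or load into pandas; a longer history could move the same records into SQLite.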
