// the find

567-labs/instructor

★ 13,159 · Python · MIT · updated Jun 2026

structured outputs for llms

Instructor wraps any LLM provider's API to return validated Pydantic models instead of raw text or JSON strings. You define the shape you want, it handles schema generation, validation, and retry-on-failure. Aimed at developers extracting structured data from unstructured text who don't want to write JSON schema by hand or debug malformed responses.

Retry logic feeds validation errors back into the prompt automatically: the LLM sees what it got wrong and tries again, which works better than silently resending the same request. The `from_provider()` unified interface means switching from OpenAI to Anthropic to a local Ollama model is a one-line change. Streaming with `Partial[Model]` populates the object progressively as tokens arrive, so a UI can render fields before the response finishes. Provider coverage is real: Bedrock, Vertex, Groq, Azure, Mistral, and Cohere all have documented integration pages, not just listed logos.
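To make the retry-with-feedback idea concrete, here is a minimal standard-library sketch of the general technique. It is not Instructor's implementation: `parse_user`, `extract_with_feedback`, and the scripted `fake_llm` are hypothetical names invented for illustration, and a hand-written validator stands in for Pydantic.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Validate raw model output; raise ValueError with a readable message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("field 'name' must be a string")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=name, age=age)


def extract_with_feedback(
    llm: Callable[[str], str], prompt: str, max_retries: int = 2
) -> User:
    """Call the model; on a validation failure, retry with the error appended."""
    current = prompt
    last_error: Optional[ValueError] = None
    for _ in range(max_retries + 1):
        raw = llm(current)
        try:
            return parse_user(raw)
        except ValueError as err:
            last_error = err
            # The key step: the next prompt carries the bad reply and the error.
            current = (
                f"{prompt}\n\nYour previous reply was:\n{raw}\n"
                f"It failed validation: {err}\nReturn corrected JSON only."
            )
    raise ValueError(
        f"no valid output after {max_retries + 1} attempts: {last_error}"
    )


def make_fake_llm() -> Tuple[Callable[[str], str], List[str]]:
    """Hypothetical stand-in for a model: a bad reply first, then a good one."""
    prompts_seen: List[str] = []
    replies = iter(
        ['{"name": "Ada", "age": "thirty-six"}', '{"name": "Ada", "age": 36}']
    )

    def fake_llm(prompt: str) -> str:
        prompts_seen.append(prompt)
        return next(replies)

    return fake_llm, prompts_seen


fake_llm, prompts_seen = make_fake_llm()
user = extract_with_feedback(fake_llm, "Extract the user from: Ada is 36.")
print(user)
```

The second prompt the stub receives includes the failed reply and the validation message, which is what separates this pattern from a silent retry.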

The README claims '1000+ community contributors' and 'teams at OpenAI, Google, Microsoft' without any sourcing, which is social proof inflation that makes it harder to calibrate actual adoption. There is no built-in cost tracking or token accounting: you get structured output, but you're on your own for knowing what it cost, which matters at production scale. Provider parity is uneven: tool-calling mode, JSON mode, and response format differ per provider, and the abstraction leaks when you hit edge cases. The `Partial` streaming type requires the model to output fields in schema order to work well; models that don't cooperate produce confusing partial states.
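Since token accounting is left to the caller, one way to fill the gap is a small ledger that you update after every call, including each retry. The sketch below is standard-library only and makes assumptions throughout: `UsageLedger` is a hypothetical helper, the model name and per-token prices are placeholders rather than real rates, and the token counts would have to come from whatever usage data your provider's raw response exposes.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens: (input, output). Not real rates.
PRICES = {"example-model": (1.00, 2.00)}


@dataclass
class UsageLedger:
    """Accumulates token counts and estimated spend across calls."""

    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost_usd: float = 0.0

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one call's usage; call this for every attempt, retries included."""
        in_price, out_price = PRICES[model]
        self.calls += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.cost_usd += (
            prompt_tokens * in_price + completion_tokens * out_price
        ) / 1_000_000


ledger = UsageLedger()
# Two attempts for one extraction: the original call plus one retry.
ledger.record("example-model", prompt_tokens=1000, completion_tokens=500)
ledger.record("example-model", prompt_tokens=1000, completion_tokens=500)
print(ledger)
```

Recording per attempt matters because a retry that re-sends the prompt plus the error message is billed like any other call, so counting only successful extractions would understate cost.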
