// the find

viddexa/autollm

★ 1,004 · Python · AGPL-3.0 · updated Jan 2024

Ship RAG based LLM web apps in seconds.

AutoLLM is a thin wrapper around LlamaIndex and LiteLLM that tries to reduce RAG boilerplate to single-line calls. It targets developers who want a FastAPI endpoint over a document collection without wiring up LlamaIndex's storage context, service context, and vector store manually. The pitch is 'one line to query engine, one line to API.'

The LanceDB default is a genuinely good call: it is serverless and zero-infra, so small projects avoid running a separate vector DB. The cost calculation callback is useful and something LlamaIndex doesn't surface well on its own. The unified LLM string syntax (inherited from LiteLLM) means swapping between OpenAI, Bedrock, and Vertex is a config change, not a code change. The config-file approach for FastAPI apps is practical for teams that want to version-control their RAG setup.

The last commit was in January 2024, and LlamaIndex has had multiple breaking API changes since then (the `llama_index` → `llama_index.core` split alone breaks the migration example in the README). The abstraction is shallow enough that you'll hit LlamaIndex internals the moment you need custom retrievers, rerankers, or streaming, at which point the wrapper is just friction. AGPL-3.0 is a landmine for anyone building a commercial SaaS product, because its network-use clause extends source-sharing obligations to software that users only reach over a network; the FAQ acknowledges the license but doesn't explain the implications. There are no async query engine variants despite FastAPI being async-native, so under any real load you're blocking the async event loop on sync LlamaIndex calls.
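
To make the event-loop concern concrete, here is a minimal stdlib-only sketch. It assumes nothing about AutoLLM's actual API: `sync_query` is a hypothetical stand-in for any synchronous query-engine call, and `count_ticks` and `measure` are illustrative helpers. It counts how many turns the event loop gets while a sync call runs directly inside a coroutine, and again when the same call is offloaded with `asyncio.to_thread`.

```python
import asyncio
import time


def sync_query(question: str) -> str:
    """Hypothetical stand-in for a synchronous query-engine call."""
    time.sleep(0.2)  # simulates blocking retrieval + LLM latency
    return f"answer to {question!r}"


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how many turns the event loop gets until `stop` is set."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def measure(offload: bool) -> tuple[str, int]:
    """Run one query and report how responsive the loop stayed meanwhile."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start

    if offload:
        # Runs the sync call in a worker thread; the loop keeps serving.
        answer = await asyncio.to_thread(sync_query, "q")
    else:
        # Runs the sync call on the loop thread; every other task stalls.
        answer = sync_query("q")

    stop.set()
    ticks = await ticker
    return answer, ticks


if __name__ == "__main__":
    for offload in (False, True):
        answer, ticks = asyncio.run(measure(offload))
        print(f"offload={offload}: {ticks} loop ticks while waiting")
```

In the direct case the ticker barely advances, because nothing else can run until the sync call returns; in the offloaded case it keeps ticking. Offloading to a thread is a workaround, not a substitute for native async query engines: throughput is still capped by the thread pool. FastAPI also runs plain `def` endpoints in a thread pool, which gives a similar effect without touching the call site.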
