// the find
run-llama/llama_index
LlamaIndex is the leading document agent and OCR platform
LlamaIndex is a Python framework for building LLM-powered applications over your own data — RAG pipelines, agentic workflows, document parsing. It sits between raw LLM APIs and application code, handling ingestion, indexing, retrieval, and agent orchestration. The primary audience is Python developers building production RAG or agent systems who don't want to wire up every component from scratch.
300+ integration packages on LlamaHub mean you can swap LLMs, vector stores, and embeddings without rewriting your pipeline. The core/integration split is clean: llama-index-core stays lean and you only pull what you actually use. The event-driven Workflow abstraction for multi-step agentic pipelines is better than the chain-of-callbacks pattern LangChain still leans on. Build provenance attestation for static assets is a nice supply-chain touch that most framework repos skip entirely.
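To make the "swap without rewriting" point concrete, here is a minimal plain-Python sketch of the underlying pattern. It deliberately does not import LlamaIndex: the `LLM` protocol, `EchoLLM`, `ShoutLLM`, and `Pipeline` are hypothetical stand-ins invented for illustration, and the real base classes are considerably richer. In LlamaIndex itself the equivalent move is usually assigning a different integration's LLM class to `Settings.llm`.

```python
from dataclasses import dataclass
from typing import Protocol


class LLM(Protocol):
    """Hypothetical minimal interface (not LlamaIndex's actual base class)."""

    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Stand-in backend that returns the prompt unchanged."""

    def complete(self, prompt: str) -> str:
        return prompt


class ShoutLLM:
    """Second stand-in backend that upper-cases the prompt."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


@dataclass
class Pipeline:
    """Depends only on the LLM interface, never on a concrete backend."""

    llm: LLM

    def answer(self, question: str, context: str) -> str:
        prompt = f"Context: {context}\nQuestion: {question}"
        return self.llm.complete(prompt)


pipeline = Pipeline(llm=EchoLLM())
first = pipeline.answer("What is RAG?", "retrieval-augmented generation")

# Swapping the backend is a one-line change; answer() itself is untouched.
pipeline.llm = ShoutLLM()
second = pipeline.answer("What is RAG?", "retrieval-augmented generation")
```

The pipeline code never names a provider, which is what lets an integration package be replaced without touching the rest of the application.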
The monorepo has grown past the point of easy navigation: with 300+ integration packages, version skew is a recurring headache whenever an integration lags a core API change. The README itself admits it's not kept current, which is a red flag for a project that beginners rely on for orientation. The cloud platform (LlamaParse) is increasingly the real product, and it's unclear which features are OSS-first and which are designed to funnel you toward paid tiers. Core abstractions have churned too: ServiceContext, for one, was deprecated in favor of the global Settings object, leaving a trail of outdated tutorials that will waste your afternoon.