
// the find

VectifyAI/PageIndex

★ 32,928 · Python · MIT · updated Jun 2026

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

PageIndex replaces vector similarity search with LLM-driven tree traversal over a structured document index. You build a hierarchical table of contents from a PDF, then ask the LLM to reason its way through that tree to find the relevant sections. It is aimed at anyone doing RAG over long professional documents where semantic chunking fails: financial filings, legal documents, technical manuals.
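To make the traversal idea concrete, here is a minimal sketch of pruning a table-of-contents tree. It is not PageIndex's code: the `Node` fields, the `retrieve` function, and the sample tree are all hypothetical, and `keyword_judge` is a plain keyword match standing in for the LLM call that would decide whether a branch is worth descending into.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical node shape, not PageIndex's actual schema.
    title: str
    summary: str
    start_page: int
    end_page: int
    children: List["Node"] = field(default_factory=list)


def keyword_judge(query: str, node: Node) -> bool:
    """Stand-in for an LLM relevance call: match query words in title/summary."""
    text = f"{node.title} {node.summary}".lower()
    return any(word in text for word in query.lower().split())


def retrieve(node: Node, query: str, judge: Callable[[str, Node], bool]) -> List[Node]:
    """Depth-first descent that only enters branches the judge accepts."""
    if not node.children:
        return [node]  # a leaf section is a retrieval hit
    hits: List[Node] = []
    for child in node.children:
        if judge(query, child):  # rejected branches are never expanded
            hits.extend(retrieve(child, query, judge))
    return hits


# Illustrative tree; titles, summaries, and page ranges are made up.
root = Node("Annual Report", "Full-year filing.", 1, 120, [
    Node("Business Overview", "Products, markets, and strategy.", 1, 20),
    Node("Risk Factors", "Regulatory and operational risks.", 21, 45),
    Node("Financial Statements",
         "Income statement and balance sheet, including revenue by segment.",
         46, 120, [
             Node("Income Statement", "Revenue, costs, and net income.", 46, 60),
             Node("Balance Sheet", "Assets, liabilities, and equity.", 61, 80),
         ]),
])

hits = retrieve(root, "revenue", keyword_judge)
```

With this sample tree, a query for "revenue" descends only into the financial branch and returns the single leaf covering pages 46 to 60; the other top-level sections are judged once from their summaries and never opened.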

The core insight is sound: for structured documents, traversing a semantic tree is more precise than cosine similarity over arbitrary chunks, and the reported 98.7% on FinanceBench is a credible benchmark result, not a toy demo. The JSON tree output is clean and LLM-friendly: node summaries with page ranges let the LLM prune branches without reading full content. Multi-LLM support via LiteLLM is a practical choice that avoids vendor lock-in. The agentic example built on the OpenAI Agents SDK shows the pattern works end to end, not just in theory.
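The reason summaries plus page ranges matter is that the prompt can carry a compact outline instead of the document body. A rough sketch of that idea follows, assuming a JSON shape with `title`, `summary`, `start_page`, `end_page`, `text`, and `children` keys; these key names are illustrative guesses, not the schema PageIndex emits.

```python
import json
from typing import Any, Dict, List

# Hypothetical index JSON; the key names are assumptions for illustration.
RAW = """
{
  "title": "Annual Report",
  "summary": "Full-year filing.",
  "start_page": 1,
  "end_page": 120,
  "text": "full text of the whole report",
  "children": [
    {
      "title": "Risk Factors",
      "summary": "Regulatory and operational risks.",
      "start_page": 21,
      "end_page": 45,
      "text": "full text of the risk section"
    }
  ]
}
"""


def outline(node: Dict[str, Any], depth: int = 0) -> List[str]:
    """Render title, page range, and summary per node; the 'text' field is left out."""
    line = (
        f"{'  ' * depth}- {node['title']} "
        f"(pp. {node['start_page']}-{node['end_page']}): {node['summary']}"
    )
    lines = [line]
    for child in node.get("children", []):
        lines.extend(outline(child, depth + 1))
    return lines


tree = json.loads(RAW)
lines = outline(tree)
prompt_outline = "\n".join(lines)
```

The resulting `prompt_outline` is two short lines here, however long the underlying sections are, which is what lets a model choose a branch before any page content is loaded.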

The open-source package is explicitly second-class: complex PDFs require the project's paid cloud OCR pipeline, so the repo is essentially a demo of the idea with the real implementation behind a paywall. Index construction is LLM-heavy: building the tree for a large document takes many API calls, which is slow and expensive, and there is no caching or incremental-update story for documents that change. Retrieval latency is unpredictable: tree depth determines how many sequential LLM calls a query needs, so a narrow query against a deeply nested document could be an order of magnitude slower than a vector search. There are no benchmarks beyond the single FinanceBench result, so performance on non-financial document types is uncharacterized.
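The latency concern can be put as back-of-envelope arithmetic. The sketch below assumes one LLM call per tree level on the path to a leaf, which is a simplification and not a description of how PageIndex batches its calls; the function names, the nested-dict tree, and the per-call time are all placeholders.

```python
from typing import Any, Dict


def tree_height(node: Dict[str, Any]) -> int:
    """Number of parent-to-child steps on the longest root-to-leaf path."""
    children = node.get("children") or []
    if not children:
        return 0
    return 1 + max(tree_height(child) for child in children)


def worst_case_latency(tree: Dict[str, Any], seconds_per_call: float) -> float:
    """Assumes one sequential LLM call per level descended (a simplification)."""
    return tree_height(tree) * seconds_per_call


# Illustrative structure only: a report nested three levels deep.
sample_tree = {
    "title": "Manual",
    "children": [
        {"title": "Part I"},
        {
            "title": "Part II",
            "children": [
                {
                    "title": "Chapter 4",
                    "children": [{"title": "Section 4.2"}],
                },
            ],
        },
    ],
}

calls = tree_height(sample_tree)
estimate = worst_case_latency(sample_tree, seconds_per_call=2.0)
```

Here the deepest section sits three levels down, so the estimate is three sequential calls. The 2.0-second figure is an arbitrary placeholder, not a measurement; the point is that the cost grows with nesting depth rather than staying flat the way a single index lookup does.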
