Show HN: RAG, No Vectors
We built PageIndex, a document indexing system that turns documents into hierarchical search trees to support reasoning-based RAG.
Traditional vector-based RAG often struggles with retrieval accuracy because it optimizes for similarity, not relevance. Relevance is what retrieval actually needs, and judging it requires reasoning. The gap is widest on professional documents that demand domain expertise and multi-step reasoning, where similarity search often falls short.
So we started exploring a more reasoning-driven approach to RAG. Reasoning-based RAG enables LLMs to think and reason their way to the most relevant document sections. Inspired by AlphaGo, we use tree search to perform structured document retrieval.
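To make the tree-search idea concrete, here is a minimal sketch of a top-down descent over a document tree. It is an illustration under stated assumptions, not PageIndex's actual code: the `Node` class, `tree_search`, and the `beam` parameter are hypothetical names, and `keyword_overlap` is a crude stand-in for the step where an LLM would reason about which section is most relevant.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical node shape for illustration only.
    title: str
    summary: str
    children: List["Node"] = field(default_factory=list)


def keyword_overlap(query: str, node: Node) -> float:
    """Stand-in for an LLM relevance judgment: number of shared words."""
    query_words = set(query.lower().split())
    node_words = set((node.title + " " + node.summary).lower().split())
    return float(len(query_words & node_words))


def tree_search(
    root: Node,
    query: str,
    score: Callable[[str, Node], float] = keyword_overlap,
    beam: int = 1,
) -> List[Node]:
    """Descend level by level, keeping the `beam` best children of each node."""
    frontier = [root]
    leaves: List[Node] = []
    while frontier:
        next_frontier: List[Node] = []
        for node in frontier:
            if not node.children:
                leaves.append(node)  # reached a section with no subsections
                continue
            ranked = sorted(node.children, key=lambda c: score(query, c), reverse=True)
            next_frontier.extend(ranked[:beam])
        frontier = next_frontier
    return leaves


report = Node("Annual Report", "full year results", [
    Node("Business Overview", "products markets strategy"),
    Node("Financial Statements", "balance sheet income statement cash flow", [
        Node("Balance Sheet", "assets liabilities equity"),
        Node("Cash Flow", "operating investing financing"),
    ]),
])

hits = tree_search(report, "cash flow from operating activities")
print([n.title for n in hits])
```

Swapping `keyword_overlap` for a function that asks a model to pick among the child summaries would turn this into the reasoning-driven version; the traversal itself stays the same.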
We open-sourced one of the key components: PageIndex. PageIndex is a hierarchical indexing system that builds search tree structures from long documents (like financial reports, regulatory documents, or textbooks), making them ready for reasoning-based RAG.
Some highlights:
- Hierarchical Structure: Organizes lengthy PDFs into LLM-friendly trees — like a smart table of contents.
- Precise Referencing: Each node includes a summary and exact physical page numbers.
- Natural Segmentation: Nodes align with document sections, preserving context — no arbitrary chunking.
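As a rough picture of what such a tree could look like in code, the sketch below models a node with a summary and a physical page range, then prints the tree as an indented table of contents. The `SectionNode` class, its field names, and `render_outline` are assumptions made for illustration; they are not PageIndex's actual schema or output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SectionNode:
    # Hypothetical fields: one node per document section.
    title: str
    summary: str
    start_page: int  # first physical page of the section
    end_page: int    # last physical page of the section
    children: List["SectionNode"] = field(default_factory=list)


def render_outline(node: SectionNode, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, one per section."""
    indent = "  " * depth
    lines = [f"{indent}{node.title} (pp. {node.start_page}-{node.end_page}): {node.summary}"]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines


doc = SectionNode("Annual Report", "Full-year results and outlook", 1, 40, [
    SectionNode("Financial Statements", "Audited statements", 10, 30, [
        SectionNode("Cash Flow", "Operating, investing, financing", 22, 30),
    ]),
    SectionNode("Risk Factors", "Market and regulatory risks", 31, 40),
])

outline = render_outline(doc)
print("\n".join(outline))
```

Because each node carries its own page range, a retrieved section can be cited by page, and the section boundaries replace fixed-size chunks.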
We've used PageIndex for financial document analysis with reasoning-based RAG and have seen significant improvements in retrieval accuracy compared to vector-based systems.
Would love any feedback — especially thoughts on reasoning-based RAG, or ideas for where PageIndex could be applied!
Interesting! will check it out