Show HN: RAG, No Vectors

github.com

11 points by vectify_AI 7 days ago

We built PageIndex, a document indexing system that turns documents into hierarchical search trees to support reasoning-based RAG.

Traditional vector-based RAG often struggles with retrieval accuracy because it optimizes for similarity, not relevance. Relevance is what retrieval actually needs, and judging it requires reasoning. The gap is widest on professional documents that demand domain expertise and multi-step reasoning, where similarity search often falls short.

So we started exploring a more reasoning-driven approach to RAG. Reasoning-based RAG lets an LLM reason its way to the most relevant document sections instead of ranking chunks by embedding distance. Inspired by AlphaGo, we use tree search to perform structured document retrieval.
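
To make the tree-search idea concrete, here is a minimal sketch, not the actual PageIndex retrieval code. It assumes a hypothetical `Section` node and does a simple greedy top-down descent; the `keyword_relevance` function is a toy stand-in for the LLM's relevance judgment, which in a real system would be a model call.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Section:
    """Hypothetical tree node: a document section with a short summary."""
    title: str
    summary: str
    children: List["Section"] = field(default_factory=list)


def keyword_relevance(query: str, section: Section) -> float:
    """Toy stand-in for an LLM relevance judgment: count shared words."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    text = f"{section.title} {section.summary}".lower()
    section_words = set(re.findall(r"[a-z]+", text))
    return float(len(query_words & section_words))


def tree_search(
    root: Section,
    query: str,
    relevance: Callable[[str, Section], float] = keyword_relevance,
) -> List[str]:
    """Greedy descent: follow the most relevant child at each level.

    Returns the titles along the chosen path, ending at a leaf section.
    """
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: relevance(query, child))
        path.append(node.title)
    return path


# Illustrative data only, not taken from a real report.
report = Section("Annual Report", "Full-year results", [
    Section("Financial Statements", "Balance sheet, income and cash flow", [
        Section("Income Statement", "Revenue, expenses and net income"),
        Section("Balance Sheet", "Assets, liabilities and equity"),
    ]),
    Section("Risk Factors", "Market, credit and regulatory risk", [
        Section("Market Risk", "Interest rate and currency exposure"),
    ]),
])

print(tree_search(report, "net income and revenue"))
# ['Annual Report', 'Financial Statements', 'Income Statement']
```

Greedy descent commits to one branch per level, so a weak scorer can pick the wrong subtree early. Keeping several candidate branches, or letting the model backtrack, are the obvious refinements.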

We open-sourced one of the key components, PageIndex: a hierarchical indexing system that builds search trees from long documents (such as financial reports, regulatory documents, or textbooks), making them ready for reasoning-based RAG.

Some highlights:

- Hierarchical Structure: Organizes lengthy PDFs into LLM-friendly trees — like a smart table of contents.

- Precise Referencing: Each node includes a summary and exact physical page numbers.

- Natural Segmentation: Nodes align with document sections, preserving context — no arbitrary chunking.
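
As a rough picture of the structure these highlights describe, the sketch below models a node with a title, a summary, and a physical page span. The `IndexNode` class, its field names, and both helper functions are assumptions made for illustration and are not PageIndex's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexNode:
    """Hypothetical node: one document section with summary and page span."""
    title: str
    summary: str
    start_page: int  # first physical page of the section (1-based)
    end_page: int    # last physical page of the section (inclusive)
    children: List["IndexNode"] = field(default_factory=list)


def render_outline(node: IndexNode, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, like a table of contents."""
    indent = "  " * depth
    pages = f"pp. {node.start_page}-{node.end_page}"
    lines = [f"{indent}{node.title} ({pages}): {node.summary}"]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines


def deepest_section_for_page(node: IndexNode, page: int) -> IndexNode:
    """Return the most specific section whose page span contains `page`."""
    for child in node.children:
        if child.start_page <= page <= child.end_page:
            return deepest_section_for_page(child, page)
    return node


# Illustrative data only, not taken from a real report.
tree = IndexNode("Annual Report", "Full-year results and disclosures", 1, 40, [
    IndexNode("Financial Statements", "Audited statements", 10, 25, [
        IndexNode("Income Statement", "Revenue, expenses and net income", 10, 14),
        IndexNode("Balance Sheet", "Assets, liabilities and equity", 15, 25),
    ]),
    IndexNode("Risk Factors", "Market and regulatory risk", 26, 40),
])

print("\n".join(render_outline(tree)))
print(deepest_section_for_page(tree, 12).title)  # Income Statement
```

Because each node follows a section boundary and carries its own page span, an answer can cite the exact pages it came from rather than an arbitrary chunk offset.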

We've used PageIndex for financial document analysis with reasoning-based RAG and saw significant improvements in retrieval accuracy compared to vector-based systems.

Would love any feedback — especially thoughts on reasoning-based RAG, or ideas for where PageIndex could be applied!

gbertb 7 days ago

Interesting! Will check it out.