@keleshteri/pageindex-rag
v1.0.11
pageindex-rag
Vectorless RAG using LLM-driven hierarchical document indexing — for PDF and Markdown files (including Obsidian vaults).
No embeddings. No vector database. The LLM reads the document structure once, builds a tree index, and then navigates that tree to answer questions.
Based on VectifyAI/PageIndex. Extended with multi-LLM provider support, Obsidian vault indexing, and a dependency security checker.
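To make the "tree index" idea concrete, here is a minimal sketch of what such an index could look like and how it can be walked. The `TreeNode` shape, the helper names, and the sample report are all illustrative assumptions; the package's real index format may differ.

```typescript
// Hypothetical node shape: the package's real index format may differ.
interface TreeNode {
  title: string;
  startPage: number;
  endPage: number;
  summary?: string;
  children: TreeNode[];
}

// Render the tree as an indented outline: a compact view an LLM could
// read to decide which section (and page range) to open next.
function renderOutline(node: TreeNode, depth = 0): string[] {
  const indent = "  ".repeat(depth);
  const lines = [`${indent}${node.title} (pp. ${node.startPage}-${node.endPage})`];
  for (const child of node.children) {
    lines.push(...renderOutline(child, depth + 1));
  }
  return lines;
}

// Find the deepest node whose page range contains the given page.
function locate(node: TreeNode, page: number): TreeNode | undefined {
  if (page < node.startPage || page > node.endPage) return undefined;
  for (const child of node.children) {
    const hit = locate(child, page);
    if (hit) return hit;
  }
  return node;
}

// Made-up sample document, for illustration only.
const report: TreeNode = {
  title: "Annual Report",
  startPage: 1,
  endPage: 20,
  children: [
    { title: "Summary", startPage: 1, endPage: 2, children: [] },
    {
      title: "Financials",
      startPage: 3,
      endPage: 12,
      children: [
        { title: "Revenue", startPage: 3, endPage: 5, children: [] },
        { title: "Costs", startPage: 6, endPage: 12, children: [] },
      ],
    },
  ],
};

console.log(renderOutline(report).join("\n"));
```

The point of the sketch is that navigation works on titles and page ranges alone, so no embedding lookup is needed to narrow a question down to a few pages.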
What was added over the original
| Feature | Original | This fork |
|---|---|---|
| LLM providers | Anthropic only | Anthropic · OpenAI · Ollama · Claude Code (no API key) |
| Markdown indexing | — | Single file and full vault/folder |
| Obsidian vault support | — | --vault <dir> CLI flag |
| npm security rules | — | npm run security (age + CVE checks) |
| Env-var model override | — | PAGEINDEX_MODEL in .env |
Installation
npm install @keleshteri/pageindex-rag
Requires Node.js 18+ (uses built-in fetch).
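If you want to fail fast on older runtimes, a small version guard along these lines may help. The function name `supportsBuiltInFetch` is hypothetical and not part of this package; it only encodes the Node.js 18+ requirement stated above.

```typescript
// Hypothetical helper, not part of the package.
// Returns true when a Node.js version string such as "18.17.1" or "v20.11.1"
// has a major version of 18 or newer.
function supportsBuiltInFetch(nodeVersion: string): boolean {
  const [majorText = ""] = nodeVersion.replace(/^v/, "").split(".");
  const major = parseInt(majorText, 10);
  return !isNaN(major) && major >= 18;
}

console.log(supportsBuiltInFetch("18.17.1"));
console.log(supportsBuiltInFetch("16.20.2"));
```

In an application you would pass `process.versions.node` and exit with a clear message when the check fails.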
Quick start
import { PageIndexClient } from '@keleshteri/pageindex-rag';
const client = new PageIndexClient({
model: 'gpt-4o', // or 'claude-sonnet-4-6', 'claude-code', 'ollama/llama3'
workspace: './workspace', // optional: persists indexes to disk
});
// Index a PDF
const docId = await client.index('./report.pdf');
// Index a Markdown file
const noteId = await client.index('./notes.md');
// Index an entire Obsidian vault
const results = await client.indexVault('./MyVault', { concurrency: 3 });
// Retrieve
const structure = client.getDocumentStructure(docId);
const pages = await client.getPageContent(docId, '3-5');
LLM options
Set the model in .env or pass it directly to PageIndexClient.
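A rough sketch of how the two configuration paths might combine is shown below. Both rules are assumptions inferred from the model names in this README, not documented behavior: that an explicit option takes precedence over `PAGEINDEX_MODEL`, and that the provider can be guessed from the model name. `resolveModel` and `inferProvider` are hypothetical helpers.

```typescript
type Provider = "anthropic" | "openai" | "ollama" | "claude-code";

// Assumed precedence: an explicit option beats the PAGEINDEX_MODEL variable.
function resolveModel(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  return explicit ?? env["PAGEINDEX_MODEL"];
}

// Assumed naming convention, inferred from the example model names only.
function inferProvider(model: string): Provider | undefined {
  if (model === "claude-code") return "claude-code";
  if (model.startsWith("ollama/")) return "ollama";
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("gpt-")) return "openai";
  return undefined; // unknown model name: let the caller decide
}

const model = resolveModel(undefined, { PAGEINDEX_MODEL: "ollama/llama3" });
console.log(model, model ? inferProvider(model) : undefined);
```

Passing the environment in as a plain object (for example `process.env`) keeps the helper easy to test.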
Anthropic
ANTHROPIC_API_KEY=sk-ant-...
PAGEINDEX_MODEL=claude-sonnet-4-6
OpenAI
OPENAI_API_KEY=sk-...
PAGEINDEX_MODEL=gpt-4o
Claude Code (no API key needed)
Uses your local Claude Code session. Requires the claude CLI to be installed and logged in.
PAGEINDEX_MODEL=claude-code
Ollama (fully local, no API key needed)
ollama pull llama3
PAGEINDEX_MODEL=ollama/llama3
# OLLAMA_BASE_URL=http://localhost:11434/v1  # override if needed
CLI
# Single PDF
npx ts-node src/cli.ts --pdf report.pdf
# Single Markdown file
npx ts-node src/cli.ts --md notes.md
# Entire Obsidian vault
npx ts-node src/cli.ts --vault ~/Documents/MyVault
# Use a specific model
npx ts-node src/cli.ts --pdf report.pdf --model gpt-4o
# Options
#   --no-summary        skip per-node summaries (faster)
#   --add-description   generate a one-sentence doc description
#   --add-text          include raw page text in the output
#   --output <path>     custom output path (single-file mode only)
Output is saved to results/ as JSON.
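Vault indexing touches many files, and the `concurrency` option in the Quick start suggests a bounded pool of in-flight requests. How the package schedules work internally is not documented here, so the following is only one plausible sketch; `mapWithConcurrency` and `fakeIndex` are hypothetical names.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results come back in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // no race: nothing else runs between the check and the increment
      results[i] = await worker(items[i], i);
    }
  }

  const lanes = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}

// Stand-in for a real indexing call (assumption: one LLM-backed call per file).
const fakeIndex = async (path: string): Promise<string> => `indexed:${path}`;

mapWithConcurrency(["a.md", "b.md", "c.md", "d.md"], 3, fakeIndex).then((ids) =>
  console.log(ids),
);
```

A small limit keeps LLM rate limits and local Ollama memory use in check, at the cost of longer wall-clock time for large vaults.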
Programmatic API
const client = new PageIndexClient(options);
// Index
await client.index(filePath) // auto-detects pdf/md
await client.indexVault(dirPath) // all .md files in a folder
// Retrieve
client.getDocument(docId) // metadata (name, type, page count)
client.getDocumentStructure(docId) // full tree (no raw text)
await client.getPageContent(docId, pages) // pages: "1-3", "5,8", "12"
client.listDocuments() // all indexed docs
The three retrieve functions return JSON strings; they are designed to be passed directly as LLM tool-use callbacks.
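As a sketch of the tool-use pattern, the snippet below routes a model's tool call to the matching retrieve method and returns the JSON string unchanged. The method names and return types follow the listing above; everything else is an assumption made for illustration: the tool names, the `ToolCall` shape, the `parsePageSpec` helper, and the in-memory `stub` that stands in for a real client.

```typescript
// Subset of the client surface listed above (JSON strings; getPageContent is async).
interface RetrievalTools {
  getDocument(docId: string): string;
  getDocumentStructure(docId: string): string;
  getPageContent(docId: string, pages: string): Promise<string>;
}

// Hypothetical tool-call shape; real LLM SDKs each define their own.
interface ToolCall {
  name: string;
  input: { docId: string; pages?: string };
}

// Hypothetical helper: expand a spec such as "1-3", "5,8" or "12" into page numbers.
function parsePageSpec(spec: string): number[] {
  const pages: number[] = [];
  for (const part of spec.split(",")) {
    const [startText, endText = startText] = part.trim().split("-");
    const start = parseInt(startText, 10);
    const end = parseInt(endText, 10);
    if (isNaN(start) || isNaN(end) || start > end) {
      throw new Error(`Invalid page spec: "${spec}"`);
    }
    for (let p = start; p <= end; p++) pages.push(p);
  }
  return pages;
}

// Route one tool call to the matching method; the JSON string is passed
// through as the tool result without re-parsing.
async function runTool(tools: RetrievalTools, call: ToolCall): Promise<string> {
  switch (call.name) {
    case "get_document":
      return tools.getDocument(call.input.docId);
    case "get_document_structure":
      return tools.getDocumentStructure(call.input.docId);
    case "get_page_content":
      if (!call.input.pages) throw new Error("get_page_content requires `pages`");
      return tools.getPageContent(call.input.docId, call.input.pages);
    default:
      throw new Error(`Unknown tool: ${call.name}`);
  }
}

// In-memory stand-in so the sketch runs without the package installed.
const stub: RetrievalTools = {
  getDocument: (docId) => JSON.stringify({ docId, type: "pdf" }),
  getDocumentStructure: (docId) => JSON.stringify({ docId, nodes: [] }),
  getPageContent: async (docId, pages) =>
    JSON.stringify({ docId, pages: parsePageSpec(pages) }),
};

runTool(stub, { name: "get_page_content", input: { docId: "doc-1", pages: "1-3,7" } })
  .then((json) => console.log(json));
```

With a real `PageIndexClient` in place of `stub`, the agent loop would typically call `get_document_structure` first, pick a page range from the tree, and then call `get_page_content`.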
Security
npm run security # checks all deps: age (< 14 days → blocked) + npm audit
See scripts/check-deps.js for thresholds and rules.
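The age rule can be pictured as a simple date comparison. The sketch below only mirrors the 14-day threshold mentioned above; the function name `isTooNew` and its signature are hypothetical, and the authoritative logic lives in scripts/check-deps.js.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: true when a version was published less than
// `minAgeDays` before `now`, i.e. it would be blocked by the age rule.
function isTooNew(publishedAt: Date, now: Date, minAgeDays = 14): boolean {
  const ageDays = (now.getTime() - publishedAt.getTime()) / DAY_MS;
  return ageDays < minAgeDays;
}

const published = new Date("2024-01-10T00:00:00Z");
console.log(isTooNew(published, new Date("2024-01-20T00:00:00Z"))); // 10 days old
console.log(isTooNew(published, new Date("2024-01-24T00:00:00Z"))); // exactly 14 days old
```

Taking `now` as a parameter instead of reading the clock inside the function keeps the check deterministic in tests.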
Development
npm run build # compile TypeScript → dist/
npm run dev # watch mode
Copy .env.example to .env and configure your LLM provider before running.
Credits
This project is a fork of VectifyAI/PageIndex, licensed under MIT.
License
MIT — see LICENSE.
