@hardikdev1210/ragzero

v1.2.0

Published

3 months ago

ragzero-core: embedding-free retrieval engine with hierarchical parsing, bottom-up semantic compression, and LLM-driven query planning.

hardikdev1210

rag ragzero ragzero-core llm ollama tree-rag hierarchical agentscope retrieval

ragzero-core

Turn any website into an intelligent, queryable knowledge system — without embeddings.

A next-generation LLM retrieval engine that eliminates embeddings and vector databases by enabling models to reason directly over structured documents.

Instead of similarity search, ragzero-core uses:

Hierarchical document parsing
Bottom-up semantic compression
LLM-driven query planning

This results in:

Better reasoning over structured data
Lower infrastructure complexity
Fully local or multi-provider LLM support

Designed to run:

Fully local (Ollama)
Cloud (any OpenAI-compatible endpoint; Claude and Grok via an OpenAI-compatible gateway)
Hybrid environments

Why Vectorless?

Traditional RAG:

Requires embeddings
Needs vector databases (Pinecone, FAISS, etc.)
Suffers from retrieval mismatch

ragzero-core:

No embeddings
No vector DB
Uses LLM reasoning instead of similarity

This makes it:

Simpler to deploy
More flexible across models
Better for structured documents (docs, tutorials, knowledge bases)

Architecture

User → Query Planner → Section Selection → Context Builder → LLM → Answer

Pipeline:

Parse HTML → Heading Tree
Summarize bottom-up
Store structured JSON
Plan query (LLM selects sections)
Build context dynamically
Generate answer
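
The first two pipeline steps can be sketched as follows. This is a minimal illustration, not ragzero-core's internal code: the function names, the `{ level, title, text }` node shape, and the injected `summarize` callback (which would wrap an LLM call) are all assumptions made for the example.

```javascript
// Hypothetical sketch: names and shapes are illustrative, not ragzero-core's API.

// Build a tree from a flat, in-order list of { level, title, text } headings.
function buildHeadingTree(headings) {
  const root = { level: 0, title: "root", text: "", children: [] };
  const stack = [root]; // path from the root to the most recent heading
  for (const h of headings) {
    const node = { ...h, children: [] };
    // Pop until the top of the stack is a shallower heading (the parent).
    while (stack[stack.length - 1].level >= node.level) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

// Summarize leaves first, then each parent from its own text plus child summaries.
async function summarizeBottomUp(node, summarize) {
  const childSummaries = [];
  for (const child of node.children) {
    childSummaries.push(await summarizeBottomUp(child, summarize));
  }
  const input = [node.text, ...childSummaries].filter(Boolean).join("\n");
  node.summary = await summarize(input);
  return node.summary;
}
```

Because each parent only sees its children's summaries rather than their full text, the input to every summarization call stays small even for deep documents.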

1-Minute Example

npx @hardikdev1210/ragzero "https://example.com" "What is this about?"

-> Fetching document...
-> Building structure...
-> Answer: "This page explains ..."

What Makes This Different

Unlike traditional RAG stacks:

No embeddings pipeline
No vector database dependency
LLM acts as retriever and reasoner
Works fully offline with Ollama

Removing the embedding pipeline and vector store reduces system complexity, and letting the LLM select sections directly is intended to improve reasoning quality on structured data.

Comparison

| Feature | Vector DB RAG | ragzero-core |
|--------|--------------|----------------|
| Embeddings | Required | Not required |
| Vector DB | Required | Not required |
| Setup complexity | High | Low |
| Reasoning ability | Medium | High |
| Works offline | Limited | Yes (via Ollama) |

Use Cases

Documentation assistants (ChatGPT-like for docs)
Developer knowledge bases
Internal company wikis
AI copilots for SaaS products
Offline/local AI systems

Who Is This For

Developers building AI copilots
Teams with documentation-heavy products
Engineers avoiding vector DB complexity
Builders creating local/offline AI tools

Example Use Case

ragzero --crawl --max-pages 150 \
  --url "https://doc.agentscope.io/" \
  --question "How do I install AgentScope?"

Answer behavior:

Crawls relevant documentation pages
Merges installation details from multiple sections/pages
Returns a grounded explanation

Performance Notes

Removes the vector database from the infrastructure
Faster setup than traditional RAG pipelines
Improved contextual understanding of structured documents due to hierarchical reasoning

Security

Input sanitization to reduce prompt injection risk
Local-first storage by default
API keys handled at runtime via env/flags (not persisted in indexed JSON)

Installation

npm install @hardikdev1210/ragzero

Or run without install:

npx @hardikdev1210/ragzero --help

CLI Quick Start

Single page:

ragzero "https://example.com/page" "What is this page about?"

Whole site:

ragzero --crawl --max-pages 200 \
  --url "https://doc.agentscope.io/" \
  --question "How do I install AgentScope and what are extra dependencies?"

Custom model provider:

ragzero \
  --provider custom \
  --base-url "https://api.openai.com/v1" \
  --api-key "<YOUR_API_KEY>" \
  --model "gpt-4o-mini" \
  --url "https://example.com/docs" \
  --question "Summarize the onboarding flow"

CLI Reference

| Option | Short | Description |
|--------|-------|-------------|
| --url | -u | HTML page URL to fetch and index |
| --question | -q | Natural-language question |
| --force | | Ignore saved index; re-fetch and re-summarize |
| --crawl | | Crawl internal pages from seed URL and query across site index |
| --max-pages | | Crawl limit in site mode (default 100) |
| --data-dir | -d | Root folder for stored JSON (overrides env) |
| --provider | | LLM provider: ollama or custom |
| --base-url | | Base URL for custom provider (.../v1) |
| --api-key | | API key for custom provider |
| --model | -m | Model name |
| --json | | Print one JSON object to stdout |
| --help | -h | Show usage |

Programmatic API

import { VectorlessRAG } from "@hardikdev1210/ragzero";

// Local Ollama-backed instance; stored JSON is written under dataDir.
const rag = new VectorlessRAG({
  provider: "ollama",
  model: "llama3.2",
  ollamaHost: "http://127.0.0.1:11434",
  dataDir: "./my-index",
  verbose: true
});

// Index a single page, then query it (top-level await requires an ES module).
await rag.load("https://example.com/docs");
const answer = await rag.ask("What is the main topic?");

Site mode:

await rag.loadSite("https://doc.agentscope.io/", { maxPages: 200 });
const siteAnswer = await rag.askSite("How does installation work across pages?");

Custom provider:

const rag = new VectorlessRAG({
  provider: "custom",
  baseURL: "https://api.openai.com/v1",
  apiKey: process.env.LLM_API_KEY,
  model: "gpt-4o-mini"
});

Note: The native Anthropic and xAI APIs use different schemas. For Claude or Grok, use an OpenAI-compatible gateway/router endpoint.

Environment Variables

| Variable | Purpose | Default |
|----------|---------|---------|
| LLM_PROVIDER | ollama or custom | ollama |
| LLM_BASE_URL | Base URL for custom provider (.../v1) | unset |
| LLM_API_KEY | API key for custom provider | unset |
| LLM_MODEL | Model name for any provider | fallback to OLLAMA_MODEL |
| OLLAMA_HOST | Ollama base URL | http://127.0.0.1:11434 |
| OLLAMA_MODEL | Ollama model fallback | llama3.2 |
| VECTORLESS_DATA_DIR | Root for persisted JSON | ./data/vectorless-rag |
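
To make the fallback order in the table above concrete, here is a small sketch that resolves these variables into the option names used by the `VectorlessRAG` constructor. `resolveConfig` is a hypothetical helper written for this example; it mirrors the documented defaults and is not the package's own loader.

```javascript
// Sketch only: mirrors the documented defaults; not the package's actual loader.
function resolveConfig(env) {
  const ollamaModel = env.OLLAMA_MODEL || "llama3.2";
  return {
    provider: env.LLM_PROVIDER || "ollama",
    baseURL: env.LLM_BASE_URL,           // unset by default; used by the custom provider
    apiKey: env.LLM_API_KEY,             // unset by default; used by the custom provider
    model: env.LLM_MODEL || ollamaModel, // LLM_MODEL falls back to OLLAMA_MODEL
    ollamaHost: env.OLLAMA_HOST || "http://127.0.0.1:11434",
    dataDir: env.VECTORLESS_DATA_DIR || "./data/vectorless-rag",
  };
}
```

Calling `resolveConfig(process.env)` yields an object that could be passed to `new VectorlessRAG(...)`. If the constructor already reads these variables itself, the helper is only useful for logging or validating the effective configuration.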

Storage Layout

<cwd>/data/vectorless-rag/
  documents/<docId>.json
  sites/<siteId>.json
  llm-cache.json
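
If you need to locate or clean up stored files (for example, to delete one site's index), the layout above can be expressed as a small helper. `storagePaths` is a hypothetical function for illustration; how the library derives `docId` and `siteId` is not specified here, so the ids are taken as given.

```javascript
// Hypothetical helper: only the directory layout comes from the README.
function storagePaths(dataDir = "./data/vectorless-rag") {
  const root = dataDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    document: (docId) => `${root}/documents/${docId}.json`,
    site: (siteId) => `${root}/sites/${siteId}.json`,
    llmCache: `${root}/llm-cache.json`,
  };
}
```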

How It Works (Deep Dive)

Fetch HTML and parse heading tree (h1 to h6)
Sanitize section text
Summarize leaves and then parent nodes bottom-up
Persist per-page trees and optional site index
Query planner selects relevant nodes/docs
Context builder composes source text
LLM generates grounded answer
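
The last three steps (planning, context building, answering) might look roughly like the sketch below. It assumes a tree whose nodes carry `title`, `summary`, `text`, and `children`, and an injected `llm(prompt)` function that returns a string. The prompt wording, the dotted section ids, and the function names are assumptions for illustration, not the library's actual implementation.

```javascript
// Hypothetical sketch; function names, prompts, and node shape are assumptions.

// Flatten a summarized tree into a list of sections with stable dotted ids.
function outline(node, path = [], out = []) {
  node.children.forEach((child, i) => {
    const childPath = [...path, i];
    out.push({ id: childPath.join("."), title: child.title, summary: child.summary, text: child.text });
    outline(child, childPath, out);
  });
  return out;
}

// Ask the LLM which sections are relevant, then answer from their source text.
async function answerFromTree(tree, question, llm, maxChars = 4000) {
  const sections = outline(tree);
  const listing = sections.map((s) => `${s.id} ${s.title}: ${s.summary}`).join("\n");
  const reply = await llm(
    `Question: ${question}\nSections:\n${listing}\nReply with a JSON array of relevant ids.`
  );
  let ids;
  try { ids = JSON.parse(reply); } catch { ids = []; } // tolerate malformed planner output
  if (!Array.isArray(ids)) ids = [];
  const picked = sections.filter((s) => ids.includes(s.id));
  const context = picked.map((s) => `## ${s.title}\n${s.text}`).join("\n\n").slice(0, maxChars);
  return llm(`Answer using only this context.\n\n${context}\n\nQuestion: ${question}`);
}
```

The planner only ever sees titles and summaries, while the final prompt contains the full text of the selected sections. That split is what lets the approach skip similarity search.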

SaaS / Chat Integration Flow

Recommended backend flow:

On workspace setup, run loadSite(seedUrl, { maxPages })
Persist { workspaceId, siteId } in your DB
For each user message, run askSite(question)
Re-index asynchronously with forceRefresh: true

Suggested endpoints:

POST /knowledge/index
POST /chat
GET /knowledge/status
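
A framework-agnostic sketch of the logic behind these three endpoints is shown below. It assumes `rag` is an object exposing `loadSite` and `askSite` as in the Programmatic API section. The in-memory `Map` stands in for your database, and because the return value of `loadSite` is not documented here, the sketch simply stores whatever it returns. `createKnowledgeService` and the status strings are hypothetical.

```javascript
// Sketch under assumptions: the Map replaces a real DB; status values are made up.
function createKnowledgeService(rag, store = new Map()) {
  return {
    // POST /knowledge/index
    async index(workspaceId, seedUrl, maxPages = 100) {
      store.set(workspaceId, { status: "indexing", seedUrl });
      try {
        const site = await rag.loadSite(seedUrl, { maxPages });
        store.set(workspaceId, { status: "ready", seedUrl, site });
      } catch (err) {
        store.set(workspaceId, { status: "failed", seedUrl, error: String(err) });
        throw err;
      }
    },
    // POST /chat
    async chat(workspaceId, question) {
      const entry = store.get(workspaceId);
      if (!entry || entry.status !== "ready") throw new Error("index not ready");
      return rag.askSite(question);
    },
    // GET /knowledge/status
    status(workspaceId) {
      return store.get(workspaceId)?.status ?? "missing";
    },
  };
}
```

Each method can be wired into a route handler in the HTTP framework of your choice. With multiple workspaces you would likely keep one `rag` instance (or data directory) per workspace, which this sketch leaves out.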

Roadmap

[ ] Streaming responses
[ ] Better source citations in final answers
[ ] Plugin ingest adapters (PDF, Notion, GitHub)
[ ] Lightweight UI dashboard
[ ] Benchmark suite vs vector-based RAG

Development

npm install
npm start
npm run test:rag

To publish the package:

npm pack --dry-run
npm publish --access public

License

MIT