@ranavibe/rag
Advanced RAG (Retrieval Augmented Generation) for the RANA Framework.
Features
- Intelligent Chunking - Semantic, markdown, code-aware, and recursive chunking
- Hybrid Retrieval - Vector + keyword search with fusion strategies
- Re-ranking - Cross-encoder, LLM, and diversity-based re-ranking
- Synthesis - Refine, tree-summarize, and compact synthesis methods
- Citations - Automatic citation tracking and source attribution
- React Hooks - Easy integration with React applications
- Pipeline Presets - Pre-configured pipelines for common use cases
Installation
npm install @ranavibe/rag
Quick Start
import { createRAGPipeline, RAGPresets } from '@ranavibe/rag';
// Use a preset for quick setup
const pipeline = RAGPresets.balanced();
// Index your documents
await pipeline.index([
{ id: 'doc1', content: 'RANA is an AI development framework...' },
{ id: 'doc2', content: 'RAG enables knowledge-grounded AI...' },
]);
// Query the pipeline
const result = await pipeline.query({
query: 'What is RANA?',
});
console.log(result.answer);
// "RANA is an AI development framework that..."
console.log(result.citations);
// [{ text: '...', source: 'doc1', score: 0.95 }]
Pipeline Configuration
Custom Pipeline
import { createRAGPipeline } from '@ranavibe/rag';
const pipeline = createRAGPipeline({
// Chunking strategy
chunker: {
type: 'semantic', // 'semantic' | 'markdown' | 'code' | 'recursive'
chunkSize: 512,
overlap: 50,
},
// Retrieval strategy
retriever: {
type: 'hybrid', // 'vector' | 'keyword' | 'hybrid'
topK: 20,
options: {
vector: { topK: 20, similarityThreshold: 0.5 },
keyword: { topK: 10, algorithm: 'bm25' },
fusion: 'reciprocal-rank-fusion',
},
},
// Re-ranking (optional)
reranker: {
type: 'cross-encoder', // 'cross-encoder' | 'llm' | 'diversity'
topK: 5,
},
// Query transformation (optional)
queryTransformer: {
multiQuery: true,
hypotheticalAnswer: true, // HyDE
decompose: true,
},
// Synthesis strategy
synthesizer: {
type: 'refine', // 'refine' | 'tree-summarize' | 'compact'
citations: true,
streaming: true,
model: 'claude-sonnet-4',
},
// Pipeline options
config: {
caching: true,
metrics: true,
logging: 'verbose',
},
});
Presets
import { RAGPresets } from '@ranavibe/rag';
// Fast: Optimized for speed
const fast = RAGPresets.fast();
// Accurate: Optimized for quality
const accurate = RAGPresets.accurate();
// Balanced: Good speed/quality tradeoff
const balanced = RAGPresets.balanced();
// Code: For code search and Q&A
const code = RAGPresets.code('typescript');
// Documentation: For documentation search
const docs = RAGPresets.documentation();
// Research: For research papers
const research = RAGPresets.research();
// Chat: For conversational RAG
const chat = RAGPresets.chat();
Chunking Strategies
Semantic Chunking
Splits text based on semantic boundaries using embedding similarity:
import { SemanticChunker } from '@ranavibe/rag';
const chunker = new SemanticChunker();
const chunks = await chunker.chunk(text, {
chunkSize: 512,
overlap: 50,
similarityThreshold: 0.5,
});
Markdown Chunking
Preserves markdown structure (headers, code blocks, lists):
import { MarkdownChunker } from '@ranavibe/rag';
const chunker = new MarkdownChunker();
const chunks = await chunker.chunk(markdown, {
chunkSize: 512,
preserveHeaders: true,
preserveCodeBlocks: true,
});
Code Chunking
Preserves function and class boundaries:
import { CodeChunker } from '@ranavibe/rag';
const chunker = new CodeChunker();
const chunks = await chunker.chunk(code, {
language: 'typescript',
chunkSize: 1024,
preserveFunctions: true,
preserveClasses: true,
});
Retrieval Strategies
Hybrid Retrieval
Combines vector and keyword search:
import { HybridRetriever } from '@ranavibe/rag';
const retriever = new HybridRetriever();
await retriever.index(chunks);
const results = await retriever.retrieve(query, {
vector: { topK: 20 },
keyword: { topK: 10, algorithm: 'bm25' },
fusion: 'reciprocal-rank-fusion', // or 'weighted', 'max'
});
Fusion Strategies
- Reciprocal Rank Fusion (RRF): Combines rankings from both methods; good for diverse sources (see the sketch after this list)
- Weighted: Configurable weights for vector vs keyword
- Max: Takes highest score from either method
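This README doesn't show the fusion internals, so the following is a rough standalone sketch of the idea behind the 'reciprocal-rank-fusion' option rather than the package's actual code: each chunk's fused score is the sum of 1 / (k + rank) over every ranking it appears in, so chunks ranked well by either method surface. The reciprocalRankFusion helper and the k = 60 constant are illustrative only:
```ts
type Ranked = { id: string };

// Fuse several ranked lists: score(id) = sum over lists of 1 / (k + rank),
// where rank is 1-based within each list and k dampens any single list's influence.
function reciprocalRankFusion(rankings: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((item, i) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Chunk C appears in both rankings, so it outranks chunks seen by only one method.
const fused = reciprocalRankFusion([
  [{ id: 'A' }, { id: 'C' }, { id: 'B' }], // vector ranking
  [{ id: 'C' }, { id: 'D' }],              // keyword (BM25) ranking
]);
console.log(fused.map(r => r.id)); // ['C', 'A', 'D', 'B']
```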
Re-ranking
Cross-Encoder Re-ranking
More accurate than bi-encoder but slower:
import { CrossEncoderReranker } from '@ranavibe/rag';
const reranker = new CrossEncoderReranker();
const reranked = await reranker.rerank(query, results, {
topK: 5,
normalize: true,
});
Diversity Re-ranking (MMR)
Maximizes relevance while maintaining diversity:
import { DiversityReranker } from '@ranavibe/rag';
const reranker = new DiversityReranker();
const reranked = await reranker.rerank(query, results, {
topK: 5,
lambda: 0.5, // Balance relevance vs diversity
});
Synthesis Methods
Refine Synthesis
Iteratively refines answer with each chunk:
// Good for comprehensive answers
synthesizer: {
type: 'refine',
citations: true,
}
Tree Summarization
Hierarchically summarizes in a tree structure:
// Good for many chunks
synthesizer: {
type: 'tree-summarize',
citations: true,
}
Compact Synthesis
Single LLM call with all context:
// Fastest, good for small contexts
synthesizer: {
type: 'compact',
citations: true,
}
React Integration
Setup
import { RAGProvider, RAGPresets } from '@ranavibe/rag';
const pipeline = RAGPresets.balanced();
function App() {
return (
<RAGProvider pipeline={pipeline}>
<SearchComponent />
</RAGProvider>
);
}
useRAG Hook
import { useRAG } from '@ranavibe/rag';
function SearchComponent() {
const { query, answer, citations, isLoading, error } = useRAG();
const handleSearch = async (q: string) => {
await query(q);
};
return (
<div>
<input onKeyDown={e => e.key === 'Enter' && handleSearch(e.currentTarget.value)} />
{isLoading && <Spinner />}
{error && <Error message={error.message} />}
{answer && (
<>
<Answer content={answer} />
<Citations items={citations} />
</>
)}
</div>
);
}
useRAGStream Hook
import { useRAGStream } from '@ranavibe/rag';
function StreamingSearch() {
const { queryStream, answer, citations, isStreaming, stop } = useRAGStream();
return (
<div>
<button onClick={() => queryStream('Explain RAG')}>
Search
</button>
{isStreaming && <button onClick={stop}>Stop</button>}
<div>{answer}</div>
<Sources items={citations} />
</div>
);
}
useRAGIndex Hook
import { useRAGIndex } from '@ranavibe/rag';
function DocumentManager() {
const { index, deleteDocuments, isIndexing, progress, documentCount } = useRAGIndex();
const handleUpload = async (files: File[]) => {
const documents = await Promise.all(
files.map(async f => ({
id: f.name,
content: await f.text(),
}))
);
await index(documents);
};
return (
<div>
<input type="file" multiple onChange={e => handleUpload(Array.from(e.target.files ?? []))} />
{isIndexing && <Progress value={progress} />}
<p>Documents indexed: {documentCount}</p>
</div>
);
}
API Reference
RAGPipeline
| Method | Description |
|--------|-------------|
| query(options) | Execute RAG query |
| queryStream(options) | Streaming RAG query |
| index(documents) | Index documents |
| delete(ids) | Delete documents |
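A minimal lifecycle sketch using only the methods listed above; anything beyond what this README shows (for example, options delete might accept besides an array of ids) is an assumption:
```ts
import { RAGPresets } from '@ranavibe/rag';

const pipeline = RAGPresets.balanced();

// Index two documents, query, then delete one by id.
await pipeline.index([
  { id: 'doc1', content: 'RANA is an AI development framework...' },
  { id: 'doc2', content: 'RAG enables knowledge-grounded AI...' },
]);
const result = await pipeline.query({ query: 'What is RANA?' });
console.log(result.answer);
await pipeline.delete(['doc2']);
```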
RAGResult
| Property | Type | Description |
|----------|------|-------------|
| answer | string | Generated answer |
| citations | Citation[] | Source citations |
| sources | Source[] | Unique sources |
| metrics | RAGMetrics | Performance metrics |
RAGMetrics
| Property | Description |
|----------|-------------|
| latency | Total query time (ms) |
| cost | Estimated cost ($) |
| chunks.total | Total indexed chunks |
| chunks.retrieved | Chunks retrieved |
| chunks.used | Chunks used in answer |
| tokens.input | Input tokens |
| tokens.output | Output tokens |
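Assuming metrics: true is enabled in the pipeline config, a result's metrics can be read along the property paths above (the nested chunks/tokens shape is inferred from this table):
```ts
const result = await pipeline.query({ query: 'What is RANA?' });

// Log per-query performance using the property paths from the table above.
const { latency, cost, chunks, tokens } = result.metrics;
console.log(`latency: ${latency} ms, estimated cost: $${cost}`);
console.log(`chunks: ${chunks.retrieved} retrieved, ${chunks.used} used of ${chunks.total} indexed`);
console.log(`tokens: ${tokens.input} in / ${tokens.output} out`);
```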
Best Practices
- Choose the right chunker: Use semantic for general text, markdown for docs, code for source files
- Tune topK: Start with 10-20 for retrieval, 3-5 after reranking
- Use hybrid retrieval: Combines semantic understanding with keyword matching
- Enable caching: Reduces cost and latency for repeated queries
- Monitor metrics: Track latency, cost, and citation quality
- Use streaming: Improves perceived performance for users (see the combined sketch below)
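As a sketch only (preset defaults may already cover some of this), the practices above map onto the pipeline options documented earlier:
```ts
import { createRAGPipeline } from '@ranavibe/rag';

// Markdown-aware chunking for docs, hybrid retrieval with a wide topK,
// a small topK after re-ranking, caching and metrics on, streaming synthesis.
const pipeline = createRAGPipeline({
  chunker: { type: 'markdown', chunkSize: 512, overlap: 50 },
  retriever: {
    type: 'hybrid',
    topK: 20,
    options: {
      vector: { topK: 20 },
      keyword: { topK: 10, algorithm: 'bm25' },
      fusion: 'reciprocal-rank-fusion',
    },
  },
  reranker: { type: 'cross-encoder', topK: 5 },
  synthesizer: { type: 'compact', citations: true, streaming: true },
  config: { caching: true, metrics: true },
});
```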
License
MIT
