
Quick RAG ⚡


🚀 Production-ready RAG (Retrieval-Augmented Generation) for JavaScript & React
Built on official Ollama & LM Studio SDKs.

🎉 v2.5.2 Released! React export fixes, deterministic init tests, and Ollama base model alignment (granite4:3b). See CHANGELOG.md for details.

✨ Features

🆕 v2.5.2 - Stability & Compatibility

  • React Export Fix - quick-rag/react now resolves correctly to useRAG
  • Deterministic initRAG Tests - core test suite no longer depends on external Ollama availability
  • Ollama Base Model Alignment - examples and tests standardized to granite4:3b
  • 🐛 Critical Bug Fix - ConversationManager.addAssistantMessage() now correctly passes content
  • 🌐 Browser Compatibility - Cross-platform UUID generation (Node.js + Browser)
  • 📦 Cleaner Dependencies - Removed invalid self-referencing dependency
  • 🤖 Updated Default Models - qwen3-embedding:0.6b (Ollama) & google/gemma-3-4b (LM Studio)

v2.4.0 - Robustness & Explainability

  • 🔪 Robust Chunking - Abbreviation-aware sentence splitting & word-safe text chunking
  • 🔍 Rich Explainability - Detailed retrieval snippets, keyword density & term match metrics
  • 🚀 BM25 Optimization - Min-Heap based top-K selection for fast retrieval in large datasets
  • 🌐 Environment Stability - Universal UUID support for Node.js and Browser (globalThis.crypto)

v2.3.0 - Performance & Evaluation

  • 🚀 Caching Layer - LRU cache, embedding cache, query cache for 10x speedup
  • 💬 Conversation Manager - Context window management & auto-summarization
  • 📊 RAG Evaluation - Precision@K, Recall, MRR, NDCG metrics
  • 🗄️ Vector DB Connectors - ChromaDB & Qdrant adapters

🔍 v2.2.0 - Advanced Search

  • 🔍 BM25 Sparse Search - Pure JS keyword-based retrieval (no dependencies!)
  • 🔀 Hybrid Search - Combines BM25 + Vector with RRF fusion (20-30% better retrieval)
  • 📊 Reranking - Multi-signal scoring (keyword, semantic, coverage, coherence)
  • 🔄 Query Transformation - Expansion, decomposition, multi-query, HyDE

Core Features

  • 🎯 Official SDKs - Built on ollama and @lmstudio/sdk packages
  • 💾 Embedded Persistence - SQLite-based vector store (No server required!)
  • 🛡️ Robust Error Handling - 7 custom error classes with recovery suggestions
  • 📊 Telemetry & Metrics - Track performance, latency, and usage
  • 📝 Structured Logging - JSON logging with Pino integration
  • 5x Faster - Parallel batch embedding
  • 📄 Document Loaders - PDF, Word, Excel, Text, Markdown, URLs
  • 🔪 Robust Chunking - Intelligent splitting that respects abbreviations (Dr., Prof.) and avoids word cutting
  • 🏷️ Metadata Filtering - Filter by document properties
  • 🔍 Rich Query Explainability - See WHY docs were retrieved with snippets and density metrics (unique!)
  • 🎨 Dynamic Prompts - 10 built-in templates + full customization
  • 🧠 Weighted Decision Making - Multi-criteria document scoring
  • 🎯 Heuristic Reasoning - Pattern learning and query optimization
  • 🔄 CRUD Operations - Add, update, delete documents on the fly
  • 🌊 Streaming Support - Real-time AI responses
  • 🔧 Zero Config - Works with React, Next.js, Vite, Node.js
  • 💪 Type Safe - Full TypeScript support

📦 Installation

npm install quick-rag

Default Ollama models (examples/docs):

ollama pull granite4:3b
ollama pull qwen3-embedding:0.6b

Optional Dependencies:

# For embedded persistence
npm install better-sqlite3

# For vector databases (optional)
npm install chromadb @qdrant/js-client-rest

🆕 What's New in v2.3.0

🚀 Caching Layer

Speed up repeated operations with intelligent caching:

import { CacheManager, EmbeddingCache } from 'quick-rag';

// Unified cache manager
const cache = new CacheManager({
  embeddings: { maxSize: 5000, ttl: 3600000 }, // 1 hour
  queries: { maxSize: 500, ttl: 1800000 }      // 30 min
});

// Wrap embedding function for automatic caching
const cachedEmbed = cache.wrapEmbedding(embedFn);

// Check statistics
console.log(cache.getStats());
// { embeddings: { size: 100, cacheHits: 450, cacheMisses: 50, hitRate: 0.9 } }

💬 Conversation Manager

Manage chat history with context window limits:

import { ConversationManager, getContextLimit } from 'quick-rag';

const conversation = new ConversationManager({
  maxTokens: getContextLimit('llama3'), // 8192
  autoSummarize: true,
  systemPrompt: 'You are a helpful assistant.'
});

conversation.addMessage('user', 'What is RAG?');
conversation.addMessage('assistant', 'RAG stands for...');

// Get context for LLM (respects token limits)
const context = conversation.getContext();

// Fork, export, or summarize
const forked = conversation.fork();
const json = conversation.toJSON();

📊 RAG Evaluation

Measure retrieval quality with standard metrics:

import { precisionAtK, meanReciprocalRank, RAGEvaluator } from 'quick-rag';

// Individual metrics
const retrieved = ['doc1', 'doc4', 'doc2'];
const relevant = ['doc1', 'doc2', 'doc3'];

console.log(precisionAtK(retrieved, relevant, 3));  // 0.667
console.log(meanReciprocalRank(retrieved, relevant)); // 1.0

// Full evaluation
const evaluator = new RAGEvaluator(retriever);
const results = await evaluator.evaluate(testQueries);
console.log(results.metrics); // { precision, recall, mrr, ndcg }

🗄️ Vector Database Connectors

Connect to external vector databases:

import { createVectorStore, ChromaVectorStore, QdrantVectorStore } from 'quick-rag';

// Factory pattern
const store = await createVectorStore('chroma', embedFn, {
  collectionName: 'my-docs',
  host: 'localhost',
  port: 8000
});

// Or direct usage
const qdrant = new QdrantVectorStore(embedFn, {
  url: 'http://localhost:6333',
  collectionName: 'documents'
});

🆕 What's New in v2.4.0

🔪 Robust Chunking

Intelligent text splitting that handles abbreviations and prevents word splitting:

import { chunkBySentences, chunkText } from 'quick-rag';

// Handles Dr., Prof., LTD., approx., etc.
const chunks = chunkBySentences(text, { 
  sentencesPerChunk: 3,
  overlapSentences: 1 
});

// Avoids cutting words in half
const textChunks = chunkText(text, { 
  chunkSize: 500,
  overlap: 50,
  separator: ' ' // Word-safe splitting
});

🔍 Rich Query Explainability

Get deep insights into why a document was retrieved:

const results = await retriever.getRelevant(query, 3, { explain: true });

console.log(results[0].explanation);
/*
{
  score: 0.88,
  snippet: "...context surrounding the match...",
  relevanceFactors: {
    semanticScore: 0.88,
    termMatch: 0.75,   // 3/4 terms matched
    density: 0.15      // concentration of keywords
  }
}
*/

🔍 What's New in v2.2.0

🔍 BM25 Sparse Search

Pure JavaScript implementation - no external dependencies!

import { BM25 } from 'quick-rag';

const bm25 = new BM25({ k1: 1.2, b: 0.75 });
bm25.addDocuments([
  { id: '1', text: 'Machine learning is a subset of AI' },
  { id: '2', text: 'Deep learning uses neural networks' },
  { id: '3', text: 'Natural language processing handles text' }
]);

const results = bm25.search('neural networks AI', 2);
// Fast keyword-based retrieval with TF-IDF scoring
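
For reference, BM25 ranks documents with the standard Okapi formula: k1 (1.2 above) controls how quickly repeated terms saturate, and b (0.75 above) controls document-length normalization. A minimal standalone sketch of the per-term score (not quick-rag's internal code):

// Okapi BM25 score for one query term in one document (reference sketch).
// tf: term frequency in the document; df: documents containing the term;
// N: total documents; dl: document length; avgdl: average document length
function bm25TermScore(tf, df, N, dl, avgdl, k1 = 1.2, b = 0.75) {
  const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
  const lengthNorm = 1 - b + b * (dl / avgdl);
  return idf * (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}
// A document's total score is the sum of bm25TermScore over all query terms.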

🔀 Hybrid Search (BM25 + Vector)

Combine sparse and dense retrieval for 20-30% better results!

import { HybridRetriever, InMemoryVectorStore } from 'quick-rag';

const vectorStore = new InMemoryVectorStore(embedFn);
await vectorStore.addDocuments(docs);

const hybrid = new HybridRetriever(vectorStore, {
  alpha: 0.5,           // Balance: 0=sparse only, 1=dense only
  fusionMethod: 'rrf',  // Reciprocal Rank Fusion
  rrfK: 60
});

const results = await hybrid.search('query', 5, { explain: true });
// Results include both dense and sparse scores
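
Reciprocal Rank Fusion merges the BM25 and vector rankings without needing comparable scores: each document earns 1 / (rrfK + rank) from every list it appears in, and the sums are re-ranked. A minimal standalone sketch of the fusion step (not quick-rag's internal code):

// Fuse two ranked lists of document ids with Reciprocal Rank Fusion.
// rrfK (60 above) dampens the advantage of top-ranked positions.
function rrfFuse(denseIds, sparseIds, rrfK = 60) {
  const scores = new Map();
  for (const list of [denseIds, sparseIds]) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (rrfK + i + 1)); // rank is 1-based
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}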

📊 Reranking

Multi-signal scoring to improve top-K precision:

import { Reranker, createRerankedRetriever } from 'quick-rag';

const reranker = new Reranker({
  keywordWeight: 0.35,   // Keyword overlap
  semanticWeight: 0.35,  // Semantic similarity
  coverageWeight: 0.20,  // Query term coverage
  coherenceWeight: 0.10  // Text coherence
});

// Rerank any retriever's results
const reranked = reranker.rerank(query, initialResults, { explain: true });

// Or wrap a retriever for automatic reranking
const smartRetriever = createRerankedRetriever(hybridRetriever, rerankerOptions);

🔄 Query Transformation

Advanced query processing techniques:

import { QueryExpander, QueryDecomposer, MultiQueryGenerator } from 'quick-rag';

// 1. Query Expansion - Add synonyms
const expander = new QueryExpander();
expander.addSynonyms('ml', ['machine learning', 'AI']);
const expanded = expander.expand('ml models');
// "ml models machine learning AI"

// 2. Query Decomposition - Split complex queries
const decomposer = new QueryDecomposer();
const parts = decomposer.decompose('Compare BM25 with vector search and explain differences');
// ["Compare BM25 with vector search", "explain differences"]

// 3. Multi-Query - Generate variations
const generator = new MultiQueryGenerator();
const variations = generator.generate('How does RAG work?');
// ["How does RAG work?", "What is RAG?", "RAG explanation"]

🎯 Full Pipeline Example

Combine all features for maximum retrieval quality:

import {
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  InMemoryVectorStore,
  HybridRetriever,
  createRerankedRetriever,
  QueryExpander,
  generateWithRAG
} from 'quick-rag';

// Setup
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'nomic-embed-text');
const store = new InMemoryVectorStore(embed);
await store.addDocuments(documents);

// Create hybrid + reranked retriever
const hybrid = new HybridRetriever(store, { alpha: 0.5, fusionMethod: 'rrf' });
const retriever = createRerankedRetriever(hybrid, { keywordWeight: 0.3 });

// Expand query and retrieve
const expander = new QueryExpander();
const { expanded } = expander.expand(userQuery);
const results = await retriever.getRelevant(expanded, 5);

// Generate response
const response = await generateWithRAG(client, 'llama3', userQuery, results);

📚 Previous Features

💾 Embedded Persistence (v2.1.0)

Store your vectors locally without setting up a complex database server!

  • Zero Setup: Just provide a file path (./rag.db)
  • Fast: Built on better-sqlite3
  • Full Features: Batch insert, metadata filtering, CRUD
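
A sketch of what this could look like via the createVectorStore factory shown earlier; note that the 'sqlite' store type and path option are assumptions based on the bullets above, not confirmed API — check the package docs for the exact signature:

import { createVectorStore } from 'quick-rag';

// Hypothetical: the 'sqlite' store type and `path` option are assumed here
// (requires: npm install better-sqlite3).
const store = await createVectorStore('sqlite', embedFn, {
  path: './rag.db' // vectors persist across restarts
});

await store.addDocuments([{ id: '1', text: 'This document survives restarts.' }]);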

🛡️ Advanced Error Handling

Never crash without knowing why. New error system provides:

  • Specific Error Types: RAGError, EmbeddingError, RetrievalError, etc.
  • Error Codes: Programmatic handling
  • Recovery Hints: Actionable suggestions in error messages
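
Since the error classes are exported, callers can branch on them with instanceof. A minimal sketch; the err.code property is suggested by the "Error Codes" bullet above, but treat exact property names as assumptions:

import { RAGError, EmbeddingError, RetrievalError } from 'quick-rag';

try {
  const results = await retriever.getRelevant('What is RAG?', 3);
  console.log(`Retrieved ${results.length} documents`);
} catch (err) {
  if (err instanceof EmbeddingError) {
    console.error('Embedding failed:', err.message); // e.g. model not pulled
  } else if (err instanceof RetrievalError) {
    console.error('Retrieval failed:', err.message);
  } else if (err instanceof RAGError) {
    console.error(`RAG error [${err.code}]:`, err.message); // err.code is assumed
  } else {
    throw err; // not a quick-rag error
  }
}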

📊 Metrics & Logging

Monitor your RAG pipeline in production:

  • Performance Tracking: Embedding time, search latency, generation speed
  • Structured Logs: JSON format for easy parsing
  • Prometheus Support: Export metrics for monitoring dashboards

🔍 Advanced Filtering

Filter documents using custom JavaScript logic:
const results = await retriever.getRelevant('latest AI news', 5, {
  filter: (meta) => {
    return meta.year === 2024 && 
           meta.tags.includes('AI') &&
           meta.difficulty !== 'beginner';
  }
});

📽️ PowerPoint Support

Load .pptx and .ppt files with officeparser:

import { loadDocument } from 'quick-rag';
const pptDoc = await loadDocument('./presentation.pptx');

📁 Organized Examples

12 comprehensive examples covering all features:

  • Basic Usage (Ollama & LM Studio)
  • Document Loading (PDF, Word, Excel)
  • Metadata Filtering
  • Streaming Responses
  • Advanced Filtering
  • Query Explainability
  • Prompt Management
  • Decision Engine (Simple & Real-World)
  • Conversation History & Export

🆕 Previous Features (v1.1.x)

📝 Internationalization Update

  • Translated all example files to English for better international accessibility
  • example/10-decision-engine-simple.js - Smart Document Selection example
  • example/11-decision-engine-pdf-real-world.js - Real-world PDF scenario example

🧠 Decision Engine (v1.1.0)

An AI-powered retrieval system that goes far beyond simple cosine similarity.

Quick RAG's Decision Engine combines:

  • 🎯 Multi-Criteria Weighted Scoring - 5 factors evaluated together
  • 🧠 Heuristic Reasoning - Pattern-based query optimization
  • Adaptive Learning - Learns from user feedback
  • 🔍 Full Transparency - See exactly why each document was selected

Multi-Criteria Scoring

5 weighted factors beyond similarity:

  1. 📊 Semantic Similarity (50%) - Cosine similarity score
  2. 🔤 Keyword Match (20%) - Term matching in document
  3. 📅 Recency (15%) - Document freshness with exponential decay
  4. ⭐ Source Quality (10%) - Source reliability (official=1.0, research=0.9, blog=0.7, forum=0.6)
  5. 🎯 Context Relevance (5%) - Contextual fit

import { SmartRetriever, DEFAULT_WEIGHTS } from 'quick-rag';

// Create smart retriever with default weights
const smartRetriever = new SmartRetriever(basicRetriever);

// Or customize weights for your use case
const customRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.35,
    keywordMatch: 0.20,
    recency: 0.30,         // Higher for news sites
    sourceQuality: 0.10,
    contextRelevance: 0.05
  }
});

// Get results with decision transparency
const response = await smartRetriever.getRelevant('latest AI news', 3);

// See scoring breakdown for each document
console.log(response.results[0]);
// {
//   text: "...",
//   weightedScore: 0.742,
//   scoreBreakdown: {
//     semanticSimilarity: { score: 0.85, weight: 0.35, contribution: 0.298 },
//     keywordMatch: { score: 0.67, weight: 0.20, contribution: 0.134 },
//     recency: { score: 0.95, weight: 0.30, contribution: 0.285 },
//     sourceQuality: { score: 0.90, weight: 0.10, contribution: 0.090 },
//     contextRelevance: { score: 1.00, weight: 0.05, contribution: 0.050 }
//   }
// }

// Decision context shows WHY these results
console.log(response.decisions);
// {
//   weights: { ... },
//   appliedRules: ["boost-recent-for-news"],
//   suggestions: [
//     "Time-sensitive query detected. Prioritizing recent documents.",
//     "Consider using filters if you need older historical content."
//   ]
// }

Heuristic Reasoning

Pattern-based optimization that learns:

// Enable learning mode
const smartRetriever = new SmartRetriever(basicRetriever, {
  enableLearning: true,
  enableHeuristics: true
});

// Add custom rules
smartRetriever.heuristicEngine.addRule(
  'boost-documentation',
  (query, context) => query.includes('documentation'),
  (query, context) => {
    context.adjustWeight('sourceQuality', 0.15);  // Increase quality weight
    return { adjusted: true, reason: 'Documentation query prioritizes quality' };
  },
  5  // Priority
);

// Provide feedback to enable learning
smartRetriever.provideFeedback(query, results, {
  rating: 5,           // 1-5 rating
  hasFilters: true,    // User applied filters
  comment: 'Perfect results!'
});

// System learns successful patterns
const insights = smartRetriever.getInsights();
console.log(insights.heuristics.successfulPatterns);
// ["latest", "documentation", "official release"]

// Export learned knowledge
const knowledge = smartRetriever.exportKnowledge();

// Import to another instance
newRetriever.importKnowledge(knowledge);

Scenario Customization

Different weights for different use cases:

// News Platform - Recency Priority
const newsRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.30,
    keywordMatch: 0.20,
    recency: 0.40,         // 🔥 High recency
    sourceQuality: 0.05,
    contextRelevance: 0.05
  }
});

// Documentation Site - Quality Priority  
const docsRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.35,
    keywordMatch: 0.20,
    recency: 0.10,
    sourceQuality: 0.30,   // 🔥 High quality
    contextRelevance: 0.05
  }
});

// Research Platform - Balanced
const researchRetriever = new SmartRetriever(basicRetriever, {
  weights: DEFAULT_WEIGHTS  // Balanced approach
});

Real-World Example

See example/11-decision-engine-pdf-real-world.js for a complete example with:

  • PDF document loading
  • Multiple source types (official, blog, research, forum)
  • 3 different scenarios (news, documentation, research)
  • RAG generation with quality metrics
  • Decision transparency and explanations

Benefits:

  • ✅ More accurate retrieval than pure similarity
  • ✅ Adapts to different content types automatically
  • ✅ Learns from user interactions
  • ✅ Fully explainable decisions
  • ✅ Customizable for any use case
  • ✅ Production-ready with proven patterns

🔍 Query Explainability (v1.1.0)

Understand WHY documents were retrieved - A first-of-its-kind feature!

const results = await retriever.getRelevant('What is Ollama?', 3, {
  explain: true
});

// Each result includes detailed explanation:
console.log(results[0].explanation);
// {
//   queryTerms: ["ollama", "local", "ai"],
//   matchedTerms: ["ollama", "local"],
//   matchCount: 2,
//   matchRatio: 0.67,
//   cosineSimilarity: 0.856,
//   relevanceFactors: {
//     termMatches: 2,
//     semanticSimilarity: 0.856,
//     coverage: "67%"
//   }
// }

Use cases: Debug searches, optimize queries, validate accuracy, explain to users

🎨 Dynamic Prompt Management (v1.1.0)

10 built-in templates + full customization

// Quick template selection
await generateWithRAG(client, model, query, docs, {
  template: 'conversational'  // or: technical, academic, code, etc.
});

// System prompts for role definition
await generateWithRAG(client, model, query, docs, {
  systemPrompt: 'You are a helpful programming tutor',
  template: 'instructional'
});

// Advanced: Reusable PromptManager
import { createPromptManager } from 'quick-rag';

const promptMgr = createPromptManager({
  systemPrompt: 'You are an expert engineer',
  template: 'technical'
});

await generateWithRAG(client, model, query, docs, {
  promptManager: promptMgr
});

Templates: default, conversational, technical, academic, code, concise, detailed, qa, instructional, creative


🚀 Quick Start

Option 1: With Official Ollama SDK (Recommended)

import { 
  OllamaRAGClient, 
  createOllamaRAGEmbedding,
  InMemoryVectorStore, 
  Retriever 
} from 'quick-rag';

// 1. Initialize client (official SDK)
const client = new OllamaRAGClient({
  host: 'http://127.0.0.1:11434'
});

// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');

// 3. Create vector store
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocument({ 
  text: 'Ollama provides local LLM hosting.' 
});

// 5. Query with streaming (official SDK feature!)
const results = await retriever.getRelevant('What is Ollama?', 2);
const context = results.map(d => d.text).join('\n');

const response = await client.chat({
  model: 'granite4:3b',
  messages: [{ 
    role: 'user', 
    content: `Context: ${context}\n\nQuestion: What is Ollama?` 
  }],
  stream: true, // Official SDK streaming!
});

// Stream response
for await (const part of response) {
  process.stdout.write(part.message?.content || '');
}

Option 2: React with Vite

💡 Starting from scratch? Check out the detailed step-by-step guide in QUICKSTART_REACT.md!

Step 1: Create your project

npm create vite@latest my-rag-app -- --template react
cd my-rag-app
npm install quick-rag express concurrently

Step 2: Create backend proxy (server.js in project root)

import express from 'express';
import { OllamaRAGClient } from 'quick-rag';

const app = express();
app.use(express.json());

const client = new OllamaRAGClient({ host: 'http://127.0.0.1:11434' });

app.post('/api/generate', async (req, res) => {
  const { model = 'granite4:3b', messages } = req.body;
  const response = await client.chat({ model, messages, stream: false });
  res.json({ response: response.message.content });
});

app.post('/api/embed', async (req, res) => {
  const { model = 'qwen3-embedding:0.6b', input } = req.body;
  const response = await client.embed(model, input);
  res.json(response);
});

app.listen(3001, () => console.log('🚀 Server: http://127.0.0.1:3001'));

Step 3: Configure Vite proxy (vite.config.js)

import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:3001',
        changeOrigin: true
      }
    }
  }
});

Step 4: Update package.json scripts

{
  "scripts": {
    "dev": "concurrently \"npm:server\" \"npm:client\"",
    "server": "node server.js",
    "client": "vite"
  }
}

Step 5: Use in your React component (src/App.jsx)

import { useState, useEffect } from 'react';
import { useRAG, initRAG, createBrowserModelClient } from 'quick-rag';

const docs = [
  { id: '1', text: 'React is a JavaScript library for building user interfaces.' },
  { id: '2', text: 'Ollama provides local LLM hosting.' },
  { id: '3', text: 'RAG combines retrieval with AI generation.' }
];

export default function App() {
  const [rag, setRAG] = useState(null);
  const [query, setQuery] = useState('');
  
  const { run, loading, response, docs: results } = useRAG({
    retriever: rag?.retriever,
    modelClient: createBrowserModelClient(),
    model: 'granite4:3b'
  });

  useEffect(() => {
    initRAG(docs, {
      baseEmbeddingOptions: {
        useBrowser: true,
        baseUrl: '/api/embed',
        model: 'qwen3-embedding:0.6b'
      }
    }).then(core => setRAG(core));
  }, []);

  return (
    <div style={{ padding: 40 }}>
      <h1>🤖 RAG Demo</h1>
      <input 
        value={query} 
        onChange={e => setQuery(e.target.value)}
        placeholder="Ask something..."
        style={{ width: 300, padding: 10 }}
      />
      <button onClick={() => run(query)} disabled={loading}>
        {loading ? 'Thinking...' : 'Ask AI'}
      </button>
      
      {results && (
        <div>
          <h3>📚 Retrieved:</h3>
          {results.map(d => <p key={d.id}>{d.text}</p>)}
        </div>
      )}
      
      {response && (
        <div>
          <h3>✨ Answer:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

Step 6: Run your app

npm run dev

Open http://localhost:5173 🎉


Option 3: Next.js (Pages Router)

Step 1: Create API routes

// pages/api/generate.js
import { OllamaClient } from 'quick-rag';

export default async function handler(req, res) {
  const client = new OllamaClient();
  const { model = 'granite4:3b', prompt } = req.body;
  const response = await client.generate(model, prompt);
  res.json({ response });
}

// pages/api/embed.js
import { OllamaClient } from 'quick-rag';

export default async function handler(req, res) {
  const client = new OllamaClient();
  const { model = 'qwen3-embedding:0.6b', input } = req.body;
  const response = await client.embed(model, input);
  res.json(response);
}

Step 2: Use in your page (same React component as above)


Option 4: Vanilla JavaScript (Node.js)

Simple approach with official Ollama SDK:

import { 
  OllamaRAGClient, 
  createOllamaRAGEmbedding, 
  InMemoryVectorStore, 
  Retriever 
} from 'quick-rag';

// 1. Initialize client
const client = new OllamaRAGClient();

// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocuments([
  { text: 'JavaScript is a programming language.' },
  { text: 'Python is great for data science.' },
  { text: 'Rust is a systems programming language.' }
]);

// 5. Query
const query = 'What is JavaScript?';
const results = await retriever.getRelevant(query, 2);

// 6. Generate answer
const context = results.map(d => d.text).join('\n');
const response = await client.chat({
  model: 'granite4:3b',
  messages: [{ 
    role: 'user', 
    content: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:` 
  }]
});

// Clean output
console.log('📚 Retrieved:', results.map(d => d.text));
console.log('🤖 Answer:', response.message.content);

Output:

📚 Retrieved: [
  'JavaScript is a programming language.',
  'Python is great for data science.'
]
🤖 Answer: JavaScript is a programming language that allows developers 
to write code and implement functionality in web browsers...

Option 5: LM Studio 🎨

Use LM Studio instead of Ollama with OpenAI-compatible API:

import { 
  LMStudioRAGClient, 
  createLMStudioRAGEmbedding, 
  InMemoryVectorStore, 
  Retriever, 
  generateWithRAG 
} from 'quick-rag';

// 1. Initialize LM Studio client
const client = new LMStudioRAGClient();

// 2. Setup embedding (use your embedding model from LM Studio)
const embed = createLMStudioRAGEmbedding(client, 'nomic-embed-text-v1.5');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocuments([
  { text: 'LM Studio is a desktop app for running LLMs locally.' },
  { text: 'It provides an OpenAI-compatible API.' },
  { text: 'You can use models like Llama, Mistral, and more.' }
]);

// 5. Query with RAG
const results = await retriever.getRelevant('What is LM Studio?', 2);
const answer = await generateWithRAG(
  client,
  'google/gemma-3-4b', // or your model name
  'What is LM Studio?',
  results
);

console.log('Answer:', answer);

Prerequisites for LM Studio:

  1. Download and install LM Studio
  2. Download a language model (e.g., Llama 3.2, Mistral)
  3. Download an embedding model (e.g., nomic-embed-text)
  4. Start the local server: Developer > Local Server (default: http://localhost:1234)

For React projects: Import from 'quick-rag/react' to use hooks:

import { useRAG } from 'quick-rag/react';
// or
import { useRAG } from 'quick-rag'; // Also works in React projects

📖 API Reference

React Hook: useRAG

const { run, loading, response, docs, streaming, error } = useRAG({
  retriever,        // Retriever instance
  modelClient,      // Model client (OllamaClient or BrowserModelClient)
  model            // Model name (e.g., 'granite4:3b')
});

// Ask a question
await run('What is React?');

// With options
await run('What is React?', {
  topK: 5,           // Number of documents to retrieve
  stream: true,      // Enable streaming
  onDelta: (chunk, fullText) => console.log(chunk)
});

Core Functions

Initialize RAG

const { retriever, store, mrl } = await initRAG(documents, {
  defaultDim: 128,              // Embedding dimension
  k: 2,                         // Default number of results
  mrlBaseDim: 768,             // Base embedding dimension
  baseEmbeddingOptions: {
    useBrowser: true,           // Use browser-safe fetch
    baseUrl: '/api/embed',      // Embedding endpoint
    model: 'qwen3-embedding:0.6b'    // Embedding model
  }
});

Generate with RAG

const result = await generateWithRAG({
  retriever,
  modelClient,
  model,
  query: 'Your question',
  topK: 3              // Optional: override default k
});

// Returns: { docs, response, prompt }

VectorStore API

const store = new InMemoryVectorStore(embeddingFn, { defaultDim: 128 });

// Add documents
await store.addDocument({ id: '1', text: 'Document text' });

// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([{ id: '1', text: '...' }], { 
  dim: 128,
  batchSize: 20,        // Process 20 chunks at a time
  maxConcurrent: 5,     // Max 5 concurrent requests
  onProgress: (current, total) => {
    console.log(`Progress: ${current}/${total}`);
  }
});

// Query
const results = await store.similaritySearch('query', k, queryDim);

// CRUD
const doc = store.getDocument('id');
const all = store.getAllDocuments();
await store.updateDocument('id', 'new text', { meta: 'data' });
store.deleteDocument('id');
store.clear();

Batch Processing for Large Documents (v2.0.3):

// Process large PDFs efficiently
const chunks = chunkDocuments([largePDF], { chunkSize: 1000, overlap: 100 });

await store.addDocuments(chunks, {
  batchSize: 20,        // Process 20 chunks per batch
  maxConcurrent: 5,     // Max 5 concurrent embedding requests
  onProgress: (current, total) => {
    console.log(`Embedding progress: ${current}/${total} (${Math.round(current/total*100)}%)`);
  }
});

Model Clients

Browser (with proxy)

const client = createBrowserModelClient({
  endpoint: '/api/generate'  // Your proxy endpoint
});

Node.js (direct)

const client = new OllamaClient({
  baseUrl: 'http://127.0.0.1:11434/api'
});

💡 Examples

CRUD Operations

// Add document dynamically
await store.addDocument({ 
  id: 'new-doc', 
  text: 'TypeScript adds types to JavaScript.' 
});

// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([
  { id: 'doc1', text: 'First document' },
  { id: 'doc2', text: 'Second document' }
], {
  batchSize: 10,        // Process in batches
  maxConcurrent: 5,     // Rate limiting
  onProgress: (current, total) => {
    console.log(`Added ${current}/${total} documents`);
  }
});

// Update existing
await store.updateDocument('1', 'React 19 is the latest version.', {
  version: '19',
  updated: Date.now()
});

// Delete
store.deleteDocument('2');

// Query all
const allDocs = store.getAllDocuments();
console.log(`Total documents: ${allDocs.length}`);

Dynamic Retrieval

// Ask with different topK values
const result1 = await run('What is JavaScript?', { topK: 1 }); // Get 1 doc
const result2 = await run('What is JavaScript?', { topK: 5 }); // Get 5 docs

Streaming Responses

await run('Explain React hooks', {
  stream: true,
  onDelta: (chunk, fullText) => {
    console.log('New chunk:', chunk);
    // Update UI in real-time
  }
});

Custom Embedding Models

// Use different embedding models
const rag = await initRAG(docs, {
  baseEmbeddingOptions: {
    useBrowser: true,
    baseUrl: '/api/embed',
    model: 'nomic-embed-text'  // or 'mxbai-embed-large', etc.
  }
});

More examples: Check the example/ folder for complete demos.


📄 Document Loaders (v0.7.4+)

Load documents from various formats and use them with RAG!

Supported Formats

| Format | Function | Requires |
|--------|----------|----------|
| PDF | loadPDF() | npm install pdf-parse |
| Word (.docx) | loadWord() | npm install mammoth |
| Excel (.xlsx) | loadExcel() | npm install xlsx |
| Text (.txt) | loadText() | Built-in ✅ |
| JSON | loadJSON() | Built-in ✅ |
| Markdown | loadMarkdown() | Built-in ✅ |
| Web URLs | loadURL() | Built-in ✅ |

Quick Start

Load PDF:

import { loadPDF, chunkDocuments } from 'quick-rag';

// Load PDF
const pdf = await loadPDF('./document.pdf');
console.log(`Loaded ${pdf.meta.pages} pages`);

// Chunk and add to RAG
const chunks = chunkDocuments([pdf], { 
  chunkSize: 500, 
  overlap: 50 
});
await store.addDocuments(chunks);

Load from URL:

import { loadURL } from 'quick-rag';

const doc = await loadURL('https://example.com', {
  extractText: true  // Convert HTML to plain text
});
await store.addDocuments([doc]);

Load Directory:

import { loadDirectory } from 'quick-rag';

// Load all supported documents from a folder
const docs = await loadDirectory('./documents', {
  extensions: ['.pdf', '.docx', '.txt', '.md'],
  recursive: true
});

console.log(`Loaded ${docs.length} documents`);

// Chunk and add to vector store
const chunks = chunkDocuments(docs, { chunkSize: 500 });
await store.addDocuments(chunks);

Auto-Detect Format:

import { loadDocument } from 'quick-rag';

// Automatically detects file type
const doc = await loadDocument('./file.pdf');
// Works with: .pdf, .docx, .xlsx, .txt, .md, .json

Installation

# Core package (includes text, JSON, markdown, URL loaders)
npm install quick-rag

# Optional: PDF support
npm install pdf-parse

# Optional: Word support
npm install mammoth

# Optional: Excel support
npm install xlsx

# Or install all at once:
npm install quick-rag pdf-parse mammoth xlsx

Complete Example

import {
  loadPDF,
  loadDirectory,
  chunkDocuments,
  InMemoryVectorStore,
  Retriever,
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  generateWithRAG
} from 'quick-rag';

// Load documents
const pdf = await loadPDF('./research.pdf');
const docs = await loadDirectory('./articles');

// Combine and chunk
const allDocs = [pdf, ...docs];
const chunks = chunkDocuments(allDocs, { 
  chunkSize: 500,
  overlap: 50 
});

// Setup RAG
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');
const store = new InMemoryVectorStore(embed);
const retriever = new Retriever(store);

// Add to vector store
await store.addDocuments(chunks);

// Query
const results = await retriever.getRelevant('What is the main topic?', 3);
const answer = await generateWithRAG(client, 'granite4:3b', 
  'What is the main topic?', results);

console.log(answer);

See full example: example/advanced/document-loading-example.js


❓ Troubleshooting

| Problem | Solution |
|---------|----------|
| 🚫 CORS errors | Use a proxy server (Express/Next.js API routes) |
| 🔌 Connection refused | Ensure Ollama is running: ollama serve |
| 📦 Models not found | Pull models: ollama pull granite4:3b && ollama pull qwen3-embedding:0.6b |
| 🌐 404 on /api/embed | Check your proxy configuration in vite.config.js or API routes |
| 💻 Windows IPv6 issues | Use 127.0.0.1 instead of localhost |
| 📦 Module not found | Check imports: use 'quick-rag' not 'quick-rag/...' |

Note: v0.6.5+ automatically detects and uses the correct API (generate or chat) for any model.


📄 License

MIT © Cihat Emre Karataş


🙏 Acknowledgments

Special thanks to all contributors and the open-source community!


Made with ❤️ for the JavaScript & AI community