

@coworker-agency/rag

A Retrieval Augmented Generation (RAG) library for document indexing, vector storage, and AI-powered question answering. This package provides a complete solution for building RAG systems with Supabase and OpenAI.

Features

  • Document Indexing: Process and index documents from Supabase storage into vector databases
  • Context-Aware Vectors: Generate context-aware vector embeddings for improved retrieval quality
  • Memory-Efficient Processing: Handle large documents without running into memory limits
  • Query Classification: Classify user queries into predefined categories
  • Question Answering: Generate accurate answers from your document collection
  • FAQ Generation: Automatically generate FAQs from your documents

Installation

npm install @coworker-agency/rag

Quick Start

import { init } from '@coworker-agency/rag';

// Initialize the library with your configuration
const rag = init({
  supabaseUrl: 'https://your-project.supabase.co',
  supabaseKey: 'your-supabase-key',
  supabaseSecretKey: 'your-supabase-service-role-key',
  supabaseBucket: 'documents',
  tableName: 'vector_documents',
  openaiApiKey: 'your-openai-api-key',
  openaiLlmModel: 'gpt-4o',
  openaiEmbeddingModel: 'text-embedding-3-small'
});

// Process documents from Supabase storage
const result = await rag.processDocuments();
console.log(`Processed ${result.processed} documents`);

// Search for relevant information
const faqs = await rag.searchFaq('What is context-aware retrieval?', 3);
console.log(faqs.questions);

// Get specific answer to a question
const answer = await rag.getAnswer('How do vector embeddings work?');
console.log(answer);

API Reference

init(options)

Initializes the RAG library with configuration options.

const rag = init({
  supabaseUrl,         // Supabase project URL (required)
  supabaseKey,         // Supabase public key (optional if supabaseSecretKey is provided)
  supabaseSecretKey,   // Supabase service role key (required)
  supabaseBucket,      // Supabase storage bucket name (required)
  tableName,           // Vector store table name (default: 'vector_documents')
  openaiApiKey,        // OpenAI API key (required)
  openaiLlmModel,      // OpenAI LLM model (default: 'gpt-4o')
  openaiEmbeddingModel // OpenAI Embedding model (default: 'text-embedding-3-small')
});

processDocuments(options)

Processes documents from Supabase storage and indexes them into the vector store.

const result = await rag.processDocuments({
  batchSize: 10,        // Number of files to process in parallel
  maxFileSize: 20971520, // 20MB max file size
  skipExisting: true     // Skip already indexed files
});

searchFaq(query, limit)

Searches the vector store and generates FAQ responses based on the query.

const faqs = await rag.searchFaq('How do I implement RAG?', 5);
// Returns { questions: [{ question, answer, links }] }
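Vector search of this kind ranks stored chunks by cosine similarity between the query embedding and each document embedding. A minimal sketch of that scoring function (illustrative only, not the package's internal code):

```typescript
// Cosine similarity between two embedding vectors. pgvector's `<=>`
// operator computes cosine *distance*, i.e. 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0; results above the configured threshold are returned in descending order of similarity.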

getAnswer(question)

Gets a specific answer to a question using the vector store.

const result = await rag.getAnswer('What are embeddings?');
// Returns { question, answer: { text, links } }

queryClassifier(query, classifications, options)

Classifies a query into one of the provided categories.

const category = await rag.queryClassifier(
  'How do I reset my password?',
  [
    { category: 'account', description: 'Account management questions', temperature: 0.8 },
    { category: 'technical', description: 'Technical or implementation questions', temperature: 0.7 },
    { category: 'billing', description: 'Billing or payment questions', temperature: 0.6 }
  ],
  { memory: ['Previous user message'] }
);

getContextAwareVectors(documentContent, options)

Generates context-aware vectors from document content.

const vectors = await rag.getContextAwareVectors(documentText, {
  chunkSize: 1000,
  chunkOverlap: 200
});
// Returns array of { context, content, vector }

Memory-Efficient Processing

This package includes special handling for large documents to avoid memory issues:

  • Automatic detection of large files (>100K characters)
  • Splitting of large documents into manageable chunks
  • Memory-efficient batch processing
  • Optional skipping of LLM refinement for very large files
  • Document summarization and context extraction
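The chunking step can be sketched as a sliding character window with overlap, matching the chunkSize and chunkOverlap options of getContextAwareVectors. This is an illustrative implementation under that assumption; the package's actual splitter may work differently (e.g. on token or sentence boundaries):

```typescript
// Split text into chunks of up to `chunkSize` characters, where each
// chunk overlaps the previous one by `chunkOverlap` characters so that
// content near a boundary appears in two chunks.
function splitIntoChunks(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) throw new Error("chunkOverlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks of 1,000, 1,000, and 900 characters, each sharing 200 characters with its neighbor.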

Supabase Setup

This package requires a properly configured Supabase instance with:

  1. A vector-enabled table (e.g., vector_documents)
  2. Storage bucket for document files
  3. Optional RPC function for vector search

Example SQL for creating the vector table:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table for document vectors
CREATE TABLE vector_documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  metadata JSONB,
  embedding VECTOR(1536),  -- For text-embedding-3-small (1536 dimensions)
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create function for similarity search
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding VECTOR(1536),
  match_threshold FLOAT,
  match_count INT
) RETURNS TABLE (
  id UUID,
  content TEXT,
  metadata JSONB,
  similarity FLOAT
) LANGUAGE SQL STABLE AS $$
  SELECT
    id,
    content,
    metadata,
    1 - (embedding <=> query_embedding) AS similarity
  FROM vector_documents
  WHERE 1 - (embedding <=> query_embedding) > match_threshold
  ORDER BY similarity DESC
  LIMIT match_count;
$$;
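Once the extension, table, and function exist, a similarity search can be run directly in SQL. The values below are illustrative only; at runtime the query embedding would be produced by the configured embedding model rather than written by hand:

```sql
-- Example invocation (the embedding literal is a placeholder for
-- the 1536 floats returned by the embedding model)
SELECT id, content, similarity
FROM match_documents(
  '[0.01, -0.02, ...]'::vector(1536),  -- query embedding
  0.78,                                -- match_threshold
  5                                    -- match_count
);
```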

License

MIT