
@cogitator-ai/rag

v0.1.4

Retrieval Augmented Generation for Cogitator AI agents

@cogitator-ai/rag

Retrieval-Augmented Generation pipeline for Cogitator AI agents. Load, chunk, embed, retrieve, and rerank documents, all through a single builder API.

Installation

pnpm add @cogitator-ai/rag

# Optional dependencies for specific loaders
pnpm add cheerio    # HTML and web page loading
pnpm add papaparse  # CSV loading
pnpm add pdf-parse  # PDF loading

Features

  • 7 Document Loaders — Text, Markdown, JSON, CSV, HTML, PDF, Web pages
  • 3 Chunking Strategies — Fixed-size, recursive, semantic (embedding-based)
  • 4 Retrieval Strategies — Similarity, MMR, hybrid (BM25 + vector), multi-query
  • 2 Rerankers — LLM-based scoring, Cohere Rerank API
  • Pipeline Builder — Fluent API to wire everything together
  • Agent Tools — Drop-in rag_search and rag_ingest tools for Cogitator agents
  • Zod Validation — Type-safe configuration with runtime checks

Quick Start

import { RAGPipelineBuilder, TextLoader } from '@cogitator-ai/rag';
import { InMemoryEmbeddingAdapter, OpenAIEmbeddingService } from '@cogitator-ai/memory';

const pipeline = new RAGPipelineBuilder()
  .withLoader(new TextLoader())
  .withEmbeddingService(
    new OpenAIEmbeddingService({
      apiKey: process.env.OPENAI_API_KEY!,
    })
  )
  .withEmbeddingAdapter(new InMemoryEmbeddingAdapter())
  .withConfig({
    chunking: { strategy: 'recursive', chunkSize: 500, chunkOverlap: 50 },
    retrieval: { strategy: 'similarity', topK: 5, threshold: 0.3 },
  })
  .build();

// ingest documents from a file or directory
await pipeline.ingest('./docs');

// query the knowledge base
const results = await pipeline.query('How does authentication work?');

for (const r of results) {
  console.log(`[${r.score.toFixed(3)}] ${r.content.slice(0, 100)}...`);
}

Document Loaders

| Loader | Formats | Optional dependency | Notes |
| --- | --- | --- | --- |
| TextLoader | .txt | — | Files and directories |
| MarkdownLoader | .md | — | Strips frontmatter by default |
| JSONLoader | .json | — | Configurable content field |
| CSVLoader | .csv | papaparse | Column selection, row mapping |
| HTMLLoader | .html, .htm | cheerio | CSS selector support |
| PDFLoader | .pdf | pdf-parse | Text extraction from PDFs |
| WebLoader | http://, https:// | cheerio | Fetches and parses web pages |

import { MarkdownLoader, WebLoader, CSVLoader } from '@cogitator-ai/rag';

const md = new MarkdownLoader({ stripFrontmatter: true });
const web = new WebLoader({ selector: 'article' });
const csv = new CSVLoader({ contentColumn: 'body', metadataColumns: ['title'] });

Chunking Strategies

| Strategy | Class | Best for |
| --- | --- | --- |
| fixed | FixedSizeChunker | Simple, predictable chunk sizes |
| recursive | RecursiveChunker | Respects paragraph/sentence boundaries |
| semantic | SemanticChunker | Groups semantically similar sentences |

Fixed-size

Splits text into chunks of chunkSize characters (the last chunk may be shorter), optionally overlapping consecutive chunks by chunkOverlap characters.

import { FixedSizeChunker } from '@cogitator-ai/rag';

const chunker = new FixedSizeChunker({ chunkSize: 500, chunkOverlap: 50 });
const chunks = chunker.chunk(text, documentId);
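For intuition, here is a self-contained sketch of fixed-size splitting with overlap. It is not the FixedSizeChunker source: the function name fixedSizeChunks is hypothetical, and it returns plain strings rather than the library's chunk objects.

```typescript
// Sketch only, not the FixedSizeChunker source. Splits `text` into windows of
// `chunkSize` characters; consecutive windows share `chunkOverlap` characters.
function fixedSizeChunks(text: string, chunkSize: number, chunkOverlap = 0): string[] {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be >= 0 and < chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize)); // last window may be shorter
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

console.log(fixedSizeChunks('abcdefghij', 4, 1).join(' | ')); // abcd | defg | ghij
```

Each chunk starts with the last character of the previous one; that shared context is what the overlap buys you at chunk boundaries.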

Recursive

Splits on a configurable list of separators, tried in order (paragraph breaks '\n\n', line breaks '\n', sentence ends '. ', then spaces ' '), to keep paragraphs and sentences intact where possible.

import { RecursiveChunker } from '@cogitator-ai/rag';

const chunker = new RecursiveChunker({
  chunkSize: 500,
  chunkOverlap: 50,
  separators: ['\n\n', '\n', '. ', ' '],
});
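A minimal sketch of how recursive splitting usually works, assuming the common approach: pack pieces split on the coarsest separator into chunks, and recurse with the next, finer separator only for pieces that are still too long. The function recursiveSplit is hypothetical, ignores chunkOverlap for brevity, and drops separators where a chunk boundary falls; RecursiveChunker's actual behavior may differ.

```typescript
// Sketch of recursive splitting, not the RecursiveChunker source.
// Ignores chunkOverlap; separators are dropped where a chunk boundary falls.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' '],
): string[] {
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];
  if (separators.length === 0) {
    // No separators left: fall back to hard character cuts.
    const cuts: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) cuts.push(text.slice(i, i + chunkSize));
    return cuts;
  }
  const [sep, ...finer] = separators;
  const chunks: string[] = [];
  let current = '';
  for (const piece of text.split(sep)) {
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate; // keep packing pieces into the current chunk
      continue;
    }
    if (current) chunks.push(current);
    if (piece.length <= chunkSize) {
      current = piece; // start a new chunk with this piece
    } else {
      chunks.push(...recursiveSplit(piece, chunkSize, finer)); // too long: try a finer separator
      current = '';
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc = 'First paragraph here.\n\nSecond one is a bit longer. It has two sentences.';
console.log(recursiveSplit(doc, 30).join(' | '));
// First paragraph here. | Second one is a bit longer | It has two sentences.
```

The paragraph break is tried first; only the second paragraph, which exceeds 30 characters on its own, falls through to the sentence separator.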

Semantic

Uses embedding similarity between sentences to find natural breakpoints. Async — requires an EmbeddingService.

import { SemanticChunker } from '@cogitator-ai/rag';

const chunker = new SemanticChunker({
  embeddingService,
  breakpointThreshold: 0.5,
  minChunkSize: 100,
  maxChunkSize: 2000,
});

const chunks = await chunker.chunk(text, documentId);
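The breakpoint idea can be sketched as follows, assuming a new chunk starts whenever the cosine similarity between consecutive sentence embeddings falls below breakpointThreshold. Embeddings are passed in precomputed here as a stand-in for EmbeddingService calls, minChunkSize/maxChunkSize are not enforced, and semanticChunks plus the toy 2-D vectors are hypothetical.

```typescript
// Sketch only, not the SemanticChunker source. Embeddings are passed in
// precomputed (standing in for EmbeddingService calls); min/max chunk sizes
// are not enforced.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function semanticChunks(sentences: string[], embeddings: number[][], breakpointThreshold: number): string[] {
  if (sentences.length === 0) return [];
  const chunks: string[] = [];
  let current: string[] = [sentences[0]];
  for (let i = 1; i < sentences.length; i++) {
    // A similarity drop between neighbouring sentences is treated as a topic shift.
    if (cosine(embeddings[i - 1], embeddings[i]) < breakpointThreshold) {
      chunks.push(current.join(' '));
      current = [];
    }
    current.push(sentences[i]);
  }
  chunks.push(current.join(' '));
  return chunks;
}

const sentences = ['Cats purr.', 'Cats nap a lot.', 'Stocks fell today.'];
const vectors = [[1, 0], [0.9, 0.1], [0, 1]]; // toy embeddings
console.log(semanticChunks(sentences, vectors, 0.5).join(' | ')); // Cats purr. Cats nap a lot. | Stocks fell today.
```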

Factory

import { createChunker } from '@cogitator-ai/rag';

const chunker = createChunker(
  { strategy: 'recursive', chunkSize: 500, chunkOverlap: 50 },
  embeddingService
);

Retrieval Strategies

| Strategy | Class | Description |
| --- | --- | --- |
| similarity | SimilarityRetriever | Pure cosine similarity search |
| mmr | MMRRetriever | Maximal Marginal Relevance — balances relevance and diversity |
| hybrid | HybridRetriever | Combines BM25 keyword search with vector search (RRF) |
| multi-query | MultiQueryRetriever | Expands query into variants, merges results |

Similarity

import { SimilarityRetriever } from '@cogitator-ai/rag';

const retriever = new SimilarityRetriever({
  embeddingAdapter,
  embeddingService,
  defaultTopK: 10,
  defaultThreshold: 0.3,
});

const results = await retriever.retrieve('What is TypeScript?');
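Conceptually, similarity retrieval scores every stored chunk against the query embedding and keeps the best topK above the threshold. The brute-force sketch below is illustrative only: StoredChunk, similaritySearch, and the inclusive threshold check are assumptions, and a real embedding adapter would typically use an index instead of a linear scan.

```typescript
// Brute-force sketch, not SimilarityRetriever's implementation.
interface StoredChunk {
  id: string;
  content: string;
  embedding: number[];
}

interface ScoredChunk {
  id: string;
  content: string;
  score: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function similaritySearch(queryEmbedding: number[], store: StoredChunk[], topK: number, threshold = 0): ScoredChunk[] {
  return store
    .map(({ id, content, embedding }) => ({ id, content, score: cosine(queryEmbedding, embedding) }))
    .filter((r) => r.score >= threshold) // drop weak matches (inclusive bound assumed)
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, topK);
}

const store: StoredChunk[] = [
  { id: 'a', content: 'chunk A', embedding: [1, 0] },
  { id: 'b', content: 'chunk B', embedding: [0.7, 0.7] },
  { id: 'c', content: 'chunk C', embedding: [0, 1] },
];
const hits = similaritySearch([1, 0], store, 2, 0.3);
console.log(hits.map((h) => `${h.id}: ${h.score.toFixed(3)}`).join(', ')); // a: 1.000, b: 0.707
```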

MMR

Reduces redundancy by penalizing results that are too similar to already-selected ones.

import { MMRRetriever } from '@cogitator-ai/rag';

const retriever = new MMRRetriever({
  embeddingAdapter,
  embeddingService,
  defaultLambda: 0.7, // 1.0 = pure relevance, 0.0 = pure diversity
  defaultTopK: 10,
});
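The textbook greedy MMR loop picks, at each step, the candidate that maximizes lambda * sim(query, doc) - (1 - lambda) * max sim(doc, alreadySelected). The sketch below implements that standard form; mmrSelect, Candidate, and the toy vectors are hypothetical, and MMRRetriever may differ in details such as how the candidate pool is fetched.

```typescript
// Textbook greedy MMR sketch, not MMRRetriever's implementation.
interface Candidate {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function mmrSelect(query: number[], candidates: Candidate[], topK: number, lambda = 0.7): Candidate[] {
  const selected: Candidate[] = [];
  const remaining = [...candidates];
  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const relevance = cosine(query, remaining[i].embedding);
      // Redundancy: similarity to the closest already-selected candidate.
      const redundancy =
        selected.length === 0 ? 0 : Math.max(...selected.map((s) => cosine(remaining[i].embedding, s.embedding)));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

const pool: Candidate[] = [
  { id: 'a', embedding: [0.95, 0.31] },
  { id: 'b', embedding: [0.94, 0.34] }, // near-duplicate of a
  { id: 'c', embedding: [0.9, -0.44] },
];
console.log(mmrSelect([1, 0], pool, 2, 0.7).map((d) => d.id).join(', ')); // a, c
```

With lambda = 0.7 the sketch returns a and c: b is almost identical to a, so its redundancy penalty outweighs its slightly higher relevance. With lambda = 1.0 it reduces to plain similarity ranking and returns a and b.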

Hybrid

Requires HybridSearch from @cogitator-ai/memory.

import { HybridRetriever } from '@cogitator-ai/rag';
import { HybridSearch } from '@cogitator-ai/memory';

// hybridSearch is an already-configured HybridSearch instance (setup not shown)
const retriever = new HybridRetriever({
  hybridSearch,
  defaultWeights: { bm25: 0.4, vector: 0.6 },
});
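Reciprocal Rank Fusion (RRF) scores each document by summing 1 / (k + rank) over every result list it appears in. The sketch below multiplies each list's contribution by a weight to mirror defaultWeights; that weighting scheme and k = 60 (the constant commonly used with RRF) are assumptions, not confirmed details of HybridRetriever. weightedRRF and RankedList are hypothetical names.

```typescript
// Weighted Reciprocal Rank Fusion sketch; not HybridRetriever's code.
interface RankedList {
  ids: string[]; // best match first
  weight: number;
}

function weightedRRF(lists: RankedList[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

const fused = weightedRRF([
  { ids: ['d1', 'd2', 'd3'], weight: 0.4 }, // BM25 keyword ranking
  { ids: ['d3', 'd1', 'd4'], weight: 0.6 }, // vector ranking
]);
console.log(fused.map((r) => r.id).join(', ')); // d1, d3, d4, d2
```

Because only ranks are used, BM25 scores and cosine similarities never need to be put on a common scale; d1 wins because it places well in both lists.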

Multi-Query

Generates query variations and merges results. You provide the expansion function (typically an LLM call).

import { MultiQueryRetriever } from '@cogitator-ai/rag';

const retriever = new MultiQueryRetriever({
  baseRetriever: similarityRetriever,
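  // llm is a placeholder for your own LLM client; generate() is assumed to resolve to a string
  expandQuery: async (query) => {
    const response = await llm.generate(
      `Generate 3 alternative phrasings for: "${query}". Return one per line.`
    );
    // one variant per non-empty line
    return response.split('\n').map((line) => line.trim()).filter(Boolean);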
  },
});
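Once each variant has been run through the base retriever, the result lists must be merged. One common approach, assumed here rather than taken from MultiQueryRetriever, is to deduplicate by chunk id and keep each chunk's best score; mergeMultiQueryResults and RetrievedChunk are hypothetical names.

```typescript
// Hypothetical merge step for multi-query retrieval (not MultiQueryRetriever's code).
interface RetrievedChunk {
  id: string;
  content: string;
  score: number;
}

function mergeMultiQueryResults(resultLists: RetrievedChunk[][], topK: number): RetrievedChunk[] {
  const best = new Map<string, RetrievedChunk>();
  for (const list of resultLists) {
    for (const r of list) {
      const previous = best.get(r.id);
      if (!previous || r.score > previous.score) best.set(r.id, r); // keep best score per chunk
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const merged = mergeMultiQueryResults(
  [
    [{ id: 'a', content: 'chunk A', score: 0.8 }, { id: 'b', content: 'chunk B', score: 0.5 }], // original query
    [{ id: 'b', content: 'chunk B', score: 0.9 }, { id: 'c', content: 'chunk C', score: 0.4 }], // one variant
  ],
  3,
);
console.log(merged.map((r) => `${r.id}=${r.score}`).join(', ')); // b=0.9, a=0.8, c=0.4
```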

Factory

import { createRetriever } from '@cogitator-ai/rag';

const retriever = createRetriever({
  strategy: 'mmr',
  embeddingAdapter,
  embeddingService,
  lambda: 0.7,
  topK: 10,
});

Reranking

Rerankers rescore retrieved results for higher precision. Attach one with withReranker() and enable it in the pipeline config with reranking: { enabled: true }.

LLM Reranker

Uses any LLM to score document relevance on a 0-10 scale.

import { LLMReranker } from '@cogitator-ai/rag';

const reranker = new LLMReranker({
  generateFn: (prompt) => llm.generate(prompt),
});

const pipeline = new RAGPipelineBuilder()
  .withLoader(loader)
  .withEmbeddingService(embeddingService)
  .withEmbeddingAdapter(embeddingAdapter)
  .withReranker(reranker)
  .withConfig({
    chunking: { strategy: 'recursive', chunkSize: 500, chunkOverlap: 50 },
    retrieval: { strategy: 'similarity', topK: 20 }, // fetch a wider candidate set
    reranking: { enabled: true, topN: 5 }, // keep the 5 highest-scoring after reranking
  })
  .build();
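Because the LLM replies in free text, a reranker has to turn each reply into a number. The sketch below shows one hypothetical way to do that: extract the first number, clamp it to the 0-10 range, sort, and keep topN. parseScore and rerank are illustrative names, and treating unparseable replies as 0 is an assumption, not documented LLMReranker behavior.

```typescript
// Hypothetical helpers, not LLMReranker's implementation.
function parseScore(reply: string): number {
  const match = reply.match(/\d+(?:\.\d+)?/); // first number in the reply
  if (!match) return 0; // assumption: unparseable replies count as irrelevant
  return Math.min(10, Math.max(0, Number(match[0]))); // clamp to the 0-10 scale
}

function rerank<T>(candidates: T[], replies: string[], topN: number): { item: T; score: number }[] {
  return candidates
    .map((item, i) => ({ item, score: parseScore(replies[i] ?? '') }))
    .sort((a, b) => b.score - a.score) // highest relevance first
    .slice(0, topN);
}

const reranked = rerank(['doc A', 'doc B', 'doc C'], ['3', 'Score: 9/10', 'not relevant'], 2);
console.log(reranked.map((r) => `${r.item}=${r.score}`).join(', ')); // doc B=9, doc A=3
```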

Cohere Reranker

Uses the Cohere Rerank API (rerank-v3.5 by default).

import { CohereReranker } from '@cogitator-ai/rag';

const reranker = new CohereReranker({
  apiKey: process.env.COHERE_API_KEY!,
  model: 'rerank-v3.5',
});

Agent Integration

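Use ragTools() to give a Cogitator agent access to your knowledge base. The example below exposes only the search tool: createSearchTool() builds it from the pipeline, and tool() from @cogitator-ai/core wraps it with a Zod parameter schema.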

import { Agent, tool } from '@cogitator-ai/core';
import { RAGPipelineBuilder, TextLoader, createSearchTool } from '@cogitator-ai/rag';
import { InMemoryEmbeddingAdapter, OpenAIEmbeddingService } from '@cogitator-ai/memory';
import { z } from 'zod';

const pipeline = new RAGPipelineBuilder()
  .withLoader(new TextLoader())
  .withEmbeddingService(new OpenAIEmbeddingService({ apiKey: process.env.OPENAI_API_KEY! }))
  .withEmbeddingAdapter(new InMemoryEmbeddingAdapter())
  .withConfig({
    chunking: { strategy: 'recursive', chunkSize: 400, chunkOverlap: 50 },
    retrieval: { strategy: 'similarity', topK: 3, threshold: 0.3 },
  })
  .build();

await pipeline.ingest('./knowledge-base');

const ragSearch = createSearchTool(pipeline);

const searchKB = tool({
  name: ragSearch.name,
  description: ragSearch.description,
  parameters: z.object({
    query: z.string().describe('Search query'),
    limit: z.number().int().positive().optional(),
    threshold: z.number().min(0).max(1).optional(),
  }),
  execute: async (params) => ragSearch.execute(params),
});

const agent = new Agent({
  name: 'docs-assistant',
  model: 'gpt-4o',
  instructions: 'Use rag_search to find information before answering.',
  tools: [searchKB],
});

Pipeline Stats

const stats = pipeline.getStats();
console.log(stats.documentsIngested);
console.log(stats.chunksStored);
console.log(stats.queriesProcessed);

Examples

See examples/rag/ for runnable examples:

  • 01-basic-retrieval.ts — Ingest documents and run semantic queries
  • 02-chunking-strategies.ts — Compare fixed, recursive, and semantic chunking
  • 03-agent-with-rag.ts — Full agent with RAG search tools

Zod Schemas

The Zod schemas used to validate pipeline configuration are exported for reuse:

import {
  ChunkingStrategySchema,
  ChunkingConfigSchema,
  RetrievalStrategySchema,
  RetrievalConfigSchema,
  RerankingConfigSchema,
  RAGPipelineConfigSchema,
} from '@cogitator-ai/rag';

License

MIT