# @voltx/rag
v0.3.1
VoltX RAG pipeline primitives — document loading, chunking, embedding, retrieval
Production-ready Retrieval-Augmented Generation pipeline for the VoltX framework. Load documents, split into chunks, generate embeddings, store in a vector database, and retrieve relevant context for LLM prompts.
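The retrieve step boils down to embedding the query and ranking stored chunk vectors by similarity. As a standalone sketch of that core idea (plain TypeScript; cosine similarity is assumed here, and none of this reflects `@voltx/rag`'s actual internals, which this README does not show):

```ts
// Illustration only: score vectors by cosine similarity and keep the top-k.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the indices of the k stored vectors most similar to the query vector.
function topK(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((v, i) => ({ i, score: cosineSimilarity(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((m) => m.i);
}
```

`pipeline.query(...)` wraps this kind of ranking together with query embedding and vector-store lookup, so you normally never call the scoring layer directly.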
## Installation

```sh
npm install @voltx/rag
```

## Quick Start
```ts
import { createRAGPipeline, createEmbedder } from "@voltx/rag";
import { createVectorStore } from "@voltx/db";

const pipeline = createRAGPipeline({
  embedder: createEmbedder({ model: "openai:text-embedding-3-small" }),
  vectorStore: createVectorStore(),
});

// Ingest documents
await pipeline.ingest("Your long document text here...");

// Query with natural language
const { sources } = await pipeline.query("What is TypeScript?");

// Or get formatted context for an LLM prompt
const context = await pipeline.getContext("What is TypeScript?");
```

## Features
### Document Loaders

| Loader | Description |
|--------|-------------|
| `TextLoader` | Plain text files or raw strings |
| `MarkdownLoader` | Markdown files (strips front-matter) |
| `JSONLoader` | JSON files (extracts text from configurable keys) |
| `WebLoader` | Fetches and extracts text from URLs |
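The front-matter stripping the table attributes to `MarkdownLoader` can be illustrated in a few lines. This is a standalone sketch of the idea, not the package's actual implementation:

```ts
// Illustration only: remove a leading YAML front-matter block
// (delimited by "---" lines) from a markdown string.
function stripFrontMatter(markdown: string): string {
  const match = markdown.match(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/);
  return match ? markdown.slice(match[0].length) : markdown;
}
```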
### Text Splitters

| Splitter | Description |
|----------|-------------|
| `RecursiveTextSplitter` | Smart splitting with a separator hierarchy (recommended) |
| `MarkdownSplitter` | Heading-aware splitting that preserves header hierarchy |
| `CharacterSplitter` | Simple character-based splitting with overlap |
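As a rough illustration of splitting with overlap (the mechanism behind `CharacterSplitter`; the `chunkSize` and `overlap` names mirror the options shown later in this README, but the real implementation likely differs):

```ts
// Standalone sketch of character-based chunking with overlap.
// Each chunk starts (chunkSize - overlap) characters after the previous one,
// so consecutive chunks share `overlap` characters of context.
function splitByCharacters(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both neighboring chunks, at the cost of some duplicated storage.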
### Fluent Document API

Inspired by Mastra's `MDocument` pattern:

```ts
import { MDocument, createEmbedder } from "@voltx/rag";

const doc = MDocument.fromMarkdown("# Title\n\nContent here...");
const chunks = doc.chunk({ strategy: "markdown", chunkSize: 500 });
const embedded = await doc.embed(
  createEmbedder({ model: "openai:text-embedding-3-small" })
);
```

### Embedder
Wraps `@voltx/ai` embedding functions into a simple interface:

```ts
import { createEmbedder } from "@voltx/rag";

const embedder = createEmbedder({ model: "openai:text-embedding-3-small" });
const vector = await embedder.embed("Hello world");
const vectors = await embedder.embedBatch(["Hello", "World"]);
```

### Pipeline Options
```ts
const pipeline = createRAGPipeline({
  loader: new WebLoader(),              // optional document loader
  splitter: new RecursiveTextSplitter({ // text splitter (default: recursive)
    chunkSize: 1000,
    overlap: 200,
  }),
  embedder: createEmbedder({ model: "openai:text-embedding-3-small" }),
  vectorStore: createVectorStore("pinecone", { indexName: "docs" }),
});

// Query with options
const results = await pipeline.query("question", { topK: 5, minScore: 0.7 });
```

## Part of VoltX
This package is part of the VoltX framework. See the monorepo for full documentation.
## License
MIT — Made by the Promptly AI Team
