@verydia/loaders

v0.1.0

Core document loader framework for Verydia ingestion pipelines

Overview

@verydia/loaders provides the foundational types and abstractions for loading documents into Verydia from various sources. This package is:

  • Environment-agnostic - Works in Node.js, browsers, and edge runtimes
  • Zero dependencies - No external dependencies for maximum portability
  • Strongly typed - Full TypeScript support with comprehensive JSDoc
  • Extensible - Easy to create custom loaders for any data source

Installation

pnpm add @verydia/loaders

Quick Start

Unified loadDocuments() Facade

The easiest way to load documents from any source:

import { loadDocuments } from "@verydia/loaders";

// The examples below are independent; each result gets its own variable name
// so the snippets can sit in one file without redeclaration errors.

// Load from filesystem
const fsDocs = await loadDocuments({
  kind: "fs",
  options: { path: "./documents", recursive: true },
});

// Load a ZIP archive
const zipDocs = await loadDocuments({
  kind: "zip",
  path: "./evidence-bundle.zip",
});

// Load from Notion
const notionDocs = await loadDocuments({
  kind: "notion",
  options: { auth: process.env.NOTION_TOKEN!, databaseId: "abc123" },
});

// Load from Slack
const slackDocs = await loadDocuments({
  kind: "slack",
  options: { token: process.env.SLACK_TOKEN!, channelIds: ["C123"] },
});

// Load from Google Drive
// (authClient is assumed to be an authenticated Google client created elsewhere)
const driveDocs = await loadDocuments({
  kind: "gdrive",
  options: { authClient, query: "mimeType='application/vnd.google-apps.document'" },
});

Simple PDF Loading (FsLoader)

For general-purpose PDF loading with minimal configuration:

import { FsLoader } from "@verydia/loaders";

// Load a single PDF
const pdfLoader = new FsLoader({ path: "./document.pdf" });
const pdfDocs = await pdfLoader.load();
// Returns 1 document with full text

// Load all PDFs in a directory
const dirLoader = new FsLoader({
  path: "./pdfs",
  recursive: true,
  includeExtensions: [".pdf"],
});
const dirDocs = await dirLoader.load();

Advanced PDF Loading (Legal/High-Fidelity Use Cases)

For legal workflows requiring per-page documents or layout preservation:

import { PdfAdvancedLoader } from "@verydia/loaders";

// Per-page mode (one document per page)
const perPageLoader = new PdfAdvancedLoader({
  path: "./cases/brief.pdf",
  mode: "perPage",
});
const pages = await perPageLoader.load();
// pages[0].metadata.page === 1
// pages[1].metadata.page === 2
// ...

// Docling layout-preserving mode
const doclingLoader = new PdfAdvancedLoader({
  path: "./cases/contract.pdf",
  mode: "docling",
  docling: {
    endpoint: process.env.DOCLING_ENDPOINT!,
    apiKeyHeaderName: "x-api-key",
    apiKey: process.env.DOCLING_API_KEY,
  },
});
const doclingDocs = await doclingLoader.load();
// doclingDocs[i].metadata.layoutMarkup contains HTML/Markdown
// Preserves headings, tables, footnotes, etc.

Advanced DOCX Loading (Contracts/Legal Documents)

For contract and legal document workflows requiring section-based analysis:

import { DocxAdvancedLoader } from "@verydia/loaders";

// Single mode (mode omitted here; same result as the basic loader)
const singleLoader = new DocxAdvancedLoader({
  path: "./contracts/service-agreement.docx",
});
const singleDocs = await singleLoader.load();

// Per-section mode by headings (H1/H2)
const sectionLoader = new DocxAdvancedLoader({
  path: "./briefs/motion-to-dismiss.docx",
  mode: "perSection",
  headingLevels: [1, 2],
});
const sections = await sectionLoader.load();
// sections[i].metadata.sectionHeading, sectionHtml, etc.

// HTML layout mode
const layoutLoader = new DocxAdvancedLoader({
  path: "./policies/employee-handbook.docx",
  mode: "htmlLayout",
});
const [doc] = await layoutLoader.load();
// doc.metadata.layoutHtml contains HTML for downstream rendering

When to use which loader:

| Use Case | Loader | Mode |
|----------|--------|------|
| General PDF/DOCX ingestion | FsLoader | N/A |
| Mixed file types (TXT, JSON, CSV, HTML, PDF, DOCX) | FsLoader | N/A |
| Legal citations (page-specific) | PdfAdvancedLoader | perPage |
| PDF layout preservation (tables, headings) | PdfAdvancedLoader | docling |
| Contract section analysis | DocxAdvancedLoader | perSection |
| DOCX with HTML formatting | DocxAdvancedLoader | htmlLayout |
| Simple single-document PDF/DOCX | PdfAdvancedLoader / DocxAdvancedLoader | single |

Core Concepts

VerydiaDocument

The canonical document model used throughout Verydia for:

  • Data ingestion (loaders)
  • Document splitting and chunking
  • RAG (retrieval-augmented generation)
  • Telemetry and observability

interface VerydiaDocument {
  id: string;                    // Unique identifier
  text: string;                  // Document content
  metadata: {
    source: string;              // Source system (e.g., 'fs', 'notion', 'slack')
    uri?: string;                // Path, URL, or external ID
    title?: string;              // Human-readable title
    mimeType?: string;           // MIME type
    page?: number;               // Page number (1-indexed)
    createdAt?: string;          // ISO 8601 timestamp
    updatedAt?: string;          // ISO 8601 timestamp
    [key: string]: unknown;      // Domain-specific fields
  };
}

VerydiaLoader

Interface that all loaders must implement:

interface VerydiaLoader {
  load(): Promise<VerydiaDocument[]>;
}
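
Because the interface has a single method, any object with a matching load() satisfies it. As a minimal sketch, the loader below wraps strings already held in memory, which can be handy in tests. InMemoryLoader and its "memory" source label are hypothetical and not part of the package; the two interfaces are trimmed local copies so the snippet stands alone.

```typescript
// Local copies of the interfaces shown above (trimmed), so the sketch is self-contained.
interface VerydiaDocument {
  id: string;
  text: string;
  metadata: { source: string; title?: string; [key: string]: unknown };
}

interface VerydiaLoader {
  load(): Promise<VerydiaDocument[]>;
}

// Hypothetical loader: turns each in-memory string into one document.
class InMemoryLoader implements VerydiaLoader {
  constructor(private readonly texts: string[]) {}

  async load(): Promise<VerydiaDocument[]> {
    return this.texts.map((text, idx) => ({
      id: `memory-${idx}`, // stable as long as the input order is stable
      text,
      metadata: { source: "memory", title: `Snippet ${idx + 1}` },
    }));
  }
}
```

Calling `await new InMemoryLoader(["alpha", "beta"]).load()` yields two documents with ids `memory-0` and `memory-1`.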

BaseLoader

Abstract base class providing:

  • Required load() method for subclasses to implement
  • Built-in loadAndSplit() method for loading + splitting in one call
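
To make the division of labor concrete, here is a rough sketch of how such a base class could be structured. It is an assumption based on the signatures listed in this README, not the package's actual source; SketchBaseLoader and StaticLoader are hypothetical names.

```typescript
// Local copies of the documented types (trimmed), so the sketch is self-contained.
interface VerydiaDocument {
  id: string;
  text: string;
  metadata: { source: string; [key: string]: unknown };
}

type DocumentSplitter = (docs: VerydiaDocument[]) => Promise<VerydiaDocument[]>;

// Assumed shape only; the published BaseLoader may differ.
abstract class SketchBaseLoader {
  // Subclasses supply the source-specific loading logic.
  abstract load(): Promise<VerydiaDocument[]>;

  // Load first, then hand the documents to the splitter if one was given.
  async loadAndSplit(splitter?: DocumentSplitter): Promise<VerydiaDocument[]> {
    const docs = await this.load();
    return splitter ? splitter(docs) : docs;
  }
}

// Hypothetical subclass used only to exercise the sketch.
class StaticLoader extends SketchBaseLoader {
  async load(): Promise<VerydiaDocument[]> {
    return [{ id: "a", text: "hello world", metadata: { source: "static" } }];
  }
}
```

With this shape, a subclass only implements load() and inherits loadAndSplit() unchanged.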

Usage

Creating a Custom Loader

import { BaseLoader, type VerydiaDocument } from "@verydia/loaders";

class MyCustomLoader extends BaseLoader {
  constructor(private apiKey: string) {
    super();
  }

  async load(): Promise<VerydiaDocument[]> {
    // Fetch documents from your source
    const response = await fetch("https://api.example.com/documents", {
      headers: { Authorization: `Bearer ${this.apiKey}` },
    });

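    // Fail early on HTTP errors instead of trying to parse an error body
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }

    const data = await response.json();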

    return data.documents.map((doc: any, idx: number) => ({
      id: doc.id || `doc-${idx}`,
      text: doc.content,
      metadata: {
        source: "my-api",
        uri: `https://api.example.com/documents/${doc.id}`,
        title: doc.title,
        createdAt: doc.created_at,
        updatedAt: doc.updated_at,
        // Custom metadata
        author: doc.author,
        tags: doc.tags,
      },
    }));
  }
}

// Usage
const loader = new MyCustomLoader("your-api-key");
const docs = await loader.load();
console.log(`Loaded ${docs.length} documents`);

Loading and Splitting

The loadAndSplit() method allows you to load and chunk documents in one call:

import { BaseLoader, type VerydiaDocument, type DocumentSplitter } from "@verydia/loaders";

// Custom splitter that chunks by character count
const chunkByChars: DocumentSplitter = async (docs) => {
  const chunkSize = 500;
  const chunks: VerydiaDocument[] = [];

  for (const doc of docs) {
    for (let i = 0; i < doc.text.length; i += chunkSize) {
      chunks.push({
        id: `${doc.id}-chunk-${Math.floor(i / chunkSize)}`,
        text: doc.text.slice(i, i + chunkSize),
        metadata: {
          ...doc.metadata,
          // 0-based chunk position; kept separate from `page` so a real page number is not overwritten
          chunkIndex: Math.floor(i / chunkSize),
          isChunk: true,
          originalDocId: doc.id,
        },
      });
    }
  }

  return chunks;
};

// Load and split
const loader = new MyCustomLoader("your-api-key");
const chunks = await loader.loadAndSplit(chunkByChars);
console.log(`Created ${chunks.length} chunks`);
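
Fixed-size chunks can cut a sentence in half at every boundary. A common variation is to let consecutive chunks overlap by a few characters. The factory below is a sketch of that idea; makeOverlapSplitter and the chunkIndex and charStart metadata keys are hypothetical names, and the types are local copies so the snippet stands alone.

```typescript
// Local copies of the documented types (trimmed), so the sketch is self-contained.
interface VerydiaDocument {
  id: string;
  text: string;
  metadata: { source: string; [key: string]: unknown };
}

type DocumentSplitter = (docs: VerydiaDocument[]) => Promise<VerydiaDocument[]>;

// Hypothetical factory: sliding windows of `chunkSize` characters that overlap by `overlap`.
function makeOverlapSplitter(chunkSize: number, overlap: number): DocumentSplitter {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap;

  return async (docs) => {
    const chunks: VerydiaDocument[] = [];
    for (const doc of docs) {
      let index = 0;
      for (let start = 0; start < doc.text.length; start += step) {
        chunks.push({
          id: `${doc.id}-chunk-${index}`,
          text: doc.text.slice(start, start + chunkSize),
          metadata: {
            ...doc.metadata,
            isChunk: true,
            originalDocId: doc.id,
            chunkIndex: index,
            charStart: start,
          },
        });
        index += 1;
        // Stop once a window has reached the end, to avoid a trailing overlap-only chunk.
        if (start + chunkSize >= doc.text.length) break;
      }
    }
    return chunks;
  };
}
```

For a 10-character text with chunkSize 4 and overlap 2, the windows start at offsets 0, 2, 4, and 6.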

Legal/Enterprise Metadata

The VerydiaDocument metadata supports domain-specific fields for legal and enterprise use cases:

const legalDoc: VerydiaDocument = {
  id: "contract-123",
  text: "This Agreement is entered into...",
  metadata: {
    source: "sharepoint",
    uri: "https://company.sharepoint.com/contracts/123",
    title: "Service Agreement - Acme Corp",
    mimeType: "application/pdf",
    createdAt: "2024-01-15T10:30:00Z",
    updatedAt: "2024-01-20T14:45:00Z",
    
    // Legal-specific metadata
    jurisdiction: "CA",
    documentType: "contract",
    author: "Legal Department",
    department: "Legal",
    confidentiality: "internal",
    tags: ["contracts", "services", "2024"],
    expirationDate: "2025-01-15",
    parties: ["Acme Corp", "Our Company"],
  },
};
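
Since the index signature types domain-specific fields as unknown, they need narrowing before use. The helpers below sketch one way to read them safely; getString and getStringArray are hypothetical names, not package exports.

```typescript
// Local copy of the documented type (trimmed), so the sketch is self-contained.
interface VerydiaDocument {
  id: string;
  text: string;
  metadata: { source: string; [key: string]: unknown };
}

// Returns the field only if it is actually a string.
function getString(doc: VerydiaDocument, key: string): string | undefined {
  const value = doc.metadata[key];
  return typeof value === "string" ? value : undefined;
}

// Returns the string elements of an array field, or an empty array otherwise.
function getStringArray(doc: VerydiaDocument, key: string): string[] {
  const value = doc.metadata[key];
  return Array.isArray(value)
    ? value.filter((item): item is string => typeof item === "string")
    : [];
}

const sample: VerydiaDocument = {
  id: "contract-123",
  text: "This Agreement is entered into...",
  metadata: { source: "sharepoint", jurisdiction: "CA", parties: ["Acme Corp", "Our Company"] },
};

const jurisdiction = getString(sample, "jurisdiction");
const parties = getStringArray(sample, "parties");
```

Fields of the wrong type fall back to undefined or an empty array, so downstream code does not need casts.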

Environment-Agnostic Design

This package contains zero environment-specific APIs. It works in:

  • ✅ Node.js
  • ✅ Browsers
  • ✅ Edge runtimes (Cloudflare Workers, Vercel Edge, etc.)
  • ✅ React Native
  • ✅ Electron

Specific loaders (filesystem, Notion, Slack, etc.) will be in separate packages:

  • @verydia/loaders-fs - Filesystem loader (Node.js only)
  • @verydia/loaders-web - Web scraping loader
  • @verydia/loaders-notion - Notion API loader
  • @verydia/loaders-slack - Slack API loader
  • And more...

API Reference

Types

VerydiaDocument

The canonical document model for all Verydia ingestion.

Fields:

  • id: string - Unique identifier (should be stable across re-ingestion)
  • text: string - Document content
  • metadata: object - Metadata about source, structure, and domain
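
One way to keep ids stable across re-ingestion is to derive them from values that do not change between runs, such as the source and URI. The sketch below uses a small FNV-1a-style hash over UTF-16 code units so it needs no runtime-specific crypto API. stableId and fnv1a32 are hypothetical helpers; a 32-bit hash can collide in large corpora, so a stronger hash may be preferable in production.

```typescript
// 32-bit FNV-1a-style hash, returned as 8 hex characters.
// Note: it mixes UTF-16 code units, so it matches byte-wise FNV-1a only for ASCII input.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical helper: the same source and URI always produce the same id.
function stableId(source: string, uri: string): string {
  return `${source}:${fnv1a32(uri)}`;
}

const firstRun = stableId("fs", "./documents/report.pdf");
const secondRun = stableId("fs", "./documents/report.pdf");
const otherFile = stableId("fs", "./documents/notes.txt");
```

Here firstRun and secondRun are identical, while otherFile differs because its URI differs.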

VerydiaLoader

Interface for all document loaders.

Methods:

  • load(): Promise<VerydiaDocument[]> - Load documents from source

DocumentSplitter

Type for document splitting functions.

type DocumentSplitter = (docs: VerydiaDocument[]) => Promise<VerydiaDocument[]>;
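
Because a splitter takes and returns the same type, splitters chain naturally. The sketch below composes several into one; composeSplitters, byParagraph, and dropEmpty are hypothetical helpers, not package exports.

```typescript
// Local copies of the documented types (trimmed), so the sketch is self-contained.
interface VerydiaDocument {
  id: string;
  text: string;
  metadata: { source: string; [key: string]: unknown };
}

type DocumentSplitter = (docs: VerydiaDocument[]) => Promise<VerydiaDocument[]>;

// Runs the splitters left to right, feeding each one the previous output.
function composeSplitters(...splitters: DocumentSplitter[]): DocumentSplitter {
  return async (docs) => {
    let current = docs;
    for (const split of splitters) {
      current = await split(current);
    }
    return current;
  };
}

// Splits each document on blank lines.
const byParagraph: DocumentSplitter = async (docs) =>
  docs.flatMap((doc) =>
    doc.text.split(/\n{2,}/).map((text, idx) => ({
      id: `${doc.id}-p${idx}`,
      text,
      metadata: { ...doc.metadata, originalDocId: doc.id },
    }))
  );

// Removes chunks that contain only whitespace.
const dropEmpty: DocumentSplitter = async (docs) =>
  docs.filter((doc) => doc.text.trim().length > 0);

const paragraphSplitter = composeSplitters(byParagraph, dropEmpty);
```

The composed paragraphSplitter is itself a DocumentSplitter, so it can be passed directly to loadAndSplit().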

Classes

BaseLoader

Abstract base class for all loaders.

Methods:

  • abstract load(): Promise<VerydiaDocument[]> - Implement to load documents
  • loadAndSplit(splitter?: DocumentSplitter): Promise<VerydiaDocument[]> - Load and optionally split

Testing

pnpm test

Building

pnpm build

Outputs:

  • ESM: dist/index.js
  • CJS: dist/index.cjs
  • TypeScript declarations: dist/index.d.ts

License

MIT