@doclo/providers-extend

v0.1.0

Extend.ai provider integration for document parsing, extraction, classification, and splitting

Extend.ai provider implementations for the Doclo SDK. Provides document parsing, extraction, classification, and splitting capabilities using Extend.ai's document processing API.

Installation

pnpm add @doclo/providers-extend
# or
npm install @doclo/providers-extend

Providers

| Provider | Type | Node Compatibility | Description |
|----------|------|--------------------|-------------|
| extendParseProvider | OCRProvider | parse() | Document parsing to markdown/text |
| extendExtractProvider | VLMProvider | extract() | Schema-based data extraction |
| extendClassifyProvider | VLMProvider | categorize() | Document classification |
| extendSplitProvider | VLMProvider | split() | Multi-page document splitting |

Environment Variables

EXTEND_API_KEY=your-extend-api-key
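
The usage examples in this README read the key with a non-null assertion (`process.env.EXTEND_API_KEY!`), which hides a missing variable until the first API call fails. A minimal sketch of failing early instead is shown here; `requireEnv` is a hypothetical helper written for this illustration and is not exported by @doclo/providers-extend.

```typescript
// Hypothetical helper (not part of @doclo/providers-extend).
// Takes the environment as a parameter so it can be unit-tested without touching process.env.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    // Covers both an unset variable and one that is empty or whitespace-only.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In application code this could be called as `requireEnv(process.env, 'EXTEND_API_KEY')` and the returned string passed as `apiKey`.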

Usage

Parse Provider

Convert documents to structured markdown/text using Extend.ai's parser:

import { extendParseProvider } from '@doclo/providers-extend';

const provider = extendParseProvider({
  apiKey: process.env.EXTEND_API_KEY!,
});

// Parse a local file
const fromFile = await provider.parseToIR({
  url: '/path/to/document.pdf',
});

// Parse from URL
const fromUrl = await provider.parseToIR({
  url: 'https://example.com/document.pdf',
});

// Parse from base64
const fromBase64 = await provider.parseToIR({
  base64: 'data:application/pdf;base64,JVBERi0x...',
});

// Each call returns the same shape; print the first page of one result
console.log(fromFile.pages[0].markdown);

Extract Provider

Extract structured data from documents using a pre-configured processor:

import { extendExtractProvider } from '@doclo/providers-extend';

const provider = extendExtractProvider({
  apiKey: process.env.EXTEND_API_KEY!,
  processorId: 'your-processor-id', // Required: ID from Extend dashboard
});

const result = await provider.completeJson({
  prompt: {
    images: [{ url: '/path/to/invoice.jpg' }],
  },
  schema: {
    type: 'object',
    properties: {
      invoice_number: { type: 'string' },
      total_amount: { type: 'number' },
      invoice_date: { type: 'string', 'extend:type': 'date' }, // ISO 8601 format
    },
  },
});

console.log(result.json);
// { invoice_number: 'INV-001', total_amount: 1234.56, invoice_date: '2024-01-15' }

Classify Provider

Classify documents into categories:

import { extendClassifyProvider } from '@doclo/providers-extend';

const provider = extendClassifyProvider({
  apiKey: process.env.EXTEND_API_KEY!,
  processorId: 'your-classifier-id', // Required: Classifier processor ID
});

const result = await provider.completeJson({
  prompt: {
    pdfs: [{ url: '/path/to/document.pdf' }],
  },
  schema: {
    type: 'object',
    properties: {
      category: { enum: ['invoice', 'receipt', 'contract', 'other'] },
    },
  },
});

console.log(result.json);
// { category: 'invoice', classificationId: 'inv', confidence: 0.95 }

Split Provider

Split multi-page documents into sub-documents:

import { extendSplitProvider } from '@doclo/providers-extend';

const provider = extendSplitProvider({
  apiKey: process.env.EXTEND_API_KEY!,
  processorId: 'your-splitter-id', // Required: Splitter processor ID
  classifications: [
    { id: 'invoice', type: 'invoice', label: 'Invoice Document' },
    { id: 'contract', type: 'contract', label: 'Contract Document' },
  ],
});

const result = await provider.completeJson({
  prompt: {
    pdfs: [{ url: '/path/to/mixed-documents.pdf' }],
  },
  schema: { type: 'object' },
});

console.log(result.json);
// { documents: [{ type: 'invoice', pages: [1, 2], confidence: 0.9 }, ...], totalPages: 10 }
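
Once a split result is available, a common next step is to collect the page numbers belonging to each document type. The sketch below assumes the result has exactly the shape shown in the comment above (`documents` with `type`, `pages`, and `confidence`, plus `totalPages`); the `SplitResult` types and `pagesByType` are hypothetical names for this illustration, not exports of the package.

```typescript
// Shapes inferred from the example output above; they are assumptions, not package types.
interface SplitDocument {
  type: string;
  pages: number[];
  confidence: number;
}

interface SplitResult {
  documents: SplitDocument[];
  totalPages: number;
}

// Group page numbers by document type, preserving the order in which they appear.
function pagesByType(result: SplitResult): Map<string, number[]> {
  const grouped = new Map<string, number[]>();
  for (const doc of result.documents) {
    const pages = grouped.get(doc.type) ?? [];
    pages.push(...doc.pages);
    grouped.set(doc.type, pages);
  }
  return grouped;
}
```

With the real provider, `result.json` would need to be cast or validated against `SplitResult` before being passed in.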

Factory Function

Use createExtendProvider for a unified interface:

import { createExtendProvider } from '@doclo/providers-extend';

// Create parse provider
const parseProvider = createExtendProvider({
  type: 'parse',
  apiKey: process.env.EXTEND_API_KEY!,
});

// Create extract provider
const extractProvider = createExtendProvider({
  type: 'extract',
  apiKey: process.env.EXTEND_API_KEY!,
  processorId: 'your-processor-id',
});

Extend-Specific Features

Schema Type Extensions

Extend.ai supports special type annotations in JSON Schema:

const schema = {
  type: 'object',
  properties: {
    // Date normalization - returns ISO 8601 format (YYYY-MM-DD)
    invoice_date: { type: 'string', 'extend:type': 'date' },

    // Currency normalization - returns { amount, currency }
    total: { type: 'object', 'extend:type': 'currency' },

    // Signature detection - returns { detected: boolean, confidence: number }
    signature: { type: 'object', 'extend:type': 'signature' },

    // Enum with descriptions for better accuracy
    category: {
      type: 'string',
      enum: ['invoice', 'receipt', 'contract'],
      'extend:descriptions': [
        'A bill for goods or services',
        'Proof of payment',
        'Legal agreement document',
      ],
    },
  },
};
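
The comments in the schema above describe the normalized shapes that come back for each annotation. A small set of runtime type guards can confirm those shapes before the values are used. This is a sketch under assumptions: the field types (`amount` as a number, `currency` as a string) are inferred from the comments rather than confirmed by the package, and all names below are hypothetical.

```typescript
// Shapes inferred from the schema comments above; exact field types are assumptions.
interface CurrencyValue {
  amount: number;
  currency: string;
}

interface SignatureValue {
  detected: boolean;
  confidence: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Checks only the YYYY-MM-DD layout, not whether the date exists on the calendar.
function isIsoDate(value: unknown): value is string {
  return typeof value === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(value);
}

function isCurrencyValue(value: unknown): value is CurrencyValue {
  return (
    isRecord(value) &&
    typeof value.amount === 'number' &&
    typeof value.currency === 'string'
  );
}

function isSignatureValue(value: unknown): value is SignatureValue {
  return (
    isRecord(value) &&
    typeof value.detected === 'boolean' &&
    typeof value.confidence === 'number'
  );
}
```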

Citations and Confidence

Enable citations to get bounding box coordinates for extracted values:

const provider = extendExtractProvider({
  apiKey: process.env.EXTEND_API_KEY!,
  processorId: 'your-processor-id',
  citationsEnabled: true, // Requires processor-level configuration
});

const result = await provider.completeJson({...});

// Access citations in metadata
if (result.metadata?.citations) {
  for (const citation of result.metadata.citations) {
    console.log(citation.fieldPath, citation.referenceText, citation.polygon);
  }
}

// Access confidence scores
if (result.metadata?.confidence) {
  console.log(result.metadata.confidence.invoice_number); // 0.95
}
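
Confidence scores are most useful when they drive a decision, such as flagging fields for manual review. The sketch below assumes `result.metadata.confidence` is a flat map from field name to a score between 0 and 1, as the `invoice_number` access above suggests. The function name and the default threshold of 0.8 are arbitrary choices for illustration.

```typescript
// Hypothetical helper: returns the names of fields whose score falls below the threshold.
// Assumes a flat { fieldName: score } map; nested fields are not handled.
function lowConfidenceFields(
  confidence: Record<string, number>,
  threshold = 0.8,
): string[] {
  return Object.entries(confidence)
    .filter(([, score]) => score < threshold)
    .map(([field]) => field)
    .sort();
}
```

A caller might pass `result.metadata.confidence` and route any returned field names to a human reviewer.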

Supported File Formats

| Format | Extension | Parse | Extract | Classify | Split |
|--------|-----------|-------|---------|----------|-------|
| PDF | .pdf | Yes | Yes | Yes | Yes |
| PNG | .png | Yes | Yes | Yes | Yes |
| JPEG | .jpg, .jpeg | Yes | Yes | Yes | Yes |
| WebP | .webp | Yes | Yes | Yes | Yes |
| TIFF | .tif, .tiff | Yes | Yes | Yes | Yes |
| GIF | .gif | Yes | Yes | Yes | Yes |

Note: The providers automatically detect the actual file format from magic bytes, handling cases where file extensions don't match content (e.g., a WebP file saved as .jpg).
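
To illustrate what magic-byte detection means, here is a minimal sketch that recognizes the six formats in the table from their well-known file signatures. It is not the package's implementation (the exported `detectMimeType` may work differently), and `sniffMimeType` and `hasSignature` are hypothetical names.

```typescript
// Illustrative only: the package's own detection logic may differ.
function hasSignature(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((byte, i) => bytes[offset + i] === byte);
}

function sniffMimeType(bytes: Uint8Array): string | undefined {
  // "%PDF"
  if (hasSignature(bytes, [0x25, 0x50, 0x44, 0x46])) return 'application/pdf';
  // PNG's fixed 8-byte header
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  // JPEG start-of-image marker
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return 'image/jpeg';
  // "GIF8" (covers both GIF87a and GIF89a)
  if (hasSignature(bytes, [0x47, 0x49, 0x46, 0x38])) return 'image/gif';
  // WebP: "RIFF", four size bytes, then "WEBP"
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]) &&
    hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return 'image/webp';
  }
  // TIFF: little-endian "II*\0" or big-endian "MM\0*"
  if (
    hasSignature(bytes, [0x49, 0x49, 0x2a, 0x00]) ||
    hasSignature(bytes, [0x4d, 0x4d, 0x00, 0x2a])
  ) {
    return 'image/tiff';
  }
  return undefined;
}
```

Because the check reads only the leading bytes, a WebP file named `photo.jpg` is still reported as `image/webp`.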

API Configuration

import { extendParseProvider } from '@doclo/providers-extend';

const provider = extendParseProvider({
  apiKey: process.env.EXTEND_API_KEY!,
  endpoint: 'https://api.extend.ai', // Optional: custom endpoint
  apiVersion: '2025-04-21', // Optional: API version (default: latest)
  timeout: 120000, // Optional: request timeout in ms (default: 120000)
});

Low-Level API Client

For advanced use cases, use the API client directly:

import { ExtendApiClient } from '@doclo/providers-extend';

const client = new ExtendApiClient({
  apiKey: process.env.EXTEND_API_KEY!,
});

// Upload a file (`buffer` holds the raw file contents and must be defined beforehand)
const { fileId } = await client.uploadFile(buffer, 'document.pdf', 'application/pdf');

// Parse a file
const parseResult = await client.parse({ fileId });

// Run a processor
const runResponse = await client.runProcessor('processor-id', { fileId });

// Poll for completion
const result = await client.pollProcessorRun(runResponse.runId);

Utility Functions

import * as fs from 'node:fs'; // needed for fs.readFileSync below
import {
  isMimeTypeSupported,
  getProviderMetadata,
  getProvidersForNode,
  canProviderHandleFile,
  detectMimeType,
} from '@doclo/providers-extend';

// Check if a MIME type is supported
isMimeTypeSupported('application/pdf'); // true
isMimeTypeSupported('text/plain'); // false

// Get providers compatible with a specific node type
const extractProviders = getProvidersForNode('extract');

// Check if a provider can handle a specific file type
canProviderHandleFile('extend-extract', 'image/jpeg'); // true

// Detect MIME type from file magic bytes
const buffer = fs.readFileSync('file.jpg');
const mimeType = detectMimeType(buffer, 'file.jpg'); // 'image/webp' (if actually WebP)

Pricing

Extend.ai uses a credit-based pricing model:

| Tier | Price per Credit | Included Credits |
|------|------------------|------------------|
| Starter | $0.01 | Starting at 30,000 |
| Scale | $0.008 | 100,000 |
| Enterprise | Custom | Custom |
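
As a rough worked example of the table, the cost of a given number of credits is simply credits multiplied by the per-credit price. The sketch below uses only the Starter and Scale prices listed above. How many credits a particular operation consumes is not stated here, so the credit count is left as an input, and `estimateCostUsd` is a hypothetical helper.

```typescript
// Per-credit prices copied from the table above; Enterprise pricing is custom and omitted.
const PRICE_PER_CREDIT_USD = { starter: 0.01, scale: 0.008 } as const;
type Tier = keyof typeof PRICE_PER_CREDIT_USD;

function estimateCostUsd(tier: Tier, credits: number): number {
  if (!Number.isFinite(credits) || credits < 0) {
    throw new RangeError('credits must be a non-negative finite number');
  }
  // Round to whole cents so floating-point noise does not leak into the result.
  return Math.round(credits * PRICE_PER_CREDIT_USD[tier] * 100) / 100;
}

estimateCostUsd('starter', 30_000); // 300
estimateCostUsd('scale', 100_000); // 800
```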

License

MIT