hashub-docapp-js

v1.0.0

JavaScript/TypeScript SDK for Hashub Document Processing API

HashubDocApp JavaScript/Node.js SDK

Professional JavaScript/Node.js SDK for the HashubDocApp API - Advanced OCR, document conversion, and text extraction service.

✨ Features

  • 🚀 Fast OCR: Quick text extraction with 76+ language support
  • 🧠 Smart OCR: High-quality OCR with layout preservation
  • 📄 Document Conversion: Office documents (Word, Excel) and HTML to Markdown/Text
  • 🔄 Batch Processing: Process multiple files with intelligent categorization
  • 🌍 Multi-language: Support for 76+ languages with ISO 639-1 codes
  • 🎨 Image Enhancement: 11 pre-configured enhancement presets
  • 📊 Progress Tracking: Real-time progress monitoring with callbacks
  • Rate Limiting: Built-in API throttling protection
  • 🔷 TypeScript: Fully typed with comprehensive TypeScript definitions
  • 🌐 Universal: Works in Node.js and modern browsers

🚀 Quick Start

Installation

Install the latest stable version from npm:

npm install hashub-docapp-js

Or with yarn:

yarn add hashub-docapp-js

Or install the development version from GitHub:

npm install git+https://github.com/hasanbahadir/hashub-doc-js.git

Basic Usage

import { DocAppClient } from 'hashub-docapp-js';

// Initialize client
const client = new DocAppClient('your_api_key_here');

// Fast OCR - Quick text extraction
const text = await client.convertFast('document.pdf', { language: 'en' });
console.log(text);

// Smart OCR - High-quality with layout preservation  
const markdown = await client.convertSmart('document.pdf');
console.log(markdown);

TypeScript Usage

import { DocAppClient, ConvertFastOptions, ConvertSmartOptions } from 'hashub-docapp-js';

const client = new DocAppClient(process.env.HASHUB_API_KEY!);

// Fast OCR with language support
const fastOptions: ConvertFastOptions = {
  language: 'tr',
  enhancement: 'scan_low_dpi',
  output: 'markdown'
};

const result = await client.convertFast('document.pdf', fastOptions);
console.log(result);

📖 Core Methods

convertFast()

Fast OCR for quick text extraction with language support.

async convertFast(
  filePath: string,
  options?: ConvertFastOptions
): Promise<string>

interface ConvertFastOptions {
  output?: 'markdown' | 'txt' | 'json';
  language?: string;
  enhancement?: string;
  returnType?: 'content' | 'url' | 'file';
  saveTo?: string;
  showProgress?: boolean;
  timeout?: number;
}

Parameters:

  • filePath: Path to PDF or image file
  • options.output: Output format ("markdown", "txt", "json")
  • options.language: Language code (ISO 639-1 like "en", "tr", "de")
  • options.enhancement: Image enhancement preset (optional)
  • options.returnType: "content" (default), "url", or "file"
  • options.saveTo: File path when returnType="file"
  • options.showProgress: Show progress updates (default: true)
  • options.timeout: Maximum wait time in seconds (default: 300)

Examples:

// Basic fast OCR
const text = await client.convertFast('scan.pdf');

// With Turkish language
const turkishText = await client.convertFast('document.pdf', { language: 'tr' });

// With enhancement for low-quality scans
const enhancedText = await client.convertFast('scan.pdf', {
  enhancement: 'scan_low_dpi'
});

// Save to file
await client.convertFast('document.pdf', { 
  returnType: 'file', 
  saveTo: 'output.txt' 
});
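
The option rules listed above can be checked before a request is sent. The following is a minimal sketch, not part of the SDK: `checkFastOptions` is a hypothetical helper, and the interface is copied from the definition above so the block stands alone. It assumes that `saveTo` is required when `returnType` is "file", which the parameter list suggests but does not state outright.

```typescript
// Local copy of the options shape documented above, so the sketch is self-contained.
interface ConvertFastOptions {
  output?: 'markdown' | 'txt' | 'json';
  language?: string;
  enhancement?: string;
  returnType?: 'content' | 'url' | 'file';
  saveTo?: string;
  showProgress?: boolean;
  timeout?: number;
}

// Hypothetical helper (not an SDK function): returns a list of problems found
// in the options, or an empty list when nothing looks wrong.
function checkFastOptions(options: ConvertFastOptions): string[] {
  const problems: string[] = [];
  const returnType = options.returnType ?? 'content'; // documented default

  // Assumption: a target path is needed whenever the result is written to disk.
  if (returnType === 'file' && !options.saveTo) {
    problems.push('returnType "file" needs a saveTo path');
  }
  // saveTo is only documented for returnType "file".
  if (returnType !== 'file' && options.saveTo) {
    problems.push('saveTo is only used when returnType is "file"');
  }
  // The timeout is documented in seconds, so it should be a positive number.
  if (options.timeout !== undefined && !(options.timeout > 0)) {
    problems.push('timeout must be a positive number of seconds');
  }
  return problems;
}
```

Running such a check first means a mistyped option surfaces locally instead of after an upload.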

convertSmart()

High-quality OCR with layout preservation and structure detection.

async convertSmart(
  filePath: string,
  options?: ConvertSmartOptions
): Promise<string>

interface ConvertSmartOptions {
  output?: 'markdown' | 'txt' | 'json';
  returnType?: 'content' | 'url' | 'file';
  saveTo?: string;
  showProgress?: boolean;
  timeout?: number;
}

Examples:

// Smart OCR with layout preservation
const markdown = await client.convertSmart('complex_document.pdf');

// Save as file
await client.convertSmart('document.pdf', { 
  returnType: 'file', 
  saveTo: 'output.md' 
});

// Different output format
const jsonData = await client.convertSmart('document.pdf', { 
  output: 'json' 
});

🌍 Language Support

The SDK supports 76+ languages with ISO 639-1 codes:

import { LanguageHelper } from 'hashub-docapp-js';

// List all supported languages
const languages = LanguageHelper.listLanguages();
console.log(`Supported languages: ${languages.length}`);

// Get language info
const turkishInfo = LanguageHelper.getLanguageInfo('tr');
console.log(turkishInfo); // { english: 'Turkish', native: 'Türkçe', iso: 'tr', apiCode: 'lang_tur_tr' }

// Use with convertFast
const text = await client.convertFast('document.pdf', { language: 'tr' }); // Turkish
const text2 = await client.convertFast('document.pdf', { language: 'de' }); // German
const text3 = await client.convertFast('document.pdf', { language: 'zh' }); // Chinese

Popular Language Codes:

  • en - English
  • tr - Turkish
  • de - German
  • fr - French
  • es - Spanish
  • zh - Chinese (Simplified)
  • ar - Arabic
  • ru - Russian
  • ja - Japanese
  • ko - Korean
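
A language code can be sanity-checked before it is passed to `convertFast()`. The sketch below is only an illustration: `POPULAR_LANGUAGES` holds just the ten codes listed above (the SDK's `LanguageHelper` remains the authoritative source for the full set), and `normalizeLanguageCode` and `popularLanguageName` are hypothetical helpers that are not part of the SDK.

```typescript
// Only the ten codes listed above; the full set comes from LanguageHelper.listLanguages().
const POPULAR_LANGUAGES: Record<string, string> = {
  en: 'English',
  tr: 'Turkish',
  de: 'German',
  fr: 'French',
  es: 'Spanish',
  zh: 'Chinese (Simplified)',
  ar: 'Arabic',
  ru: 'Russian',
  ja: 'Japanese',
  ko: 'Korean',
};

// Hypothetical helper: trims and lower-cases the input, then checks that it has
// the two-letter shape of an ISO 639-1 code. Returns null when it does not.
function normalizeLanguageCode(input: string): string | null {
  const code = input.trim().toLowerCase();
  return /^[a-z]{2}$/.test(code) ? code : null;
}

// Hypothetical helper: looks up a display name among the popular codes above;
// returns undefined for malformed input or codes outside that short list.
function popularLanguageName(input: string): string | undefined {
  const code = normalizeLanguageCode(input);
  return code === null ? undefined : POPULAR_LANGUAGES[code];
}
```

A shape check like this only catches typos such as "eng" or "TR "; whether a well-formed code is actually supported still has to be confirmed against `LanguageHelper`.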

🎨 Image Enhancement Presets

The SDK includes 11 pre-configured enhancement presets for different document types:

// Enhancement presets (use with convertFast)
await client.convertFast('scan.pdf', { enhancement: 'document_crisp' });     // Clean documents
await client.convertFast('scan.pdf', { enhancement: 'scan_low_dpi' });       // Low quality scans
await client.convertFast('scan.pdf', { enhancement: 'camera_shadow' });      // Phone photos
await client.convertFast('scan.pdf', { enhancement: 'photocopy_faded' });    // Faded copies
await client.convertFast('scan.pdf', { enhancement: 'inverted_scan' });      // Inverted colors
await client.convertFast('scan.pdf', { enhancement: 'noisy_dots' });         // Noisy artifacts
await client.convertFast('scan.pdf', { enhancement: 'tables_fine' });        // Tables and grids
await client.convertFast('scan.pdf', { enhancement: 'receipt_thermal' });    // Receipts
await client.convertFast('scan.pdf', { enhancement: 'newspaper_moire' });    // Newspapers
await client.convertFast('scan.pdf', { enhancement: 'fax_low_quality' });    // Fax documents
await client.convertFast('scan.pdf', { enhancement: 'blueprint' });          // Technical drawings
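
Because `enhancement` is typed as a plain `string` in the options interface, a misspelled preset name would not be caught at compile time. One possible guard, sketched here under the assumption that the eleven names above are the complete set, is a literal union with a type guard. `ENHANCEMENT_PRESETS`, `EnhancementPreset`, and `isEnhancementPreset` are hypothetical names, not SDK exports.

```typescript
// The eleven preset names shown above, as a readonly tuple.
const ENHANCEMENT_PRESETS = [
  'document_crisp',
  'scan_low_dpi',
  'camera_shadow',
  'photocopy_faded',
  'inverted_scan',
  'noisy_dots',
  'tables_fine',
  'receipt_thermal',
  'newspaper_moire',
  'fax_low_quality',
  'blueprint',
] as const;

// Union of the literal names, derived from the tuple so the two cannot drift apart.
type EnhancementPreset = (typeof ENHANCEMENT_PRESETS)[number];

// Hypothetical type guard (not an SDK function): narrows an arbitrary string,
// for example one read from a config file, to a known preset name.
function isEnhancementPreset(value: string): value is EnhancementPreset {
  return (ENHANCEMENT_PRESETS as readonly string[]).indexOf(value) !== -1;
}
```

A value that passes the guard can then be passed as `enhancement` with some confidence that it matches one of the documented presets.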

📄 Document Conversion

convertDoc()

Convert Word, Excel, and other office documents.

async convertDoc(
  filePath: string,
  options?: ConvertDocOptions
): Promise<string>

interface ConvertDocOptions {
  output?: 'markdown' | 'txt' | 'json';
  returnType?: 'content' | 'url' | 'file';
  saveTo?: string;
  options?: Record<string, any>;
}

Examples:

// Convert Word document to Markdown
const markdown = await client.convertDoc('document.docx');

// Convert Excel to text
const text = await client.convertDoc('spreadsheet.xlsx', { output: 'txt' });

// Save to file
await client.convertDoc('presentation.pptx', { 
  returnType: 'file', 
  saveTo: 'output.md' 
});

convertHtmlString()

Convert HTML string content to other formats.

async convertHtmlString(
  htmlContent: string,
  options?: ConvertHtmlOptions
): Promise<string>

interface ConvertHtmlOptions {
  output?: 'markdown' | 'txt' | 'json';
  returnType?: 'content' | 'url' | 'file';
  saveTo?: string;
  options?: Record<string, any>;
}

Examples:

const html = '<h1>Title</h1><p>Content</p>';
const markdown = await client.convertHtmlString(html);

🔄 Batch Processing

batchConvertSmart()

Smart batch processing with automatic file categorization.

async batchConvertSmart(
  directory: string,
  saveTo: string,
  options?: BatchConvertSmartOptions
): Promise<BatchResult>

interface BatchConvertSmartOptions {
  outputFormat?: 'txt' | 'markdown' | 'json';
  recursive?: boolean;
  showProgress?: boolean;
  maxWorkers?: number;
  timeout?: number;
}

interface BatchResult {
  processedCount: number;
  successCount: number;
  failedCount: number;
  results: Array<{
    sourceFile: string;
    outputFile?: string;
    status: 'success' | 'failed';
    error?: string;
  }>;
}

Example:

// Process all files in directory intelligently
const results = await client.batchConvertSmart(
  './documents',
  './output',
  { outputFormat: 'markdown' }
);

console.log(`Processed ${results.processedCount} files`);
console.log(`Success: ${results.successCount}, Failed: ${results.failedCount}`);
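
The `results` array makes it straightforward to pick out the files that need another attempt. The sketch below works on the `BatchResult` shape documented above (copied locally so the block stands alone); `failedFiles` and `summarize` are hypothetical helpers, not SDK functions.

```typescript
// Local copy of the result shape documented above.
interface BatchResult {
  processedCount: number;
  successCount: number;
  failedCount: number;
  results: Array<{
    sourceFile: string;
    outputFile?: string;
    status: 'success' | 'failed';
    error?: string;
  }>;
}

// Hypothetical helper: source paths of every entry that did not succeed,
// in the order they appear in the batch result.
function failedFiles(result: BatchResult): string[] {
  return result.results
    .filter((entry) => entry.status === 'failed')
    .map((entry) => entry.sourceFile);
}

// Hypothetical helper: a one-line summary built from the three counters.
function summarize(result: BatchResult): string {
  return `${result.successCount}/${result.processedCount} succeeded, ${result.failedCount} failed`;
}
```

The list returned by `failedFiles` could, for instance, be fed one file at a time into `convertSmart()` for a retry.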

batchConvertFast()

Fast batch OCR for images and PDFs.

async batchConvertFast(
  directory: string,
  saveTo: string,
  options?: BatchConvertFastOptions
): Promise<BatchResult>

interface BatchConvertFastOptions {
  language?: string;
  enhancement?: string;
  outputFormat?: 'txt' | 'markdown' | 'json';
  recursive?: boolean;
  showProgress?: boolean;
  maxWorkers?: number;
  timeout?: number;
}

batchConvertAuto()

Automatic processing mode selection based on file types.

async batchConvertAuto(
  directory: string,
  saveTo: string,
  options?: BatchConvertAutoOptions
): Promise<BatchResult>

interface BatchConvertAutoOptions {
  language?: string;
  enhancement?: string;
  outputFormat?: 'txt' | 'markdown' | 'json';
  recursive?: boolean;
  showProgress?: boolean;
  maxWorkers?: number;
  timeout?: number;
}
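
The README does not describe how `batchConvertAuto()` chooses a mode. Purely as an illustration of the idea, a file-type router could look like the sketch below. Everything in it is an assumption: `pickMode` is a hypothetical helper, the extension lists contain only the file types that appear in this README's examples, and the SDK's real selection rules may differ.

```typescript
// Which conversion path a file would take in this sketch.
type Mode = 'fast' | 'doc' | 'skip';

// Assumption: only extensions that appear in the examples of this README.
const OCR_EXTENSIONS = ['pdf', 'jpg'];
const OFFICE_EXTENSIONS = ['docx', 'xlsx', 'pptx'];

// Hypothetical router (not the SDK's logic): office documents go to the
// document converter, PDFs and images go to OCR, anything else is skipped.
function pickMode(filePath: string): Mode {
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  if (OFFICE_EXTENSIONS.indexOf(ext) !== -1) return 'doc';
  if (OCR_EXTENSIONS.indexOf(ext) !== -1) return 'fast';
  return 'skip';
}
```

In this sketch, `'doc'` would correspond to calling `convertDoc()` and `'fast'` to calling `convertFast()`.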

📊 Return Types

The SDK supports three return types for conversion methods:

1. Content (Default)

const text = await client.convertFast('doc.pdf', { returnType: 'content' });
console.log(text); // Direct text content

2. URL

const url = await client.convertFast('doc.pdf', { returnType: 'url' });
console.log(url); // Download URL for the result

3. File

const path = await client.convertFast('doc.pdf', { 
  returnType: 'file', 
  saveTo: 'output.txt' 
});
console.log(path); // Path to saved file

🛠️ Job Management

getStatus()

Check job status.

const status = await client.getStatus(jobId);
console.log(`Status: ${status.status}`);
console.log(`Progress: ${status.progress || 0}%`);

wait()

Wait for job completion with polling.

const finalStatus = await client.wait(jobId, { interval: 2000, timeout: 300000 });

getResult()

Get completed job result.

const result = await client.getResult(jobId);
console.log(result.content); // The extracted/converted text

cancel()

Cancel a running job.

await client.cancel(jobId);
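
To make the idea behind `wait()` concrete, here is a minimal polling loop written against a generic status function rather than the SDK. It is a sketch under stated assumptions: `pollUntilDone` is a hypothetical name, the status values "completed" and "failed" are guesses (the README does not list the possible values), and the SDK's own `wait()` may behave differently.

```typescript
// Minimal status shape: only the two fields used in the getStatus() example above.
interface JobStatus {
  status: string;
  progress?: number;
}

// Hypothetical polling loop (not the SDK's implementation). Calls fetchStatus
// until isDone returns true, sleeping intervalMs between calls, and throws if
// the next poll would land after timeoutMs has elapsed.
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs: number,
  timeoutMs: number,
  // Assumption: these two strings mark a finished job.
  isDone: (s: JobStatus) => boolean = (s) => s.status === 'completed' || s.status === 'failed',
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await fetchStatus();
    if (isDone(status)) return status;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job did not finish within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the SDK, the first argument could be `() => client.getStatus(jobId)`, provided the returned object has a `status` field as the example above suggests.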

🔧 Configuration

Environment Variables

export HASHUB_API_KEY="your_api_key_here"

Client Configuration

const client = new DocAppClient('your_api_key', {
  baseUrl: 'https://doc.hashub.dev/api/v1', // Default
  timeout: 30000,                           // Request timeout (ms)
  maxRetries: 3,                           // Max retry attempts
  rateLimitDelay: 2000                     // Min delay between requests (ms)
});
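
The environment variable and the configuration object can be combined in one small loader. The following is a sketch, not SDK code: `loadSettings` and `ClientConfig` are hypothetical names, and the values in `EXAMPLE_CONFIG` are simply those from the example above (only `baseUrl` is marked as a default there). The environment is passed in as a plain object so the function does not depend on Node.js globals.

```typescript
// Local mirror of the four options shown in the configuration example above.
interface ClientConfig {
  baseUrl: string;
  timeout: number;        // request timeout in milliseconds
  maxRetries: number;     // maximum retry attempts
  rateLimitDelay: number; // minimum delay between requests in milliseconds
}

// Values copied from the example above; treat them as examples, not guaranteed defaults.
const EXAMPLE_CONFIG: ClientConfig = {
  baseUrl: 'https://doc.hashub.dev/api/v1',
  timeout: 30000,
  maxRetries: 3,
  rateLimitDelay: 2000,
};

// Hypothetical helper: reads HASHUB_API_KEY from an environment-like object,
// fails early when it is missing, and merges any overrides over the example values.
function loadSettings(
  env: Record<string, string | undefined>,
  overrides: Partial<ClientConfig> = {},
): { apiKey: string; config: ClientConfig } {
  const apiKey = (env['HASHUB_API_KEY'] ?? '').trim();
  if (apiKey === '') {
    throw new Error('HASHUB_API_KEY is not set');
  }
  return { apiKey, config: { ...EXAMPLE_CONFIG, ...overrides } };
}
```

In Node.js, the result of `loadSettings(process.env)` could then be passed on as `new DocAppClient(apiKey, config)`.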

🎯 Usage Examples

Basic OCR

import { DocAppClient } from 'hashub-docapp-js';

const client = new DocAppClient('your_api_key');

// Extract text from PDF
const text = await client.convertFast('invoice.pdf', { language: 'en' });
console.log(text);

// High-quality OCR with layout
const markdown = await client.convertSmart('complex_document.pdf');
console.log(markdown);

Multi-language Processing

// Process documents in different languages
const documents = [
  { path: 'english_doc.pdf', lang: 'en' },
  { path: 'turkish_doc.pdf', lang: 'tr' },
  { path: 'german_doc.pdf', lang: 'de' },
  { path: 'chinese_doc.pdf', lang: 'zh' }
];

for (const doc of documents) {
  const text = await client.convertFast(doc.path, { language: doc.lang });
  console.log(`${doc.lang}: ${text.substring(0, 100)}...`);
}

Enhanced Image Processing

// Process different types of scanned documents
const scanTypes = {
  'old_book.pdf': 'scan_low_dpi',
  'phone_photo.jpg': 'camera_shadow',
  'faded_copy.pdf': 'photocopy_faded',
  'receipt.jpg': 'receipt_thermal',
  'technical_drawing.pdf': 'blueprint'
};

for (const [filePath, enhancement] of Object.entries(scanTypes)) {
  const text = await client.convertFast(filePath, {
    enhancement,
    language: 'en'
  });
  console.log(`Processed ${filePath} with ${enhancement}`);
}

Batch Processing Example

// Process entire directory
const results = await client.batchConvertAuto(
  './input_docs',
  './output',
  {
    outputFormat: 'markdown',
    showProgress: true
  }
);

console.log(`✅ Processed ${results.successCount} files successfully`);
results.results.forEach(result => {
  if (result.status === 'success') {
    console.log(`  📄 ${result.sourceFile} -> ${result.outputFile}`);
  }
});

🛡️ Error Handling

import { 
  DocAppClient,
  AuthenticationError,
  RateLimitError,
  ProcessingError,
  ValidationError
} from 'hashub-docapp-js';

const client = new DocAppClient('your_api_key');

try {
  const result = await client.convertFast('document.pdf');
  console.log(result);
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.log('❌ Invalid API key');
  } else if (error instanceof RateLimitError) {
    console.log('⏳ Rate limit exceeded, wait and retry');
  } else if (error instanceof ProcessingError) {
    console.log(`💥 Processing failed: ${error.message}`);
  } else if (error instanceof ValidationError) {
    console.log(`📝 Validation error: ${error.message}`);
  } else {
    console.log(`📁 File not found or other error: ${error.message}`);
  }
}

🔄 Rate Limiting

The SDK includes built-in rate limiting to prevent API throttling:

  • Default delay: 2 seconds between requests
  • Automatic retry: Failed requests are retried with exponential backoff
  • Progress tracking: Polls job status with appropriate intervals

// Configure rate limiting
const client = new DocAppClient('your_key', {
  rateLimitDelay: 3000, // 3 second delay between requests
  maxRetries: 5         // Retry failed requests up to 5 times
});
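
The README does not specify the backoff schedule the SDK uses for retries. As a generic illustration of exponential backoff, the sketch below doubles the wait after each failed attempt and caps it at a maximum. `backoffDelayMs`, `backoffSchedule`, and the 60-second cap are assumptions made for this example, not SDK behavior.

```typescript
// Hypothetical helper: wait time before retry number `attempt` (0-based).
// The delay doubles with every attempt and never exceeds maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 2000, maxDelayMs = 60000): number {
  return Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
}

// Hypothetical helper: the full list of waits for a given number of retries.
function backoffSchedule(maxRetries: number, baseDelayMs = 2000, maxDelayMs = 60000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt += 1) {
    delays.push(backoffDelayMs(attempt, baseDelayMs, maxDelayMs));
  }
  return delays;
}

console.log(backoffSchedule(5, 3000)); // [ 3000, 6000, 12000, 24000, 48000 ]
```

With the settings from the configuration above (`rateLimitDelay: 3000`, `maxRetries: 5`), such a schedule would add up to roughly a minute and a half of waiting in the worst case.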

📈 Performance Tips

  1. Use appropriate modes:

    • convertFast() for simple text extraction with language support
    • convertSmart() for complex layouts and formatting

  2. Batch processing:

    • Use batch methods for multiple files
    • Adjust maxWorkers based on your API limits

  3. Language specification:

    • Always specify the correct language for better accuracy
    • Use ISO codes for convenience ("en", "tr", "de")

  4. Enhancement presets:

    • Choose the right preset for your document type
    • Experiment with different presets for optimal results

🐛 Troubleshooting

Common Issues

1. Network Errors

// Ensure correct base URL
const client = new DocAppClient('your_key', {
  baseUrl: 'https://doc.hashub.dev/api/v1'
});

2. Rate Limiting

// Increase delay between requests
const client = new DocAppClient('your_key', {
  rateLimitDelay: 3000
});

3. Timeout Issues

// Increase timeout for large files
await client.convertSmart('large_file.pdf', { timeout: 600 });

4. Language Errors

// Check supported languages
import { LanguageHelper } from 'hashub-docapp-js';
const languages = LanguageHelper.listLanguages();
console.log(languages.map(lang => lang.iso));

📊 API Method Summary

| Method | Purpose | Key Parameters | Returns |
|--------|---------|----------------|---------|
| convertFast() | Fast OCR | filePath, language, enhancement | Promise<string> |
| convertSmart() | Smart OCR | filePath, output | Promise<string> |
| convertDoc() | Office docs | filePath, output | Promise<string> |
| convertHtmlString() | HTML conversion | htmlContent, output | Promise<string> |
| batchConvertSmart() | Smart batch | directory, saveTo | Promise<BatchResult> |
| batchConvertFast() | Fast batch | directory, saveTo, language | Promise<BatchResult> |
| batchConvertAuto() | Auto batch | directory, saveTo | Promise<BatchResult> |

📄 License

MIT License - see LICENSE file for details.

Made with ❤️ by the Hashub Team