hashub-docapp-js
v1.0.0
JavaScript/TypeScript SDK for Hashub Document Processing API
HashubDocApp JavaScript/Node.js SDK
Professional JavaScript/Node.js SDK for the HashubDocApp API: an advanced OCR, document conversion, and text extraction service.
✨ Features
- 🚀 Fast OCR: Quick text extraction with 76+ language support
- 🧠 Smart OCR: High-quality OCR with layout preservation
- 📄 Document Conversion: Office documents (Word, Excel) and HTML to Markdown/Text
- 🔄 Batch Processing: Process multiple files with intelligent categorization
- 🌍 Multi-language: Support for 76+ languages with ISO 639-1 codes
- 🎨 Image Enhancement: 11 pre-configured enhancement presets
- 📊 Progress Tracking: Real-time progress monitoring with callbacks
- ⚡ Rate Limiting: Built-in API throttling protection
- 🔷 TypeScript: Fully typed with comprehensive TypeScript definitions
- 🌐 Universal: Works in Node.js and modern browsers
🚀 Quick Start
Installation
Install the latest stable version from npm:
npm install hashub-docapp-js
Or with yarn:
yarn add hashub-docapp-js
Or install the development version from GitHub:
npm install git+https://github.com/hasanbahadir/hashub-doc-js.git
Basic Usage
import { DocAppClient } from 'hashub-docapp-js';
// Initialize client
const client = new DocAppClient('your_api_key_here');
// Fast OCR - Quick text extraction
const text = await client.convertFast('document.pdf', { language: 'en' });
console.log(text);
// Smart OCR - High-quality with layout preservation
const markdown = await client.convertSmart('document.pdf');
console.log(markdown);
TypeScript Usage
import { DocAppClient, ConvertFastOptions, ConvertSmartOptions } from 'hashub-docapp-js';
const client = new DocAppClient(process.env.HASHUB_API_KEY!);
// Fast OCR with language support
const fastOptions: ConvertFastOptions = {
language: 'tr',
enhancement: 'scan_low_dpi',
output: 'markdown'
};
const result = await client.convertFast('document.pdf', fastOptions);
console.log(result);
📖 Core Methods
convertFast()
Fast OCR for quick text extraction with language support.
async convertFast(
filePath: string,
options?: ConvertFastOptions
): Promise<string>
interface ConvertFastOptions {
output?: 'markdown' | 'txt' | 'json';
language?: string;
enhancement?: string;
returnType?: 'content' | 'url' | 'file';
saveTo?: string;
showProgress?: boolean;
timeout?: number;
}
Parameters:
- filePath: Path to a PDF or image file
- options.output: Output format ("markdown", "txt", "json")
- options.language: Language code (ISO 639-1, e.g. "en", "tr", "de")
- options.enhancement: Image enhancement preset (optional)
- options.returnType: "content" (default), "url", or "file"
- options.saveTo: File path, used when returnType is "file"
- options.showProgress: Show progress updates (default: true)
- options.timeout: Maximum wait time in seconds (default: 300)
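The parameter rules above (a `saveTo` path is needed when `returnType` is `"file"`, and `timeout` is a positive number of seconds defaulting to 300) can be checked before any request is sent. The sketch below is a hypothetical client-side helper, not part of hashub-docapp-js; it re-declares the documented `ConvertFastOptions` shape locally so it runs on its own, and the name `withFastDefaults` and its error messages are illustrative assumptions.

```typescript
// Local copy of the documented ConvertFastOptions shape, so this sketch
// compiles without the SDK installed.
interface ConvertFastOptions {
  output?: 'markdown' | 'txt' | 'json';
  language?: string;
  enhancement?: string;
  returnType?: 'content' | 'url' | 'file';
  saveTo?: string;
  showProgress?: boolean;
  timeout?: number; // seconds
}

// Hypothetical helper (not part of hashub-docapp-js): applies the documented
// defaults and rejects option combinations the parameter list rules out.
function withFastDefaults(options: ConvertFastOptions = {}): ConvertFastOptions {
  const returnType = options.returnType ?? 'content'; // documented default
  const showProgress = options.showProgress ?? true;  // documented default
  const timeout = options.timeout ?? 300;              // documented default, in seconds

  if (returnType === 'file' && !options.saveTo) {
    throw new Error('saveTo is required when returnType is "file"');
  }
  // `!(timeout > 0)` rejects zero, negative values and NaN.
  if (!(timeout > 0)) {
    throw new Error('timeout must be a positive number of seconds');
  }
  return { ...options, returnType, showProgress, timeout };
}
```

The resolved object can then be passed straight through, e.g. `await client.convertFast('document.pdf', withFastDefaults({ language: 'tr' }))`.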
Examples:
// Basic fast OCR
const basicText = await client.convertFast('scan.pdf');
// With Turkish language
const turkishText = await client.convertFast('document.pdf', { language: 'tr' });
// With enhancement for low-quality scans
const enhancedText = await client.convertFast('scan.pdf', {
enhancement: 'scan_low_dpi'
});
// Save to file
await client.convertFast('document.pdf', {
returnType: 'file',
saveTo: 'output.txt'
});
convertSmart()
High-quality OCR with layout preservation and structure detection.
async convertSmart(
filePath: string,
options?: ConvertSmartOptions
): Promise<string>
interface ConvertSmartOptions {
output?: 'markdown' | 'txt' | 'json';
returnType?: 'content' | 'url' | 'file';
saveTo?: string;
showProgress?: boolean;
timeout?: number;
}
Examples:
// Smart OCR with layout preservation
const markdown = await client.convertSmart('complex_document.pdf');
// Save as file
await client.convertSmart('document.pdf', {
returnType: 'file',
saveTo: 'output.md'
});
// Different output format
const jsonData = await client.convertSmart('document.pdf', {
output: 'json'
});
🌍 Language Support
The SDK supports 76+ languages with ISO 639-1 codes:
import { LanguageHelper } from 'hashub-docapp-js';
// List all supported languages
const languages = LanguageHelper.listLanguages();
console.log(`Supported languages: ${languages.length}`);
// Get language info
const turkishInfo = LanguageHelper.getLanguageInfo('tr');
console.log(turkishInfo); // { english: 'Turkish', native: 'Türkçe', iso: 'tr', apiCode: 'lang_tur_tr' }
// Use with convertFast
const text = await client.convertFast('document.pdf', { language: 'tr' }); // Turkish
const text2 = await client.convertFast('document.pdf', { language: 'de' }); // German
const text3 = await client.convertFast('document.pdf', { language: 'zh' }); // Chinese
Popular Language Codes:
- en: English
- tr: Turkish
- de: German
- fr: French
- es: Spanish
- zh: Chinese (Simplified)
- ar: Arabic
- ru: Russian
- ja: Japanese
- ko: Korean
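A mistyped language code is one of the common issues covered under Troubleshooting. For the handful of codes an application actually uses, a small client-side guard can catch typos early. The sketch below hard-codes only the ten popular codes listed above; it is not the SDK's full table of 76+ languages (`LanguageHelper.listLanguages()` is the documented source for that), and the helper names are hypothetical.

```typescript
// The ten "popular" ISO 639-1 codes listed in this README.
// This is NOT the full list of 76+ languages the API supports.
const POPULAR_LANGUAGES = {
  en: 'English',
  tr: 'Turkish',
  de: 'German',
  fr: 'French',
  es: 'Spanish',
  zh: 'Chinese (Simplified)',
  ar: 'Arabic',
  ru: 'Russian',
  ja: 'Japanese',
  ko: 'Korean',
} as const;

type PopularLanguageCode = keyof typeof POPULAR_LANGUAGES;

// Hypothetical type guard: narrows a plain string to one of the codes above.
function isPopularLanguage(code: string): code is PopularLanguageCode {
  return Object.prototype.hasOwnProperty.call(POPULAR_LANGUAGES, code);
}

// Trims and lower-cases input such as " TR " before validating it.
function normalizeLanguage(input: string): PopularLanguageCode {
  const code = input.trim().toLowerCase();
  if (!isPopularLanguage(code)) {
    throw new Error(`"${input}" is not in the popular-language list`);
  }
  return code;
}
```

Codes outside this short list may still be valid for the API, so in practice a guard like this would fall back to the SDK's own language list rather than reject outright.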
🎨 Image Enhancement Presets
The SDK includes 11 pre-configured enhancement presets for different document types:
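`ConvertFastOptions.enhancement` is declared as a plain `string`, so a misspelled preset name is only caught by the API at run time. If you want the compiler to catch it, one option is a local union type built from the 11 preset names used in the examples that follow. This is a sketch for your own code, not a type exported by the SDK, and the list has to be kept in sync with the service by hand.

```typescript
// The 11 preset names listed in this README, as a readonly tuple.
const ENHANCEMENT_PRESETS = [
  'document_crisp',
  'scan_low_dpi',
  'camera_shadow',
  'photocopy_faded',
  'inverted_scan',
  'noisy_dots',
  'tables_fine',
  'receipt_thermal',
  'newspaper_moire',
  'fax_low_quality',
  'blueprint',
] as const;

// Union type: 'document_crisp' | 'scan_low_dpi' | ... | 'blueprint'
type EnhancementPreset = (typeof ENHANCEMENT_PRESETS)[number];

// Runtime check for values that arrive from config files or user input.
function isEnhancementPreset(value: string): value is EnhancementPreset {
  return (ENHANCEMENT_PRESETS as readonly string[]).indexOf(value) !== -1;
}
```

A variable declared as `const preset: EnhancementPreset = 'scan_low_dpi'` can then be passed as `{ enhancement: preset }`, and a typo such as `'scan_low'` becomes a compile-time error.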
// Enhancement presets (use with convertFast)
await client.convertFast('scan.pdf', { enhancement: 'document_crisp' }); // Clean documents
await client.convertFast('scan.pdf', { enhancement: 'scan_low_dpi' }); // Low quality scans
await client.convertFast('scan.pdf', { enhancement: 'camera_shadow' }); // Phone photos
await client.convertFast('scan.pdf', { enhancement: 'photocopy_faded' }); // Faded copies
await client.convertFast('scan.pdf', { enhancement: 'inverted_scan' }); // Inverted colors
await client.convertFast('scan.pdf', { enhancement: 'noisy_dots' }); // Noisy artifacts
await client.convertFast('scan.pdf', { enhancement: 'tables_fine' }); // Tables and grids
await client.convertFast('scan.pdf', { enhancement: 'receipt_thermal' }); // Receipts
await client.convertFast('scan.pdf', { enhancement: 'newspaper_moire' }); // Newspapers
await client.convertFast('scan.pdf', { enhancement: 'fax_low_quality' }); // Fax documents
await client.convertFast('scan.pdf', { enhancement: 'blueprint' }); // Technical drawings
📄 Document Conversion
convertDoc()
Convert Word, Excel, and other office documents.
async convertDoc(
filePath: string,
options?: ConvertDocOptions
): Promise<string>
interface ConvertDocOptions {
output?: 'markdown' | 'txt' | 'json';
returnType?: 'content' | 'url' | 'file';
saveTo?: string;
options?: Record<string, any>;
}
Examples:
// Convert Word document to Markdown
const markdown = await client.convertDoc('document.docx');
// Convert Excel to text
const text = await client.convertDoc('spreadsheet.xlsx', { output: 'txt' });
// Save to file
await client.convertDoc('presentation.pptx', {
returnType: 'file',
saveTo: 'output.md'
});
convertHtmlString()
Convert HTML string content to other formats.
async convertHtmlString(
htmlContent: string,
options?: ConvertHtmlOptions
): Promise<string>
interface ConvertHtmlOptions {
output?: 'markdown' | 'txt' | 'json';
returnType?: 'content' | 'url' | 'file';
saveTo?: string;
options?: Record<string, any>;
}
Examples:
const html = '<h1>Title</h1><p>Content</p>';
const markdown = await client.convertHtmlString(html);
🔄 Batch Processing
batchConvertSmart()
Smart batch processing with automatic file categorization.
async batchConvertSmart(
directory: string,
saveTo: string,
options?: BatchConvertSmartOptions
): Promise<BatchResult>
interface BatchConvertSmartOptions {
outputFormat?: 'txt' | 'markdown' | 'json';
recursive?: boolean;
showProgress?: boolean;
maxWorkers?: number;
timeout?: number;
}
interface BatchResult {
processedCount: number;
successCount: number;
failedCount: number;
results: Array<{
sourceFile: string;
outputFile?: string;
status: 'success' | 'failed';
error?: string;
}>;
}
Example:
// Process all files in directory intelligently
const results = await client.batchConvertSmart(
'./documents',
'./output',
{ outputFormat: 'markdown' }
);
console.log(`Processed ${results.processedCount} files`);
console.log(`Success: ${results.successCount}, Failed: ${results.failedCount}`);
batchConvertFast()
Fast batch OCR for images and PDFs.
async batchConvertFast(
directory: string,
saveTo: string,
options?: BatchConvertFastOptions
): Promise<BatchResult>
interface BatchConvertFastOptions {
language?: string;
enhancement?: string;
outputFormat?: 'txt' | 'markdown' | 'json';
recursive?: boolean;
showProgress?: boolean;
maxWorkers?: number;
timeout?: number;
}
batchConvertAuto()
Automatic processing mode selection based on file types.
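The README does not document the exact rules `batchConvertAuto()` uses to pick a mode. As an illustration only, a routing rule consistent with the method descriptions in this document (office files to `convertDoc()`, HTML to HTML conversion, PDFs and images to OCR) might look like the hypothetical `categorize` function below. The extension lists are assumptions, not the SDK's actual logic.

```typescript
// Hypothetical categories, loosely mirroring the SDK's documented methods:
// 'office' -> convertDoc(), 'html' -> convertHtmlString(), 'ocr' -> convertFast()/convertSmart().
type FileCategory = 'office' | 'html' | 'ocr' | 'unsupported';

// Assumed extension lists; the real service may accept more or fewer types.
const OFFICE_EXTENSIONS = ['doc', 'docx', 'xls', 'xlsx', 'ppt', 'pptx'];
const HTML_EXTENSIONS = ['html', 'htm'];
const OCR_EXTENSIONS = ['pdf', 'png', 'jpg', 'jpeg', 'tif', 'tiff'];

function categorize(filePath: string): FileCategory {
  // Look only at the last path segment so dots in directory names are ignored.
  const base = filePath.split(/[\\/]/).pop() ?? filePath;
  const dot = base.lastIndexOf('.');
  if (dot <= 0) return 'unsupported'; // no extension, or a dotfile such as ".env"
  const ext = base.slice(dot + 1).toLowerCase();
  if (OFFICE_EXTENSIONS.indexOf(ext) !== -1) return 'office';
  if (HTML_EXTENSIONS.indexOf(ext) !== -1) return 'html';
  if (OCR_EXTENSIONS.indexOf(ext) !== -1) return 'ocr';
  return 'unsupported';
}
```

Grouping a file list by `categorize()` before processing makes it easy to see in advance which files a mode-by-extension rule would skip.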
async batchConvertAuto(
directory: string,
saveTo: string,
options?: BatchConvertAutoOptions
): Promise<BatchResult>
interface BatchConvertAutoOptions {
language?: string;
enhancement?: string;
outputFormat?: 'txt' | 'markdown' | 'json';
recursive?: boolean;
showProgress?: boolean;
maxWorkers?: number;
timeout?: number;
}
📊 Return Types
The SDK supports three return types for conversion methods:
1. Content (Default)
const text = await client.convertFast('doc.pdf', { returnType: 'content' });
console.log(text); // Direct text content
2. URL
const url = await client.convertFast('doc.pdf', { returnType: 'url' });
console.log(url); // Download URL for the result
3. File
const path = await client.convertFast('doc.pdf', {
returnType: 'file',
saveTo: 'output.txt'
});
console.log(path); // Path to saved file
🛠️ Job Management
getStatus()
Check job status.
const status = await client.getStatus(jobId);
console.log(`Status: ${status.status}`);
console.log(`Progress: ${status.progress || 0}%`);
wait()
Wait for job completion with polling.
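Polling here means calling `getStatus()` repeatedly until the job reaches a final state or the timeout elapses. The sketch below shows that loop against a stand-in `getStatus` function so it runs without the SDK. The `JobStatus` shape, the terminal status strings `'completed'`, `'failed'` and `'cancelled'`, and the name `pollUntilDone` are assumptions (the README only shows `status.status` and `status.progress`); `interval` and `timeout` are in milliseconds, matching the `wait()` call that follows.

```typescript
// Assumed status shape; the README only documents `status` and `progress`.
interface JobStatus {
  status: string;
  progress?: number;
}

// Assumed terminal states.
const FINAL_STATES = ['completed', 'failed', 'cancelled'];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `getStatus` every `interval` ms until a final state, or fails after `timeout` ms.
async function pollUntilDone(
  getStatus: (jobId: string) => Promise<JobStatus>,
  jobId: string,
  { interval = 2000, timeout = 300000 }: { interval?: number; timeout?: number } = {},
): Promise<JobStatus> {
  const deadline = Date.now() + timeout;
  for (;;) {
    const current = await getStatus(jobId);
    if (FINAL_STATES.indexOf(current.status) !== -1) return current;
    // Give up if the next poll would land past the deadline.
    if (Date.now() + interval > deadline) {
      throw new Error(`Job ${jobId} did not finish within ${timeout} ms`);
    }
    await sleep(interval);
  }
}
```

The returned status still has to be inspected: a job that ends in `'failed'` resolves normally here, and only a timeout rejects.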
const finalStatus = await client.wait(jobId, { interval: 2000, timeout: 300000 });
getResult()
Get completed job result.
const result = await client.getResult(jobId);
console.log(result.content); // The extracted/converted text
cancel()
Cancel a running job.
await client.cancel(jobId);
🔧 Configuration
Environment Variables
export HASHUB_API_KEY="your_api_key_here"
Client Configuration
const client = new DocAppClient('your_api_key', {
baseUrl: 'https://doc.hashub.dev/api/v1', // Default
timeout: 30000, // Request timeout (ms)
maxRetries: 3, // Max retry attempts
rateLimitDelay: 2000 // Min delay between requests (ms)
});
🎯 Usage Examples
Basic OCR
import { DocAppClient } from 'hashub-docapp-js';
const client = new DocAppClient('your_api_key');
// Extract text from PDF
const text = await client.convertFast('invoice.pdf', { language: 'en' });
console.log(text);
// High-quality OCR with layout
const markdown = await client.convertSmart('complex_document.pdf');
console.log(markdown);
Multi-language Processing
// Process documents in different languages
const documents = [
{ path: 'english_doc.pdf', lang: 'en' },
{ path: 'turkish_doc.pdf', lang: 'tr' },
{ path: 'german_doc.pdf', lang: 'de' },
{ path: 'chinese_doc.pdf', lang: 'zh' }
];
for (const doc of documents) {
const text = await client.convertFast(doc.path, { language: doc.lang });
console.log(`${doc.lang}: ${text.substring(0, 100)}...`);
}
Enhanced Image Processing
// Process different types of scanned documents
const scanTypes = {
'old_book.pdf': 'scan_low_dpi',
'phone_photo.jpg': 'camera_shadow',
'faded_copy.pdf': 'photocopy_faded',
'receipt.jpg': 'receipt_thermal',
'technical_drawing.pdf': 'blueprint'
};
for (const [filePath, enhancement] of Object.entries(scanTypes)) {
const text = await client.convertFast(filePath, {
enhancement,
language: 'en'
});
console.log(`Processed ${filePath} with ${enhancement}`);
}
Batch Processing Example
// Process entire directory
const results = await client.batchConvertAuto(
'./input_docs',
'./output',
{
outputFormat: 'markdown',
showProgress: true
}
);
console.log(`✅ Processed ${results.successCount} files successfully`);
results.results.forEach(result => {
if (result.status === 'success') {
console.log(` 📄 ${result.sourceFile} -> ${result.outputFile}`);
}
});
🛡️ Error Handling
import {
DocAppClient,
AuthenticationError,
RateLimitError,
ProcessingError,
ValidationError
} from 'hashub-docapp-js';
const client = new DocAppClient('your_api_key');
try {
const result = await client.convertFast('document.pdf');
console.log(result);
} catch (error) {
if (error instanceof AuthenticationError) {
console.log('❌ Invalid API key');
} else if (error instanceof RateLimitError) {
console.log('⏳ Rate limit exceeded, wait and retry');
} else if (error instanceof ProcessingError) {
console.log(`💥 Processing failed: ${error.message}`);
} else if (error instanceof ValidationError) {
console.log(`📝 Validation error: ${error.message}`);
} else {
console.log(`📁 File not found or other error: ${error instanceof Error ? error.message : String(error)}`);
}
}
🔄 Rate Limiting
The SDK includes built-in rate limiting to prevent API throttling:
- Default delay: 2 seconds between requests
- Automatic retry: Failed requests are retried with exponential backoff
- Progress tracking: Polls job status with appropriate intervals
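The two mechanisms in this list, a minimum gap between requests and retries with exponential backoff, can be illustrated independently of the SDK. The `ThrottledCaller` class below is a hypothetical sketch of the technique, not the SDK's implementation: it spaces request starts at least `minDelayMs` apart and retries a failing call up to `maxRetries` times, doubling the wait each time. The constructor defaults mirror the documented `rateLimitDelay` and `maxRetries` defaults; the backoff base of 1000 ms and the doubling factor are assumptions.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper illustrating throttling plus exponential backoff;
// it is not the SDK's internal implementation.
class ThrottledCaller {
  private nextSlot = 0; // earliest time (ms since epoch) the next request may start
  private readonly minDelayMs: number;
  private readonly maxRetries: number;
  private readonly baseBackoffMs: number;

  constructor(minDelayMs = 2000, maxRetries = 3, baseBackoffMs = 1000) {
    this.minDelayMs = minDelayMs;       // mirrors the documented rateLimitDelay default
    this.maxRetries = maxRetries;       // mirrors the documented maxRetries default
    this.baseBackoffMs = baseBackoffMs; // assumed; the README gives no backoff base
  }

  // Reserves a start slot synchronously, so concurrent callers are spaced out too.
  private async throttle(): Promise<void> {
    const now = Date.now();
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minDelayMs;
    if (start > now) await sleep(start - now);
  }

  // Runs `request`; after each failure waits base, 2*base, 4*base, ... before retrying.
  async call<T>(request: () => Promise<T>): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      await this.throttle();
      try {
        return await request();
      } catch (error) {
        if (attempt >= this.maxRetries) throw error;
        await sleep(this.baseBackoffMs * 2 ** attempt);
      }
    }
  }
}
```

With `maxRetries = 3`, a request that keeps failing is attempted four times in total (the first try plus three retries) before the last error is rethrown.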
// Configure rate limiting
const client = new DocAppClient('your_key', {
rateLimitDelay: 3000, // 3 second delay between requests
maxRetries: 5 // Retry failed requests up to 5 times
});
📈 Performance Tips
Use appropriate modes:
- convertFast() for simple text extraction with language support
- convertSmart() for complex layouts and formatting
Batch processing:
- Use batch methods for multiple files
- Adjust maxWorkers based on your API limits
Language specification:
- Always specify the correct language for better accuracy
- Use ISO codes for convenience ("en", "tr", "de")
Enhancement presets:
- Choose the right preset for your document type
- Experiment with different presets for optimal results
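The `maxWorkers` option caps how many files are in flight at once. The usual pattern behind such a cap is a small worker pool draining a shared queue; the sketch below is a generic, SDK-independent illustration (the `mapWithWorkers` name is hypothetical) that could also bound your own loop of `convertFast()` calls. Results keep the input order, and each item's failure is recorded instead of aborting the run, similar in spirit to the `success`/`failed` entries of `BatchResult`.

```typescript
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `task` over `items` with at most `maxWorkers` tasks in flight at any time.
async function mapWithWorkers<T, R>(
  items: readonly T[],
  maxWorkers: number,
  task: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two workers take the same item
      try {
        results[index] = { ok: true, value: await task(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const poolSize = Math.max(1, Math.min(maxWorkers, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

For example, `mapWithWorkers(files, 2, (f) => client.convertFast(f, { language: 'en' }))` keeps at most two conversions running while the rest wait their turn.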
🐛 Troubleshooting
Common Issues
1. Network Errors
// Ensure correct base URL
const client = new DocAppClient('your_key', {
baseUrl: 'https://doc.hashub.dev/api/v1'
});
2. Rate Limiting
// Increase delay between requests
const client = new DocAppClient('your_key', {
rateLimitDelay: 3000
});
3. Timeout Issues
// Increase timeout for large files
await client.convertSmart('large_file.pdf', { timeout: 600 }); // timeout in seconds
4. Language Errors
// Check supported languages
import { LanguageHelper } from 'hashub-docapp-js';
const languages = LanguageHelper.listLanguages();
console.log(languages.map(lang => lang.iso));
📊 API Method Summary
| Method | Purpose | Key Parameters | Returns |
|--------|---------|----------------|---------|
| convertFast() | Fast OCR | filePath, language, enhancement | Promise<string> |
| convertSmart() | Smart OCR | filePath, output | Promise<string> |
| convertDoc() | Office docs | filePath, output | Promise<string> |
| convertHtmlString() | HTML conversion | htmlContent, output | Promise<string> |
| batchConvertSmart() | Smart batch | directory, saveTo | Promise<BatchResult> |
| batchConvertFast() | Fast batch | directory, saveTo, language | Promise<BatchResult> |
| batchConvertAuto() | Auto batch | directory, saveTo | Promise<BatchResult> |
📄 License
MIT License - see LICENSE file for details.
🤝 Support
- npm Package: hashub-docapp-js on npm
- Documentation: HashubDocApp Docs
- API Reference: API Documentation
- GitHub Repository: Source Code
- Support: Contact Support
Made with ❤️ by the Hashub Team
