@siva-sub/client-ocr-app

v4.0.0

Published

6 months ago

Multi-engine browser OCR with visual results. Choose between fast mobile models (PPU) or accurate server models (OnnxOCR). Features Tesseract fallback, visual bounding boxes, 100% client-side processing, PWA support, and intelligent model caching.

🔍 Smart OCR - Multi-Engine Visual Text Recognition

Advanced browser-based OCR with multiple engines and visual results

Choose between fast mobile models or accurate server models. Features PPU-Paddle OCR, OnnxOCR, and Tesseract fallback. 100% client-side processing.

🚀 Try Live Demo | 📦 NPM Package | 📖 Documentation

✨ What's New in v4.0

🚀 Complete Multi-Engine Implementation

PPU-Paddle-OCR Complete: Full implementation with deskew, detection, angle classification
OnnxOCR TextSystem: Complete pipeline with TextDetector, TextRecognizer, TextClassifier
Enhanced Processing: All preprocessing and postprocessing methods from the PPU-Paddle-OCR and OnnxOCR repositories
Visual Results: Advanced visualization with polygon boxes and confidence scores

🎯 Advanced Features

Auto-Deskew: Automatic image rotation correction
Angle Classification: 180° rotation detection and correction
Smart Padding: Adaptive padding for better recognition
Batch Processing: Efficient batch recognition for multiple regions
WebGL Acceleration: Optimized for mobile and desktop performance
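As an illustration of how batch recognition and padding can work together, here is a minimal sketch that sorts detected regions by aspect ratio before batching, so each batch pads to a common width with little wasted space. This is a common approach in PaddleOCR-style recognizers; it is not confirmed to be this package's exact implementation. The `Region` type and both function names are hypothetical.

```typescript
// Hypothetical shape for a detected text region; not the package's actual type.
interface Region {
  id: number;
  width: number;
  height: number;
}

// Group regions into batches of similar aspect ratio (width / height).
// Similar ratios mean less padding when a batch is resized to one height.
function batchByAspectRatio(regions: Region[], batchSize: number): Region[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const sorted = regions
    .slice()
    .sort((a, b) => a.width / a.height - b.width / b.height);
  const batches: Region[][] = [];
  for (let i = 0; i < sorted.length; i += batchSize) {
    batches.push(sorted.slice(i, i + batchSize));
  }
  return batches;
}

// Width that every crop in a (non-empty) batch is padded to after being
// resized to targetHeight: the widest aspect ratio in the batch decides it.
function paddedWidth(batch: Region[], targetHeight: number): number {
  const maxRatio = Math.max(...batch.map((r) => r.width / r.height));
  return Math.ceil(targetHeight * maxRatio);
}
```

Sorting first keeps very wide and very narrow crops out of the same batch, which is where most padding waste would otherwise come from.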

🌟 Features

🔐 100% Private: All processing in browser, no data uploads
⚡ Multiple Engines:
- PPU-Paddle-OCR: Fast mobile models with complete processing pipeline
- OnnxOCR: Accurate server models with TextSystem architecture
- Tesseract: Reliable fallback with word-level detection
🎨 Advanced Visualization:
- Polygon and rectangle bounding boxes
- Color-coded confidence scores
- Reading order visualization
📱 PWA Support: Install as app, works offline with cached models
🌍 Multi-language: English, Chinese, Japanese, Korean with proper character dictionaries
💾 Smart Model Management:
- Automatic model downloading with progress tracking
- IndexedDB caching for instant switching
- Model size optimization
📊 Comprehensive Metrics:
- Per-stage processing time
- Confidence scores for each detected text
- Model performance comparison
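
The visualization features above can be sketched roughly as follows. This is an illustrative example only: the `Box` shape, the function names, the color thresholds (0.9 and 0.7), and the row tolerance are all assumptions, not values taken from this package.

```typescript
// Hypothetical result shape: top-left corner, height, text, and confidence.
interface Box {
  text: string;
  x: number;
  y: number;
  height: number;
  confidence: number; // expected in the range 0..1
}

// Map a confidence score to a display color (thresholds are assumptions).
function confidenceColor(confidence: number): string {
  if (confidence >= 0.9) return "green";
  if (confidence >= 0.7) return "orange";
  return "red";
}

// Order boxes top-to-bottom, then left-to-right within each row.
// A box joins the current row when its top edge lies within
// rowTolerance * (height of the row's first box) of that first box.
function readingOrder(boxes: Box[], rowTolerance = 0.5): Box[] {
  const byTop = boxes.slice().sort((a, b) => a.y - b.y);
  const rows: Box[][] = [];
  for (const box of byTop) {
    const row = rows[rows.length - 1];
    if (row && Math.abs(box.y - row[0].y) <= row[0].height * rowTolerance) {
      row.push(box);
    } else {
      rows.push([box]);
    }
  }
  const ordered: Box[] = [];
  for (const row of rows) {
    ordered.push(...row.sort((a, b) => a.x - b.x));
  }
  return ordered;
}
```

Grouping into rows before sorting by `x` avoids the common failure where a box that sits a pixel or two higher on the same line jumps ahead of the words to its left.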

🚀 Quick Start

Online Demo

Visit https://siva-sub.github.io/client-ocr-app

NPM Installation

npm install @siva-sub/client-ocr-app

Local Development

git clone https://github.com/siva-sub/client-ocr-app.git
cd client-ocr-app
npm install
npm run dev

📖 Documentation

Available OCR Engines

PPU Mobile (Fast) ⚡

Optimized for mobile and real-time processing
PP-OCRv5 detection + PP-OCRv4 recognition
~50-200ms processing time
Best for: Live camera OCR, mobile apps

OnnxOCR v5 (Most Accurate) 🎯

Latest PP-OCRv5 models with angle classification
Highest accuracy on complex documents
~500-1000ms processing time
Best for: Documents, receipts, forms

OnnxOCR v4 (Balanced) ⚖️

Good balance of speed and accuracy
PP-OCRv4 full pipeline
~300-500ms processing time
Best for: General purpose OCR

OnnxOCR v2 (Server) 🖥️

Heavy server models for maximum accuracy
Designed for backend processing
~1000-2000ms processing time
Best for: Batch processing, archives

Tesseract (Fallback) 🛡️

Classic OCR engine
Works offline once its language data is cached
~500-1500ms processing time
Best for: Fallback, offline usage
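
To make the trade-off between these engines concrete, here is a small sketch that picks the most accurate engine whose worst-case time fits a latency budget. The timing ranges are the approximate figures listed above and the accuracy ratings mirror the star counts in the performance comparison table; `EngineProfile`, `ENGINES`, and `pickEngine` are hypothetical names, not part of the package API.

```typescript
// Hypothetical profile type built from the figures quoted in this README.
interface EngineProfile {
  name: string;
  minMs: number; // lower bound of the quoted processing time
  maxMs: number; // upper bound of the quoted processing time
  accuracy: number; // star rating from the comparison table (1 to 5)
}

const ENGINES: EngineProfile[] = [
  { name: "PPU Mobile", minMs: 50, maxMs: 200, accuracy: 3 },
  { name: "OnnxOCR v5", minMs: 500, maxMs: 1000, accuracy: 5 },
  { name: "OnnxOCR v4", minMs: 300, maxMs: 500, accuracy: 4 },
  { name: "OnnxOCR v2", minMs: 1000, maxMs: 2000, accuracy: 5 },
  { name: "Tesseract", minMs: 500, maxMs: 1500, accuracy: 3 },
];

// Return the most accurate engine whose worst-case time fits the budget.
// Ties on accuracy go to the faster engine; undefined means nothing fits.
function pickEngine(budgetMs: number): EngineProfile | undefined {
  const candidates = ENGINES.filter((e) => e.maxMs <= budgetMs);
  candidates.sort((a, b) => b.accuracy - a.accuracy || a.maxMs - b.maxMs);
  return candidates[0];
}
```

With these figures, a 250 ms budget selects PPU Mobile, a 600 ms budget selects OnnxOCR v4, and a 1000 ms budget selects OnnxOCR v5. Actual timings depend on the device and image size, so treat the numbers as rough guides.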

Usage Guide

Select Engine: Choose based on your speed/accuracy needs
Upload Image: Drag & drop or click to select
Process: Models download automatically on first use
View Results:
- Visual mode: See bounding boxes with confidence
- Text mode: Get clean, copyable text
- Download results as JSON

Model Management

Models are cached after first download
✓ indicates cached models (instant loading)
⬇ indicates models need downloading
Switch engines anytime without re-uploading the image
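
The ✓ and ⬇ indicators can be thought of as the result of comparing an engine's required model files against what is already cached. The sketch below shows that comparison under stated assumptions: `ModelFile`, `planDownloads`, and the idea of keying the cache by URL are hypothetical, and in the browser the set of cached URLs would be read from the IndexedDB cache rather than passed in by hand.

```typescript
// Hypothetical description of one model file an engine needs.
interface ModelFile {
  url: string;
  sizeMB: number;
}

interface DownloadPlan {
  marker: "✓" | "⬇"; // ✓ when everything is cached, ⬇ otherwise
  missing: ModelFile[]; // files that still need downloading
  downloadMB: number; // total size of the missing files
}

// Compare required files with the cached URLs and report what is missing.
function planDownloads(
  required: ModelFile[],
  cachedUrls: ReadonlySet<string>
): DownloadPlan {
  const missing = required.filter((m) => !cachedUrls.has(m.url));
  const downloadMB = missing.reduce((sum, m) => sum + m.sizeMB, 0);
  return { marker: missing.length === 0 ? "✓" : "⬇", missing, downloadMB };
}
```

Computing the plan before starting a download also gives the UI a total size to show in its progress indicator.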

🛠️ Technology Stack

PPU-Paddle-OCR: Mobile-optimized models
OnnxOCR: High-accuracy ONNX models
ONNX Runtime Web: Hardware-accelerated inference
Tesseract.js: Classic OCR fallback
OpenCV.js: Image preprocessing
Vite + Mantine: Build tooling and UI component library

📊 Performance Comparison

| Engine | Speed | Accuracy | Model Size | Use Case |
|--------|-------|----------|------------|----------|
| PPU Mobile | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | 12MB | Real-time OCR |
| OnnxOCR v5 | ⚡⚡ | ⭐⭐⭐⭐⭐ | 25MB | Documents |
| OnnxOCR v4 | ⚡⚡⚡ | ⭐⭐⭐⭐ | 20MB | General |
| OnnxOCR v2 | ⚡ | ⭐⭐⭐⭐⭐ | 40MB | Batch |
| Tesseract | ⚡⚡ | ⭐⭐⭐ | 11MB | Fallback |

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

📝 License

MIT License - see LICENSE file

🙏 Acknowledgments

PaddleOCR - State-of-the-art models
PPU-Paddle-OCR - Mobile optimization
OnnxOCR - ONNX implementation
Tesseract.js - Classic OCR engine