@polina123/node-red-ocr-image-to-text

v1.0.5

Published

6 months ago

Node-RED node for text recognition from images and PDF files using EasyOCR/PaddleOCR/Tesseract. Supports 20+ languages, parallel processing (up to 10x speedup), PDF text extraction, and automatic text orientation detection.

Node-RED Image OCR (Multi-Engine)

Node-RED node for automatic text recognition from images and PDF files using multiple OCR engines: PaddleOCR (default), EasyOCR, and Tesseract. Supports 20+ languages with parallel processing for up to 10x speedup.

Features

Three OCR Engines: PaddleOCR (recommended), EasyOCR, Tesseract
Text Recognition from JPG, JPEG, PNG images and PDF files
PDF Support - extract text layer or OCR scanned PDFs
Auto-Rotate - automatic text orientation detection and correction
Multi-language - support for 20+ languages simultaneously
Parallel Processing - up to 8-10x speedup for multiple images
Speed Optimization - image resizing before processing
Flexible Configuration - choose OCR engine, languages, GPU, quality control
Error Handling - detailed processing status information
Performance Monitoring - processing time and statistics
Batch Processing - multiple images at once

OCR Engine Comparison

| OCR Engine | Installation | Speed* | Quality | Languages | Recommendation |
|------------|--------------|--------|---------|-----------|----------------|
| Tesseract | System install | Fastest (0.5s) | Good | 100+ languages | Best for speed |
| EasyOCR | pip install | Medium (2.3s) | Excellent | 80+ languages | Best quality |
| PaddleOCR | pip install | Slower (4.2s) | Good | Multilingual | Balanced option |

*Test conditions: Single image, CPU, resize_factor 0.5, languages: en+ru

Performance

NEW: Parallel Processing

The node now supports parallel processing of multiple images simultaneously!

Speedup: up to 8-10x when processing multiple images

// Example: 10 images processed in parallel
// Sequential: 20 seconds
// Parallel (4 threads): ~5 seconds
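As a rough sanity check on these numbers, the sketch below estimates batch time under an idealized model in which work divides evenly across threads and there is no model-loading or I/O overhead. The function name and the model itself are illustrative assumptions, not part of the node.

```javascript
// Idealized estimate (assumption): perfect scaling across threads,
// no model-loading or I/O overhead. Real speedups will be lower.
function estimateBatchSeconds(imageCount, secondsPerImage, threads) {
  const sequential = imageCount * secondsPerImage;
  // No more workers than images can be busy at the same time.
  const workers = Math.max(1, Math.min(threads, imageCount));
  const parallel = sequential / workers;
  const speedup = parallel > 0 ? sequential / parallel : 1;
  return { sequential, parallel, speedup };
}

// 10 images at 2 seconds each on 4 threads
console.log(estimateBatchSeconds(10, 2, 4));
// { sequential: 20, parallel: 5, speedup: 4 }
```

Under this model a 4-thread run cannot exceed a 4x speedup, so reaching the quoted 8-10x would require at least 8 threads and enough CPU cores to keep them busy.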

Recommended Settings

| Scenario | Resize Factor | GPU | Mode | Speed | Quality |
|----------|---------------|-----|------|-------|---------|
| Maximum speed | 0.3 | Yes | Parallel | Very Fast | Medium |
| Balanced (recommended) | 0.5 | Yes | Parallel | Fast | Good |
| High quality | 0.7 | Yes | Batch | Medium | High |
| Critical quality | 1.0 | Yes | Sequential | Slow | Maximum |

Recommendation: Use resize_factor: 0.5 with parallel processing. The speedup (up to 8-10x) depends on the number of images and threads.

Installation

Method 1: Via Palette Manager (recommended)

Open Node-RED
Go to menu → Manage Palette
In the Install tab, search for @polina123/node-red-ocr-image-to-text
Click Install

Method 2: Via npm

cd ~/.node-red
npm install @polina123/node-red-ocr-image-to-text

Method 3: Manual installation for development

# Clone the repository
git clone https://github.com/polinaSuvorova/node_red_ocr_image_to_text.git
cd node_red_ocr_image_to_text

# Create symlink
npm link

# In Node-RED folder
cd ~/.node-red
npm link @polina123/node-red-ocr-image-to-text

Installing Python Dependencies

OCR engines will automatically install dependencies on first use. For manual installation:

# Core dependencies (required)
pip install opencv-python pillow

# PaddleOCR (recommended)
pip install paddlepaddle paddleocr

# EasyOCR
pip install easyocr

# Tesseract (requires system installation)
pip install pytesseract

# PDF Support (optional)
pip install PyPDF2 pdf2image
# Also requires poppler-utils system package

Installing PDF Support (Optional)

Python packages:

pip install PyPDF2 pdf2image

System requirements for pdf2image:

Windows:

Download poppler from https://github.com/oschwartz10612/poppler-windows/releases/
Extract and add to PATH

Linux (Ubuntu/Debian):

sudo apt install poppler-utils

macOS:

brew install poppler

Installing Tesseract OCR

Tesseract requires separate system installation:

Windows:

Download from https://github.com/UB-Mannheim/tesseract/wiki
Install and add to PATH

Linux (Ubuntu/Debian):

sudo apt install tesseract-ocr tesseract-ocr-rus tesseract-ocr-eng

macOS:

brew install tesseract

Usage

Input Data

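The node expects msg.payload to be a JSON object with a files array. Each entry carries the file name, its MIME type, and the file content as a base64 string: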

{
  "files": [
    {
      "FILENAME": "document.jpg",
      "MIMETYPE": "image/jpeg",
      "FILEBASE64": "base64_encoded_image_data..."
    },
    {
      "FILENAME": "screenshot.png",
      "MIMETYPE": "image/png",
      "FILEBASE64": "base64_encoded_image_data..."
    },
    {
      "FILENAME": "scanned.pdf",
      "MIMETYPE": "application/pdf",
      "FILEBASE64": "base64_encoded_pdf_data..."
    }
  ]
}
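If the files arrive as binary buffers (for example from a file-in node), the payload above can be assembled with a small helper. This is a minimal sketch: `buildOcrPayload` and the extension-to-MIME map are hypothetical names introduced here, and only the three field names come from the input format above.

```javascript
// Hypothetical helper: converts { name, data } pairs (data is a Buffer)
// into the { files: [...] } payload shown above.
const MIME_BY_EXTENSION = new Map([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["pdf", "application/pdf"],
]);

function buildOcrPayload(inputs) {
  const files = inputs.map(({ name, data }) => {
    // Take the text after the last dot as the extension, case-insensitively.
    const extension = name.split(".").pop().toLowerCase();
    const mimetype = MIME_BY_EXTENSION.get(extension);
    if (!mimetype) {
      throw new Error(`Unsupported file extension: ${name}`);
    }
    return {
      FILENAME: name,
      MIMETYPE: mimetype,
      FILEBASE64: data.toString("base64"),
    };
  });
  return { files };
}

const payload = buildOcrPayload([{ name: "note.PNG", data: Buffer.from("hi") }]);
console.log(payload.files[0].MIMETYPE, payload.files[0].FILEBASE64);
// image/png aGk=
```

In a function node, the result would be assigned to `msg.payload` before returning `msg`.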

Output Data

Image Output:

{
  "success": true,
  "results": [
    {
      "filename": "document.jpg",
      "mimetype": "image/jpeg",
      "status": "success",
      "text": "Recognized text from image combined into one line",
      "ocr_model": "paddleocr",
      "languages": ["en", "ru"],
      "resize_factor": 0.5,
      "processing_time": 1.23,
      "rotation_applied": 3.5,
      "error": null
    }
  ],
  "processed": 1,
  "total": 1,
  "errors": 0,
  "ocr_model": "paddleocr",
  "languages": ["en", "ru"],
  "resize_factor": 0.5,
  "performance": {
    "total_time": 1.45,
    "average_time_per_image": 1.23,
    "images_per_second": 0.81
  }
}

PDF Output:

{
  "success": true,
  "results": [
    {
      "filename": "document.pdf",
      "mimetype": "application/pdf",
      "status": "success",
      "text": "Extracted text from PDF...",
      "ocr_model": "tesseract",
      "languages": ["en", "ru"],
      "processing_time": 2.34,
      "pdf_info": {
        "pages": 5,
        "method": "text_layer",
        "pages_processed": 5
      },
      "error": null
    }
  ],
  "processed": 1,
  "total": 1,
  "errors": 0
}

Processing Statuses

| Status | Description |
|--------|-------------|
| success | Text successfully recognized |
| no_text | No text found in image |
| error | Error processing image |
| unsupported_format | Unsupported image format |
| no_data | No image data provided |
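Downstream flows often need a per-status tally rather than the raw result list. The sketch below counts results by status and collects the entries that did not succeed; `summarizeResults` is a hypothetical helper, and the sample data is made up for illustration.

```javascript
// Hypothetical helper: counts results per status and collects
// every entry whose status is not "success".
function summarizeResults(payload) {
  const counts = {};
  const failures = [];
  for (const result of payload.results || []) {
    counts[result.status] = (counts[result.status] || 0) + 1;
    if (result.status !== "success") {
      failures.push({
        filename: result.filename,
        status: result.status,
        error: result.error,
      });
    }
  }
  return { counts, failures };
}

// Illustrative sample data (not real node output)
const sample = {
  results: [
    { filename: "a.jpg", status: "success", error: null },
    { filename: "b.png", status: "no_text", error: null },
    { filename: "c.jpg", status: "success", error: null },
  ],
};
console.log(summarizeResults(sample).counts);
// { success: 2, no_text: 1 }
```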

Node Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| Name | Node name (optional) | - |
| OCR Model | Choose OCR engine: PaddleOCR, EasyOCR, Tesseract | paddleocr |
| Languages | Languages for recognition (multiple selection) | en, ru |
| Supported Formats | File formats to process: JPEG/JPG, PNG, PDF | jpeg, png |
| Resize Factor | Image resize coefficient (0.1-1.0) | 1.0 |
| Use GPU | Use GPU for acceleration (requires CUDA) | false |
| Auto-Rotate | Automatic text orientation detection and correction | false |
| Processing Mode | Sequential or Parallel processing | sequential |
| Max Threads | Number of parallel threads (1-32) | 4 |
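When settings are generated programmatically, it can help to clamp numeric values to the documented ranges before they reach the node. The following is a sketch only: `resize_factor` matches the key used elsewhere in this README, while `max_threads` and the helper names are assumptions about how the Max Threads setting might be keyed.

```javascript
// Keeps a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: applies the documented defaults and ranges
// (Resize Factor 0.1-1.0, default 1.0; Max Threads 1-32, default 4).
// The key name `max_threads` is an assumption.
function normalizeSettings(settings) {
  const resize = Number(settings.resize_factor ?? 1.0);
  const threads = Number(settings.max_threads ?? 4);
  return {
    ...settings,
    resize_factor: clamp(Number.isFinite(resize) ? resize : 1.0, 0.1, 1.0),
    max_threads: clamp(Number.isFinite(threads) ? Math.round(threads) : 4, 1, 32),
  };
}

console.log(normalizeSettings({ ocr_model: "paddleocr", resize_factor: 1.5, max_threads: 64 }));
// { ocr_model: 'paddleocr', resize_factor: 1, max_threads: 32 }
```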

Supported Languages

English (en)
Russian (ru)
German (de)
French (fr)
Spanish (es)
And many more...

Usage Examples

Example 1: Fast processing with optimal settings

// Node configuration in Node-RED
{
  "id": "ocr-node",
  "type": "ocr-engine",
  "name": "Fast Recognition",
  "ocr_model": "paddleocr",
  "languages": ["en", "ru"],
  "resize_factor": 0.5,
  "gpu": true
}

Example 2: Preparing data for processing

// JavaScript function to prepare data
var files = [
  {
    FILENAME: "image1.jpg",
    MIMETYPE: "image/jpeg",
    FILEBASE64: msg.payload.base64data1
  },
  {
    FILENAME: "image2.png",
    MIMETYPE: "image/png",
    FILEBASE64: msg.payload.base64data2
  }
];

msg.payload = { files: files };
return msg;

Example 3: Processing results with performance metrics

// After OCR node
if (msg.payload.success) {
  node.log(`OCR engine: ${msg.payload.ocr_model}`);
  node.log(`Processed ${msg.payload.processed} of ${msg.payload.total} images`);
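  // Performance metrics are not shown in the PDF output sample, so guard the access
  if (msg.payload.performance) {
    node.log(`Total time: ${msg.payload.performance.total_time} sec`);
    node.log(`Speed: ${msg.payload.performance.images_per_second} img/sec`);
  }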

  msg.payload.results.forEach(function(result) {
    if (result.status === 'success') {
      node.log(`Text from ${result.filename}: ${result.text.substring(0, 100)}...`);
      node.log(`Processing time: ${result.processing_time} sec`);
    } else if (result.status === 'no_text') {
      node.warn(`No text found in ${result.filename}`);
    } else if (result.status === 'error') {
      node.error(`Error in ${result.filename}: ${result.error}`);
    }
  });
} else {
  node.error('OCR error: ' + msg.payload.error);
}

// Return the message so it is passed on to the next node
return msg;

Requirements

Node.js >= 14.0.0
Node-RED >= 2.0.0
Python >= 3.7
Memory: 2 GB RAM or more (for recognition models)

Supported File Formats

JPEG/JPG (.jpg, .jpeg)
PNG (.png)
PDF (.pdf) - supports both text-based and scanned PDFs

Troubleshooting

Error: "Module not found"

# Reinstall Python dependencies
pip install --upgrade paddleocr easyocr pytesseract opencv-python pillow

Error: "GPU not available"

# Option 1: Install CUDA to use GPU
# Option 2: Disable "Use GPU" option in node settings

Slow Performance

Recommendations:

Use parallel processing for multiple images
Reduce image size (resize_factor: 0.5)
Use GPU for acceleration (requires CUDA)
Use PaddleOCR instead of EasyOCR
Run performance test: npm run test:async

Tesseract not found

Windows:

Download and install: https://github.com/UB-Mannheim/tesseract/wiki
Add path to PATH

Linux:

sudo apt install tesseract-ocr tesseract-ocr-rus tesseract-ocr-eng

macOS:

brew install tesseract

Development

Project Structure

node_red_ocr_image_to_text/
├── nodes/
│   └── ocr-image-to-text/
│       ├── node_red_ocr_image_to_text.html    # UI configuration
│       ├── node_red_ocr_image_to_text.js      # Node.js logic
│       ├── node_red_ocr_image_to_text.py      # Python main script
│       ├── ocr_engine.py                      # Abstract base class
│       ├── easyocr_engine.py                  # EasyOCR implementation
│       ├── paddleocr_engine.py                # PaddleOCR implementation
│       ├── tesseract_engine.py                # Tesseract implementation
│       └── pdf_processor.py                   # PDF processing (NEW)
├── test/
│   ├── .mocharc.json                          # Mocha configuration
│   ├── sample_test_data.json                  # Test data
│   ├── test_ocr.py                            # Test script
│   ├── test_performance_comparison.py         # Performance comparison
│   ├── test_async_performance.py              # Parallel processing test
│   ├── test_model_switching.py                # Model switching test
│   ├── test_orientation.py                    # Orientation detection test
│   ├── test_pdf_ocr.py                        # PDF processing test
│   ├── node_spec.js                           # Node-RED unit tests
│   └── README.md                              # Testing documentation
├── documens_readme_md/
│   ├── PDF_SUPPORT.md                         # PDF feature documentation
│   ├── README_FOR_DEVELOPERS.md               # Developer guide (Russian)
│   └── PERFORMANCE.md                         # Performance optimization
├── logs_test/                                 # Test logs (in .gitignore)
│   └── performance_test_results_*.json        # Performance test results
├── .npmignore                                 # npm package exclusions
├── package.json
├── LICENSE
└── README.md

Local Development

Clone the repository
Install dependencies: npm install
Create symlink: npm link
In Node-RED folder: npm link @polina123/node-red-ocr-image-to-text
Restart Node-RED

Testing

# Basic test (requires test/sample_test_data.json)
npm run test:python

# Performance comparison of all engines
npm run test:performance

# Parallel vs Sequential processing test
npm run test:async

# Model switching test
npm run test:switching

# Text orientation detection test
npm run test:orientation

# PDF processing test
npm run test:pdf

# Node-RED unit tests
npm test

License

MIT License - see LICENSE file for details.

Support

If you have problems or questions:

Create an issue on GitHub
EasyOCR Documentation: https://github.com/JaidedAI/EasyOCR
PaddleOCR Documentation: https://github.com/PaddlePaddle/PaddleOCR
Tesseract Documentation: https://github.com/tesseract-ocr/tesseract

Acknowledgments

Based on EasyOCR by Jaided AI
Uses PaddleOCR by PaddlePaddle
Integrates Tesseract OCR by Google

Additional Information

Tesseract Architecture

Pytesseract is a Python wrapper for Tesseract OCR. Tesseract is an open-source optical character recognition (OCR) engine that was originally developed at Hewlett-Packard; Google sponsored its development from 2006, and it is now maintained as an open-source project on GitHub.

How it works:

Python code → pytesseract → Tesseract OCR (system program) → result

Two-component system:

pytesseract (Python package) — installed via pip
tesseract (system program) — installed separately in OS
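Because the system program is installed separately, a flow can check for it before selecting the Tesseract engine. The sketch below runs `tesseract --version` from Node.js; `checkTesseract` is a hypothetical helper and not part of this package.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: runs `<command> --version` to see whether the
// system binary is installed and reachable through PATH.
function checkTesseract(command = "tesseract") {
  const result = spawnSync(command, ["--version"], { encoding: "utf8" });
  if (result.error || result.status !== 0) {
    return { available: false, version: null };
  }
  // Some releases print the version banner to stderr instead of stdout.
  const banner = (result.stdout || result.stderr || "").split("\n")[0].trim();
  return { available: true, version: banner };
}

console.log(checkTesseract());
```

If `available` is false while pytesseract is installed, the usual cause is that the system program is missing or not on PATH.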

Author: Polina Suvorova

Repository: https://github.com/polinaSuvorova/node_red_ocr_image_to_text
