n8n-nodes-doctr

v0.1.7

Published

3 months ago

Extract text from images using docTR OCR in n8n workflows

Vulnerabilities: 0 High, 0 Medium, 0 Low

sproilex

n8n-community-node-package n8n ocr doctr text-recognition document-processing

This is an n8n community node that integrates docTR (Document Text Recognition) for extracting text from images in your n8n workflows.

docTR is a state-of-the-art OCR (Optical Character Recognition) library that uses deep learning models to extract text from document images with high accuracy.

n8n is a fair-code licensed workflow automation platform.

Installation

Install the n8n Node

Follow the installation guide in the n8n community nodes documentation.

Quick install:

npm install n8n-nodes-doctr

Python Prerequisites

This node requires Python 3.8+ and the docTR library to be installed on the same machine where n8n is running.

Install Python 3.8 or higher
- Ensure python3 is available on your system PATH

Install docTR and dependencies

pip install -r requirements.txt

Or install manually:

# For PyTorch backend (recommended); the quotes keep shells such as zsh from expanding the brackets
pip install "python-doctr[torch]"

# Or for TensorFlow backend
pip install "python-doctr[tf]"

Verify installation

python3 -c "from doctr.models import ocr_predictor; print('docTR installed successfully')"

Operations

The Doctr OCR node provides a single operation:

Extract Text from Image

Processes binary image data through docTR's OCR engine and returns extracted text.

Parameters:

Binary Property: Name of the binary property containing the image (default: data)
Output Format: Choose what data to return:
- Plain Text Only: Returns just the extracted text as a string
- Structured Data Only: Returns the full OCR result with word/line/block positions
- Both: Returns both plain text and structured data

Supported Image Formats:

PNG
JPG/JPEG
TIFF
BMP
And other common image formats supported by PIL/Pillow
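The node's internals are not documented here, so the following is only a minimal sketch of how an operation like this could be written against docTR's Python API. `run_ocr` and `build_output` are hypothetical helpers, the format names `"text"`, `"structured"` and `"both"` are assumptions standing in for the three Output Format choices, and the `text` output key is assumed (only `structuredData` appears in this README).

```python
def run_ocr(image_bytes: bytes):
    """Run docTR on one image and return (plain_text, structured_dict)."""
    # Imported lazily because loading docTR pulls in the deep learning backend.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)          # loads pretrained weights
    doc = DocumentFile.from_images([image_bytes])   # accepts paths or raw bytes
    result = model(doc)
    return result.render(), result.export()


def build_output(plain_text, structured, output_format="both"):
    """Shape the result according to the chosen output format (assumed names)."""
    if output_format == "text":
        return {"text": plain_text}
    if output_format == "structured":
        return {"structuredData": structured}
    if output_format == "both":
        return {"text": plain_text, "structuredData": structured}
    raise ValueError(f"unknown output format: {output_format!r}")
```

Keeping the shaping step separate from the OCR call means the output logic can be checked without loading a model.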

Compatibility

Minimum n8n version: 0.198.0
Tested with: n8n 1.0+
Python: 3.8, 3.9, 3.10, 3.11
docTR: 0.5.0+

Usage

Basic Text Extraction

Add a node that provides binary image data (e.g., HTTP Request, Read Binary File)
Add the Doctr OCR node
Configure the binary property name (usually data)
Select output format (e.g., "Plain Text Only")
The extracted text will be available in the output

Example Workflow

[Read Binary Files] → [Doctr OCR] → [Process Text]

Use Cases:

Extract text from scanned documents
Process receipts and invoices
Digitize handwritten notes
Extract data from screenshots
Process forms and questionnaires

Working with Structured Data

When you select "Structured Data Only" or "Both" as output format, you'll receive detailed position information:

{
  "structuredData": {
    "pages": [
      {
        "blocks": [
          {
            "lines": [
              {
                "words": [
                  {
                    "value": "text",
                    "confidence": 0.99,
                    "geometry": [[x1, y1], [x2, y2]]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

This structured data is useful for:

Extracting specific regions of text
Preserving document layout
Filtering by confidence scores
Building custom text processing logic
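As an illustration of filtering by confidence, here is a small sketch that walks the pages, blocks, lines and words nesting shown above and rebuilds each line from the words that meet a threshold. `confident_lines` and the default threshold of 0.5 are hypothetical choices, and the sample data is made up for the example.

```python
def confident_lines(structured, min_confidence=0.5):
    """Rebuild text lines, keeping only words at or above min_confidence."""
    lines = []
    for page in structured.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    word["value"]
                    for word in line.get("words", [])
                    if word.get("confidence", 0.0) >= min_confidence
                ]
                if words:  # skip lines where every word was filtered out
                    lines.append(" ".join(words))
    return lines


# Made-up sample in the shape of the "structuredData" value.
sample = {
    "pages": [{
        "blocks": [{
            "lines": [
                {"words": [
                    {"value": "Invoice", "confidence": 0.99},
                    {"value": "#1O24", "confidence": 0.41},
                ]},
                {"words": [
                    {"value": "Total", "confidence": 0.95},
                    {"value": "42.00", "confidence": 0.88},
                ]},
            ]
        }]
    }]
}

print(confident_lines(sample))  # ['Invoice', 'Total 42.00']
```

Pass the value stored under `structuredData`, not the whole output item.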

Troubleshooting

Error: "Failed to start OCR process"

Ensure Python 3 is installed and accessible via python3 command
Verify docTR is installed: pip list | grep doctr
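Both checks can be combined in a short diagnostic script. This is a sketch only: `diagnose` is a hypothetical helper, and it assumes the node launches the interpreter found under the name `python3`.

```python
import shutil
import subprocess


def diagnose(python_cmd="python3"):
    """Return a short status string describing the Python/docTR setup."""
    path = shutil.which(python_cmd)
    if path is None:
        return f"{python_cmd} not found on PATH"
    # Try importing docTR with that exact interpreter.
    proc = subprocess.run(
        [path, "-c", "import doctr"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return f"{python_cmd} found at {path}, but docTR failed to import"
    return f"{python_cmd} found at {path}, docTR import OK"


if __name__ == "__main__":
    print(diagnose())
```

Run it as the same user and in the same environment as n8n, since PATH and installed packages can differ between users, containers and virtual environments.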

Error: "OCR processing error"

Check that the input is valid binary image data
Verify the image format is supported
Ensure sufficient memory is available for the OCR model
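One quick way to check the first two points is to look at the leading bytes of the binary data. The sketch below covers only the four formats named under Supported Image Formats; `sniff_image_format` is a hypothetical helper, and a `None` result does not prove the file is unsupported, since Pillow reads more formats than these.

```python
# Well-known file signatures ("magic bytes") for the listed formats.
SIGNATURES = {
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "JPEG": (b"\xff\xd8\xff",),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian variants
    "BMP": (b"BM",),
}


def sniff_image_format(data: bytes):
    """Return the format name if the data starts with a known signature."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG
print(sniff_image_format(b"%PDF-1.7"))                          # None
```

A `None` result for data that should be an image often means an upstream node passed HTML, JSON or a PDF instead of the image itself.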

Slow Performance

First execution loads the model (can take 5-10 seconds)
Subsequent executions are faster
Large images take longer to process
Consider resizing very large images before OCR
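Resizing could be done in a step before the OCR node. The following is a rough sketch using Pillow; `scaled_size` and `downscale_png` are hypothetical helpers, and the 2000-pixel limit is an arbitrary assumption rather than a docTR recommendation.

```python
import io


def scaled_size(width, height, max_side=2000):
    """Return dimensions whose longest side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never enlarge small images
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale_png(image_bytes, max_side=2000):
    """Downscale an image and return it re-encoded as PNG bytes."""
    from PIL import Image  # imported lazily; requires Pillow

    with Image.open(io.BytesIO(image_bytes)) as img:
        # Convert to RGB so any input mode can be saved as PNG (alpha is dropped).
        rgb = img.convert("RGB")
        resized = rgb.resize(scaled_size(rgb.width, rgb.height, max_side))
        out = io.BytesIO()
        resized.save(out, format="PNG")
    return out.getvalue()
```

Shrinking too aggressively can make small print unreadable, so it is worth comparing OCR output before and after on a few representative images.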

Resources

Version History

0.1.0

Initial release
Support for plain text and structured data extraction
PyTorch and TensorFlow backend compatibility
