ppu-paddle-ocr

v6.2.0

A lightweight PaddleOCR SDK in TypeScript, and probably the fastest, with multilingual support. It runs anywhere JavaScript runs: Node.js, Bun, Deno, web browsers, browser extensions, and React Native (iOS/Android). Docker and a CLI are supported. The official SDK is browser-only and significantly slower; compare it for yourself.

Need it as an HTTP service, or dockerized? ppu-paddle-ocr-serve quickly spins up a ppu-paddle-ocr REST API. Need a CLI instead? See ppu-paddle-ocr CLI support. To adjust the config and model for your use case, see the config recommendation. If you are an AI agent, you can get up to speed quickly with the skill in the skill-ppu-paddle-ocr folder.

import { PaddleOcrService } from "ppu-paddle-ocr";

const service = new PaddleOcrService();
await service.initialize();

const result = await service.recognize("./receipt.jpg");
console.log(result.text);

await service.destroy();

Why ppu-paddle-ocr?

Lightweight: minimal dependencies, optimized for performance.
Pre-packed models: PP-OCRv6 tiny models (~6 MB, multilingual) are fetched and cached automatically on first run; the full-dictionary small/medium tiers and language-specific variants are one option away. Additional variants are available via ppu-paddle-ocr-models.
Runs everywhere: Node.js, Bun, Deno, web browsers, browser extensions, and React Native (iOS/Android). The official SDK is browser-only.
Customizable: custom models, dictionaries, and per-call overrides.
TypeScript: full type definitions.

Runtime Support

The same package, the same API, every JavaScript runtime:

| Runtime | How to install | Try it |
| --- | --- | --- |
| Node.js | npm install ppu-paddle-ocr onnxruntime-node | npm package |
| Bun | bun add ppu-paddle-ocr onnxruntime-node | npm package |
| Deno | deno add jsr:@snowfluke/ppu-paddle-ocr | JSR package |
| Web browser | npm install ppu-paddle-ocr onnxruntime-web (import /web subpath) | Live demo |
| Browser extension | Same as web; bundle ppu-paddle-ocr/web with your extension's bundler. | Example extension repo |
| Mobile (React Native) | npm install ppu-paddle-ocr onnxruntime-react-native @shopify/react-native-skia (import /mobile subpath) | Example app |

Installation

npm install ppu-paddle-ocr onnxruntime-node onnxruntime-web

Omit onnxruntime-node or onnxruntime-web depending on your target environment (Node/Bun vs browser).

CLI (global install)

To use the command line without bunx/npx, install globally. This puts a ppu-paddle-ocr command on your PATH:

npm install -g ppu-paddle-ocr onnxruntime-node      # or: bun add -g ppu-paddle-ocr onnxruntime-node
ppu-paddle-ocr recognize receipt.jpg

onnxruntime-node (~258 MB of native binaries) is an optional peer dependency; install it alongside, because the CLI needs it. Notes:

bun: ensure ~/.bun/bin is on your PATH (npm's global bin usually already is).
Updates are manual: re-run the install with @latest to upgrade. (bunx/npx normally fetch the latest but can serve a stale cache; a global install pins the version and you own upgrades.)
It's still the Node/Bun build: a global install gives you a global command, not a standalone binary, so Node or Bun must be present.

Core Usage

Basic Recognition

import { PaddleOcrService } from "ppu-paddle-ocr";

const service = new PaddleOcrService({
  debugging: {
    debug: false,
    verbose: true,
  },
});

await service.initialize();

const result = await service.recognize("./assets/receipt.jpg");
console.log(result.text);

await service.destroy();

Custom Models

To use preset models, import their constants for quick switching:

import { PaddleOcrService, V6_SMALL_MODEL, V5_EN_MOBILE_MODEL } from "ppu-paddle-ocr";

// PP-OCRv6 small (full dictionary; the default is PP-OCRv6 tiny)
const smallService = new PaddleOcrService({ model: V6_SMALL_MODEL });

// Or switch to PP-OCRv5 English
const englishService = new PaddleOcrService({ model: V5_EN_MOBILE_MODEL });

Available presets:

v6: V6_TINY_MODEL (default), V6_SMALL_MODEL, V6_MEDIUM_MODEL
v5: V5_EN_MOBILE_MODEL, V5_EN_MOBILE_INT8_MODEL, V5_EN_SERVER_MODEL, V5_MOBILE_MODEL, V5_SERVER_MODEL
v5 languages: V5_ARABIC_MOBILE_MODEL, V5_CYRILLIC_MOBILE_MODEL, V5_DEVANAGARI_MOBILE_MODEL, V5_GREEK_MOBILE_MODEL, V5_ESLAV_MOBILE_MODEL, V5_KOREAN_MOBILE_MODEL, V5_LATIN_MOBILE_MODEL, V5_TAMIL_MOBILE_MODEL, V5_TELUGU_MOBILE_MODEL, V5_THAI_MOBILE_MODEL
v4: V4_EN_MOBILE_MODEL, V4_MOBILE_MODEL, V4_SERVER_MODEL, V4_SERVER_DOC_MODEL
v3: V3_MOBILE_MODEL, V3_JAPANESE_MOBILE_MODEL

For a granular override, mix presets with custom paths:

const service = new PaddleOcrService({
  model: {
    ...V6_SMALL_MODEL,
    detection: "./models/custom-det.onnx", // Override just detection
  },
});

For a fully custom setup, pass file paths, URLs, or ArrayBuffers:

const service = new PaddleOcrService({
  model: {
    detection: "./models/custom-det.onnx",
    recognition: "https://example.com/models/custom-rec.onnx",
    charactersDictionary: customDictArrayBuffer,
  },
});

await service.initialize();

Changing Models at Runtime

const service = new PaddleOcrService();
await service.initialize();

await service.changeDetectionModel("./models/new-det.onnx");
await service.changeRecognitionModel("./models/new-rec.onnx");
await service.changeTextDictionary("./models/new-dict.txt");

Per-Call Options

Each recognize() call accepts RecognizeOptions for fine-grained control:

// Custom dictionary for one-off recognition
const result = await service.recognize("./assets/receipt.jpg", {
  dictionary: "./models/new-dict.txt",
});

// Disable caching for fresh processing
const fresh = await service.recognize("./assets/receipt.jpg", {
  noCache: true,
});

// Combine options
const combined = await service.recognize("./assets/receipt.jpg", {
  noCache: true,
  flatten: true,
  strategy: "per-box",
});

Detection Only

detect() runs only the detection model - no recognition - and returns the bounding boxes of the text regions it finds. Useful when you only need layout (where the text is), or want to feed the crops into your own pipeline.

// Just the boxes
const { boxes } = await service.detect(imageBuffer);
// boxes: { x, y, width, height }[] in original image coordinates

// Also return each region as a PNG buffer, index-aligned with boxes
const { crops } = await service.detect(imageBuffer, { crop: true });
await Bun.write("first-region.png", crops![0]!);

// Or write the crops straight to a folder as crop_000.png, crop_001.png, ...
await service.detect(imageBuffer, { saveCropsTo: "./out/regions" });

// DetectOptions extends DetectionOptions, so any tuning field
// can be overridden per call:
const { boxes: bigOnly } = await service.detect(imageBuffer, {
  minimumAreaThreshold: 500,
  maxSideLength: 960,
});

detect() accepts the same inputs as recognize() (ArrayBuffer, canvas, absolute path, or URL) and works on all entry points. saveCropsTo is Node/Bun only (ignored on web/mobile); crop: true is not supported on React Native, where the Skia canvas cannot be encoded to PNG.
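Once you have the boxes, a common next step is to put them into reading order before passing crops to your own pipeline. The sketch below works only on the `{ x, y, width, height }` shape documented above; `toReadingOrder` and its `minArea` parameter are hypothetical helpers for illustration, not part of ppu-paddle-ocr, and the row-grouping rule is one simple heuristic among many.

```typescript
// Box shape as documented for detect(): original-image coordinates.
type Box = { x: number; y: number; width: number; height: number };

// Hypothetical helper (not part of ppu-paddle-ocr): drop tiny regions, then
// order the rest top-to-bottom, and left-to-right within each row.
function toReadingOrder(boxes: readonly Box[], minArea = 0): Box[] {
  const kept = boxes.filter((b) => b.width * b.height >= minArea);

  // Sort by vertical centre so rows can be built in a single pass.
  const byCentre = [...kept].sort(
    (a, b) => a.y + a.height / 2 - (b.y + b.height / 2),
  );

  const rows: Box[][] = [];
  for (const box of byCentre) {
    const centre = box.y + box.height / 2;
    const last = rows[rows.length - 1];
    const first = last?.[0];
    // Same row when the centre falls inside the vertical span of the row's first box.
    if (last && first && centre >= first.y && centre <= first.y + first.height) {
      last.push(box);
    } else {
      rows.push([box]);
    }
  }

  const ordered: Box[] = [];
  for (const row of rows) ordered.push(...row.sort((a, b) => a.x - b.x));
  return ordered;
}
```

Because crops are index-aligned with boxes, the same ordering can be applied to crops by sorting index pairs instead of the boxes themselves.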

Also available as the CLI detect command and the serve app's POST /v1/detect - see Command Line and apps/serve.

Command Line

The package ships a bin, so you can OCR without writing any code. In a project that has ppu-paddle-ocr and onnxruntime-node installed, bunx/npx resolve the local install directly; for zero-install runs use npx -p onnxruntime-node -p ppu-paddle-ocr ppu-paddle-ocr <args> or a global install:

# one image -> recognized text on stdout
bunx ppu-paddle-ocr recognize receipt.jpg

# a URL, as structured JSON
npx ppu-paddle-ocr recognize https://example.com/invoice.png --json --pretty

# detection only: bounding boxes as JSON, optionally saving each region as a PNG
bunx ppu-paddle-ocr detect receipt.jpg --save-crops ./regions --pretty

# zero-install (no local ppu-paddle-ocr): npx can pull both packages
npx -p onnxruntime-node -p ppu-paddle-ocr ppu-paddle-ocr recognize receipt.jpg

# many images (glob), fastest strategy, written to a file
bunx ppu-paddle-ocr batch "scans/*.png" --strategy cross-line --json -o results.json

# print each result as it finishes
bunx ppu-paddle-ocr stream "scans/*.png"

# pick a catalogue preset by name (granular --model-* flags override parts)
bunx ppu-paddle-ocr recognize receipt.jpg --model v6-tiny
bunx ppu-paddle-ocr recognize receipt.jpg --model v5-thai-mobile

# pre-warm / clear the model cache, inspect the active config (+ preset list)
bunx ppu-paddle-ocr download-models
bunx ppu-paddle-ocr clear-cache
bunx ppu-paddle-ocr models --json

Every PaddleOptions / RecognizeOptions field maps to a flag.

Recognized text goes to stdout; progress and logs go to stderr, so output pipes cleanly. Exit codes: 0 success, 1 runtime error, 2 usage error.

Run bunx ppu-paddle-ocr help for the full reference. The CLI uses the default v6 tiny models unless you select a --model preset or override the --model-* flags.

Batch Recognition

batchRecognize() runs recognize() over many images with bounded concurrency, so memory stays in check: at most concurrency images are decoded and in flight at once. Results are returned index-aligned to the inputs regardless of completion order.

const results = await service.batchRecognize([buf1, buf2, buf3]);
results.forEach((r, i) => console.log(i, r.text));

Concurrency defaults to "auto": 1 when an accelerator provider (CUDA, WebGPU) is configured (a shared session serializes device work anyway, and parallel runs would stack VRAM), and a small CPU default otherwise, to overlap JS preprocessing with native inference. Override it explicitly when you know your hardware:

await service.batchRecognize(images, { concurrency: 8, flatten: true });
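The bounded-concurrency behaviour described above can be pictured with a small generic helper. This is a sketch of the general technique only, not the library's implementation of batchRecognize(); `mapBounded` and its parameters are hypothetical names.

```typescript
// Hypothetical helper illustrating bounded concurrency with index-aligned
// results. At most `concurrency` calls to `worker` are in flight at once.
async function mapBounded<T, R>(
  inputs: readonly T[],
  concurrency: number,
  worker: (input: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(inputs.length);
  let next = 0; // index of the next input to claim

  // Each lane claims the next unclaimed index until the inputs run out.
  async function lane(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      // Writing by index keeps results aligned with inputs,
      // whatever order the work finishes in.
      results[index] = await worker(inputs[index] as T, index);
    }
  }

  const lanes = Math.max(1, Math.min(concurrency, inputs.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}
```

The lane count is the memory bound: a slow input only holds up its own lane, and no more than `concurrency` inputs are being processed at any moment.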

Use settle: true to keep going when an image fails. Each slot then becomes { status, value | reason } instead of the call rejecting:

const results = await service.batchRecognize(images, { settle: true });
for (const r of results) {
  if (r.status === "fulfilled") console.log(r.value.text);
  else console.error("failed:", r.reason);
}
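When a batch is settled, it is often handy to split successes from failures while remembering which input each slot came from, for example to retry only the failed images. The sketch below assumes the slot shape described above (`{ status, value | reason }`); the `Settled` type and `partitionSettled` helper are hypothetical and defined locally so the example runs without the package.

```typescript
// Assumed slot shape for settle: true, mirroring the description above.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Hypothetical helper: split slots into successes and failures, keeping
// the input index so failures can be retried or reported by position.
function partitionSettled<T>(slots: readonly Settled<T>[]) {
  const ok: { index: number; value: T }[] = [];
  const failed: { index: number; reason: unknown }[] = [];
  slots.forEach((slot, index) => {
    if (slot.status === "fulfilled") ok.push({ index, value: slot.value });
    else failed.push({ index, reason: slot.reason });
  });
  return { ok, failed };
}
```

Since results are index-aligned to the inputs, `failed.map((f) => images[f.index])` gives the inputs worth a second attempt.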

Track progress and cancel with the usual primitives:

const ac = new AbortController();
await service.batchRecognize(images, {
  signal: ac.signal,
  onProgress: (done, total) => console.log(`${done}/${total}`),
});

To consume results as they finish (and avoid buffering the whole batch), stream them. Each item carries its input index for reordering:

for await (const item of service.batchRecognizeStream(images)) {
  if (item.status === "fulfilled") console.log(item.index, item.value.text);
}
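If a consumer needs input order but still wants to start work early, the index on each streamed item is enough to rebuild it. The following is a minimal sketch of a reorder buffer; `ReorderBuffer` and `Indexed` are hypothetical names, and the only thing assumed about the stream is that every item, fulfilled or rejected, carries its input index.

```typescript
type Indexed<T> = { index: number; value: T };

// Hypothetical helper: holds items that arrive early and releases them
// strictly in input order (0, 1, 2, ...).
class ReorderBuffer<T> {
  private pending = new Map<number, T>();
  private nextIndex = 0;

  // Accept one item; return every value that is now ready, in input order.
  push(item: Indexed<T>): T[] {
    this.pending.set(item.index, item.value);
    const ready: T[] = [];
    while (this.pending.has(this.nextIndex)) {
      ready.push(this.pending.get(this.nextIndex) as T);
      this.pending.delete(this.nextIndex);
      this.nextIndex++;
    }
    return ready;
  }
}
```

The trade-off is that one slow early image holds back everything behind it, so this brings back some of the buffering that streaming avoids. Failed items must be pushed too, or the buffer stalls at their index.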

batchRecognize / batchRecognizeStream also accept any Iterable or AsyncIterable of inputs, so a directory walk or queue never has to be materialized in memory at once. All RecognizeOptions (flatten, strategy, dictionary, noCache) are accepted and applied to every image. See BatchRecognizeOptions for the full surface.

Recognition Strategies

Recognition strategies control how detected text regions are cropped from the canvas and fed into the recognition model. Fewer inference calls mean higher throughput.

See Choosing a model and configuration for a workload-based selection matrix.

Strategies are set in RecognitionOptions:

const service = new PaddleOcrService({
  recognition: { strategy: "cross-line" },
});
await service.initialize();

Choosing a Model and Configuration

The defaults (PP-OCRv6 tiny, per-line, maxSideLength: "auto", minimumAreaThreshold: 20, minimumConfidence: 0.5, opencv engine) are tuned to be fast and accurate across the tested corpus, including the receipt photo.

For dense document pages or full multilingual coverage, step up to V6_SMALL_MODEL. Each recipe below was validated against the committed example image it names.

Model families

v5 models are single-language specialists; v6 models are multilingual (one model, 50+ languages). If your documents are always one known language, the matching v5 model avoids cross-language confusion; if the language varies or mixes, stay on v6.

Parameter counts, file names, and preset-switching code live in PP-OCRv6 Models.

Input characteristics

| Input looks like | What to set | Why |
| :--- | :--- | :--- |
| Dark theme (light text on dark) | nothing, keep defaults | models read inverted text natively; pre-inverting hurt accuracy in our tests |
| Light theme, clean digital render | defaults | high-contrast digital text is the easy case |
| Dense lines (documents, tables) | V6_SMALL_MODEL; cross-line for max throughput | full dictionary and a more sensitive detector for small body text |
| Sparse short labels (UI, forms) | defaults | merged line context fixes short-label misreads that isolated crops produce |
| Tiny text / fine print | "auto" (default) scales the cap with input size | text below ~10px after downscale stops being detected; override with a fixed maxSideLength only if it persists |
| Landscape / wide pages | defaults; the cap applies to the longest side | "auto" keeps mid-size pages near native scale instead of downscaling sooner |
| Low contrast | V6_MEDIUM_MODEL if defaults fall short | weak probability responses need more pixels or a stronger model |
| Photo (camera, uneven lighting) | defaults (99.5% on the receipt); medium for the hardest shots | decode refinements beat small (97.4%) here; "auto" sizing keeps large photos near native scale |
| Tilted / rotated | deskew first with ppu-ocv DeskewService | every model tier degrades sharply past a few degrees of skew |
| One known language | the matching v5 model | single-language dictionary, no cross-language confusion |
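To judge whether a fixed maxSideLength would push small text under the ~10px mark mentioned in the table, a rough estimate is usually enough. The sketch below assumes a uniform downscale that brings the longest side to the cap and never upscales; that is an assumption for illustration, and the library's actual resize logic (including "auto") may differ. Both function names are hypothetical.

```typescript
// Assumption: the image is scaled uniformly so its longest side equals the
// cap, and is never upscaled. The library's real resize logic may differ.
function scaleForCap(width: number, height: number, maxSideLength: number): number {
  const longest = Math.max(width, height);
  return longest <= maxSideLength ? 1 : maxSideLength / longest;
}

// Estimated glyph height in pixels after the downscale.
function textHeightAfterDownscale(
  textPx: number,
  width: number,
  height: number,
  maxSideLength: number,
): number {
  return textPx * scaleForCap(width, height, maxSideLength);
}

// A 4000x3000 photo with 24px text:
//   cap 960  -> scale 0.24 -> about 5.8px (likely too small to detect)
//   cap 1920 -> scale 0.48 -> about 11.5px
const atCap960 = textHeightAfterDownscale(24, 4000, 3000, 960);
const atCap1920 = textHeightAfterDownscale(24, 4000, 3000, 1920);
console.log(atCap960.toFixed(1), atCap1920.toFixed(1));
```

Under these assumptions, doubling the cap doubles the post-scale text height, at the cost of roughly four times as many pixels for the detector to process.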

Image Preprocessing

PaddleOCR works best with grayscale or thresholded images. Use ppu-ocv for preprocessing before recognition:

import { ImageProcessor, CanvasProcessor } from "ppu-ocv";
const processor = new ImageProcessor(bodyCanvas);

// For non-OpenCV environments (e.g. browser extensions)
// const processor = new CanvasProcessor(bodyCanvas)

processor.grayscale().blur();
const canvas = processor.toCanvas();
processor.destroy();
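For readers unfamiliar with what a grayscale step does to the pixels, here is a standalone sketch operating on raw RGBA data (the layout used by canvas ImageData). It uses the ITU-R BT.601 luma weights and is illustrative only; `grayscaleInPlace` is a hypothetical helper, and ppu-ocv's grayscale() may use a different conversion.

```typescript
// Converts RGBA pixels to grayscale in place using BT.601 luma weights.
// Illustrative only; not the implementation used by ppu-ocv.
function grayscaleInPlace(rgba: Uint8ClampedArray): void {
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    // Uint8ClampedArray rounds and clamps to 0..255 on assignment.
    rgba[i] = luma;
    rgba[i + 1] = luma;
    rgba[i + 2] = luma;
    // The alpha channel at i + 3 is left untouched.
  }
}

// One pure-red pixel and one half-transparent pure-green pixel.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 255, 0, 128]);
grayscaleInPlace(pixels);
console.log(Array.from(pixels)); // [76, 76, 76, 255, 150, 150, 150, 128]
```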

Processing Engine

Two image processing backends are available for detection preprocessing and recognition resizing:

| Engine | Default | OpenCV Required | Notes |
| :--- | :---: | :---: | :--- |
| "opencv" | Yes | Yes | Uses OpenCV.js from ppu-ocv. More accurate boxes. |
| "canvas-native" | No | No | Pure canvas from ppu-ocv/canvas. Lighter weight. |

The browser build (ppu-paddle-ocr/web) always uses canvas-native; OpenCV.js is not bundled in the web entry point.

// OpenCV (default, recommended)
const service = new PaddleOcrService();

// Canvas-native (no OpenCV dependency)
const canvasService = new PaddleOcrService({
  processing: { engine: "canvas-native" },
});

Web / Browser Support

Import from ppu-paddle-ocr/web for browser-native capabilities (HTMLCanvasElement, OffscreenCanvas, fetch buffering).

Using a Bundler (Vite, Webpack, etc.)

import { PaddleOcrService } from "ppu-paddle-ocr/web";

const service = new PaddleOcrService();
await service.initialize();

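const input = document.getElementById("upload") as HTMLInputElement;
const file = input.files?.[0];
if (!file) throw new Error("No file selected");

// Load the file into an <img>; decode() resolves once it is ready to draw
const img = new Image();
img.src = URL.createObjectURL(file);
await img.decode();

// Draw at native resolution onto a canvas for recognize()
const canvas = document.createElement("canvas");
canvas.width = img.naturalWidth;
canvas.height = img.naturalHeight;
canvas.getContext("2d")!.drawImage(img, 0, 0);
URL.revokeObjectURL(img.src);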

const result = await service.recognize(canvas);
console.log(result.text);

CDN (No Bundler)

See the live demo for a complete ESM/CDN setup.

WebGPU Acceleration

On WebGPU-capable browsers (Chrome/Edge on Windows/Linux/macOS, Firefox Nightly), ONNX inference automatically runs on the GPU, typically 2-5x faster with no code changes. The library silently falls back to WASM if WebGPU is unavailable or fails.

WebGPU capability detection runs once during initialize() and needs no code on your side.

import { isWebGpuAvailable, getDefaultWebExecutionProviders } from "ppu-paddle-ocr/web";

if (await isWebGpuAvailable()) {
  console.log("WebGPU supported");
}
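If you want to decide on a provider list yourself before constructing the service, the probe can be sketched with the standard `navigator.gpu.requestAdapter()` call. This is an illustration, not the library's isWebGpuAvailable(); `probeWebGpu`, `pickProviders`, and the `NavigatorLike` type are hypothetical, and the navigator is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// Minimal structural types for the part of navigator this sketch touches.
type GpuLike = { requestAdapter(): Promise<unknown> };
type NavigatorLike = { gpu?: GpuLike };

// Illustrative probe: WebGPU counts as usable only if navigator.gpu exists
// and yields a non-null adapter. Any failure is treated as "unavailable".
async function probeWebGpu(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false;
  try {
    return (await nav.gpu.requestAdapter()) != null;
  } catch {
    return false;
  }
}

// Execution providers in preference order, with WASM always kept as fallback.
async function pickProviders(nav: NavigatorLike): Promise<string[]> {
  return (await probeWebGpu(nav)) ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a browser the call would look like `pickProviders(navigator as NavigatorLike)`, and the result could be passed as session.executionProviders.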

Override Provider Preference

// Force WASM-only
const service = new PaddleOcrService({
  session: {
    executionProviders: ["wasm"],
    graphOptimizationLevel: "all",
  },
});

The WASM binaries are still required even when WebGPU is the primary provider (used for graph optimization and fallback ops). Set ort.env.wasm.wasmPaths before initialize() if you self-host them.

Multithreaded WASM (Cross-Origin Isolation)

When the WASM backend is used (no WebGPU, or executionProviders: ["wasm"]), ONNX Runtime only runs multithreaded if the page is cross-origin isolated; otherwise numThreads is pinned to 1. Cross-origin isolation requires the Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp response headers.

If you can set those headers server-side, do that: it's the correct fix and needs nothing from this package. WebGPU does not need isolation at all, so this only matters on the WASM fallback path.
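A quick way to verify a deployment is to check the two header values named above against what the server actually returns. The sketch below is a hypothetical helper for a deployment test, not something the package ships; it takes a plain object of header names to values.

```typescript
// The two response headers required for cross-origin isolation, as listed above.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical check: returns the names of isolation headers that are
// missing or carry a different value. Header names are case-insensitive.
function missingIsolationHeaders(headers: Record<string, string>): string[] {
  const seen = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    seen.set(name.toLowerCase(), value.trim().toLowerCase());
  }
  return Object.keys(ISOLATION_HEADERS).filter(
    (name) => seen.get(name.toLowerCase()) !== ISOLATION_HEADERS[name],
  );
}
```

At runtime in the page itself, the standard global `crossOriginIsolated` reports whether isolation actually took effect.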

For static hosts that can't set headers (e.g. GitHub Pages), the package ships an opt-in coi-serviceworker that injects the headers client-side. Copy it to your served root and load it from your page before anything else:

<script src="/coi-serviceworker.js"></script>

Resolve the shipped copy from the package, e.g. in a build step:

// path on disk: node_modules/ppu-paddle-ocr/coi-serviceworker.js
const swPath = import.meta.resolve("ppu-paddle-ocr/coi-serviceworker.js");

The service worker reloads the page once on first visit to apply the headers and rewrites all fetch responses. Don't use it if you already control your headers or run another service worker that conflicts.

React Native (Mobile)

Run the same OCR pipeline on iOS and Android via the ppu-paddle-ocr/mobile entry. It uses onnxruntime-react-native (native JSI inference) and ppu-ocv/canvas-mobile (Skia-backed canvas) instead of their web counterparts.

npm install ppu-paddle-ocr onnxruntime-react-native @shopify/react-native-skia

import { PaddleOcrService } from "ppu-paddle-ocr/mobile";

const service = new PaddleOcrService();
await service.initialize();

// `imageBuffer` is an ArrayBuffer e.g. from a captured frame or a bundled asset.
const result = await service.recognize(imageBuffer, { flatten: true });
console.log(result.text);

await service.destroy();

Notes:

Native modules required. Both onnxruntime-react-native and @shopify/react-native-skia ship native code, so you need a dev client or expo prebuild; Expo Go is not supported. Targets RN >= 0.74 / Expo SDK >= 51 (Hermes).
CPU inference. Mobile runs on CPU by default; pass session: { executionProviders: ["nnapi"] } (Android) or ["coreml"] (iOS) to opt into hardware acceleration. There is no WebGPU on React Native.
Camera capture is out of scope. Pass a decoded frame from react-native-vision-camera or expo-camera as an ArrayBuffer.
A runnable Expo example lives in a separate repo: ppu-paddle-ocr-mobile-react-native-demo.

Models and Language Support

PP-OCRv6 Models

PP-OCRv6 ships a single unified model family covering 50+ languages (Simplified/Traditional Chinese, English, Japanese, 46+ Latin-script languages, Arabic, Indic, ...), no per-language model files needed. The package default is the tiny tier; which tier fits which workload is covered in Choosing a Model and Configuration.

The default resolves to these files, downloaded and cached on first run:

Portable .onnx variants are available at ppu-paddle-ocr-models; point model.detection / model.recognition at the .onnx URLs.

Quick switching with presets:

import {
  PaddleOcrService,
  V6_SMALL_MODEL,
  V6_MEDIUM_MODEL,
  V6_TINY_MODEL,
  V5_EN_MOBILE_MODEL,
} from "ppu-paddle-ocr";

// Default (v6 tiny) - same as passing no model option
const service = new PaddleOcrService({ model: V6_TINY_MODEL });

// Full dictionary
const serviceFull = new PaddleOcrService({ model: V6_SMALL_MODEL });

// Server-grade
const serviceServer = new PaddleOcrService({ model: V6_MEDIUM_MODEL });

// Pre-6.0.0 default (PP-OCRv5 English mobile)
const v5 = new PaddleOcrService({ model: V5_EN_MOBILE_MODEL });

Cache Location (Node / Bun)

Models are cached under ~/.cache/ppu-paddle-ocr:

// Warm the cache (e.g. in CI or Docker builds)
PaddleOcrService.downloadModels();

// Clear the cache
service.clearModelCache();

In the browser, model files are fetched via fetch() on every page load and rely on the browser's HTTP cache. For persistent offline caching, use a Service Worker or store the ArrayBuffer in IndexedDB.

Multilingual Support

PP-OCRv5 supports 40+ languages across different script systems. Pre-converted ONNX models are available at ppu-paddle-ocr-models:

Latin: English, French, German, Italian, Spanish, Portuguese, and 40+ others
Cyrillic: Russian, Ukrainian, Bulgarian, Kazakh, Serbian, and 30+ related
Arabic: Arabic, Persian, Urdu, Kurdish
Indic: Hindi (Devanagari), Tamil, Telugu
East Asian: Korean, Japanese
Southeast Asian: Thai

Switching Languages

Using presets (easiest):

import { PaddleOcrService, V5_THAI_MOBILE_MODEL, V5_ARABIC_MOBILE_MODEL } from "ppu-paddle-ocr";

// Thai
const thaiService = new PaddleOcrService({ model: V5_THAI_MOBILE_MODEL });

// Arabic
const arabicService = new PaddleOcrService({ model: V5_ARABIC_MOBILE_MODEL });

Manual URLs (advanced):

const MODEL_BASE =
  "https://media.githubusercontent.com/media/PT-Perkasa-Pilar-Utama/ppu-paddle-ocr-models/refs/heads/main";
const DICT_BASE =
  "https://raw.githubusercontent.com/PT-Perkasa-Pilar-Utama/ppu-paddle-ocr-models/refs/heads/main";

// Thai
const service = new PaddleOcrService({
  model: {
    detection: `${MODEL_BASE}/detection/PP-OCRv5_mobile_det_infer.onnx`,
    recognition: `${MODEL_BASE}/recognition/multi/th/v5/th_PP-OCRv5_mobile_rec_infer.onnx`,
    charactersDictionary: `${DICT_BASE}/recognition/multi/th/v5/ppocrv5_th_dict.txt`,
  },
});

Server Models (Higher Accuracy)

Using presets:

import { PaddleOcrService, V5_EN_SERVER_MODEL, V5_SERVER_MODEL } from "ppu-paddle-ocr";

// PP-OCRv5 English server
const enServerService = new PaddleOcrService({ model: V5_EN_SERVER_MODEL });

// PP-OCRv5 server (multilingual)
const serverService = new PaddleOcrService({ model: V5_SERVER_MODEL });

Manual configuration:

// MODEL_BASE and DICT_BASE are the constants defined under "Switching Languages"
const service = new PaddleOcrService({
  model: {
    detection: `${MODEL_BASE}/detection/PP-OCRv5_server_det_infer.onnx`,
    recognition: `${MODEL_BASE}/recognition/PP-OCRv5_server_rec_infer.onnx`,
    charactersDictionary: `${DICT_BASE}/recognition/ppocrv5_dict.txt`,
  },
});

INT8 Quantization

The recognition model's transformer MatMul operations can be dynamically quantized to INT8 with no accuracy loss (measured 99.22% -> 99.22%) and a 20-50% speedup on x86-64 CPUs with VNNI and WebAssembly.

On Apple Silicon (M-series), INT8 is not faster: the FP32 NEON/Accelerate kernels outperform the INT8 MLAS path. Stick with FP32 on macOS ARM64.
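As background on what INT8 quantization does to the numbers, the sketch below quantizes a small float vector with a symmetric per-tensor scale and maps it back. It is a conceptual illustration under that assumption only; ONNX Runtime's dynamic quantization chooses scales at run time and may use a different scheme, and `quantizeSymmetric` / `dequantize` are hypothetical names.

```typescript
type Quantized = { values: Int8Array; scale: number };

// Symmetric per-tensor quantization: the largest magnitude maps to 127.
function quantizeSymmetric(x: readonly number[]): Quantized {
  const maxAbs = x.reduce((m, v) => Math.max(m, Math.abs(v)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127; // avoid dividing by zero
  const values = Int8Array.from(x, (v) => Math.round(v / scale));
  return { values, scale };
}

// Map the INT8 codes back to approximate float values.
function dequantize(q: Quantized): number[] {
  return Array.from(q.values, (v) => v * q.scale);
}

const original = [1.27, -0.643, 0.104];
const q = quantizeSymmetric(original);
const restored = dequantize(q);
console.log(Array.from(q.values), restored);
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which gives some intuition for why well-scaled MatMul inputs can survive INT8 with little or no measurable accuracy change.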

Using the preset:

import { PaddleOcrService, V5_EN_MOBILE_INT8_MODEL } from "ppu-paddle-ocr";

const service = new PaddleOcrService({ model: V5_EN_MOBILE_INT8_MODEL });

For custom quantization, run the quantization helper:

pip install onnxruntime onnx sympy
python examples/quantize-onnx.py /path/to/en_PP-OCRv5_mobile_rec_infer.onnx
# -> produces en_PP-OCRv5_mobile_rec_infer_int8.onnx

Use the quantized model via model.recognition:

const service = new PaddleOcrService({
  model: {
    recognition: "https://example.com/en_PP-OCRv5_mobile_rec_infer_int8.onnx",
  },
});

INT8 .ort variants are also available in the ppu-paddle-ocr-models repo.

Model Output Limitations

Tables: Text within table cells is detected, but table structure is not preserved.
Math formulas: Not optimized for mathematical notation.
Document layout: For layout detection, see PP-DocLayoutV2/V3 models in ppu-paddle-ocr-models.

Converting Custom PaddlePaddle Models

See the ONNX conversion guide.

Configuration Reference

`PaddleOptions`

import type { PaddleOptions } from "ppu-paddle-ocr";

export type PaddleOptions = {
  model?: ModelPathOptions;
  detection?: DetectionOptions;
  recognition?: RecognitionOptions;
  debugging?: DebuggingOptions;
  session?: SessionOptions;
  processing?: ProcessingOptions;
};

`RecognizeOptions`

Per-call options for recognize().

`DetectOptions`

Per-call options for detect(). Extends DetectionOptions, so every detection tuning field (maxSideLength, minimumAreaThreshold, paddingVertical, paddingHorizontal, mean, stdDeviation) can also be overridden for a single call, plus:

`BatchRecognizeOptions`

Extends RecognizeOptions (applied to every image) for batchRecognize() / batchRecognizeStream().

| Property | Type | Default | Description |
| :--- | :---: | :---: | :--- |
| concurrency | number \| "auto" | "auto" | Max images in flight. "auto" = 1 on an accelerator provider, small default on CPU. |
| settle | boolean | false | When true, a failed image yields { status: "rejected", reason } instead of throwing. |
| signal | AbortSignal | null | Cancels the batch; pending images are not scheduled and the call rejects. |
| onProgress | (done, total?) => void | null | Called after each image settles, with the running count and total (if known). |

`ModelPathOptions`

Leave a trailing newline in your dictionary file.

`DetectionOptions`

Controls preprocessing and filtering during text detection.

`RecognitionOptions`

Controls recognition preprocessing and strategy.

`DebuggingOptions`

`SessionOptions`

Any valid ONNX Runtime InferenceSession.SessionOptions property is accepted. ppu-paddle-ocr sets these defaults:

const service = new PaddleOcrService({
  session: {
    executionProviders: ["cpu"],
    graphOptimizationLevel: "all",
    enableCpuMemArena: true,
    enableMemPattern: true,
    executionMode: "sequential",
  },
});

`ProcessingOptions`

Benchmark

Benches use a small zero-dependency harness (bench/harness.ts): in-process timing, round-robin scheduling across rounds so thermal/GC drift hits every task equally, reporting the median plus min/max/stddev. Run bun task bench.
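The reporting described above can be reproduced for your own measurements with a few lines. This is a sketch, not the code in bench/harness.ts: `roundRobin` and `summarize` are hypothetical helpers, and the population standard deviation used here is an assumption (the harness may use the sample form).

```typescript
type Summary = { median: number; min: number; max: number; stddev: number };

// Round-robin schedule: every task runs once per round, so slow drift
// (thermal throttling, GC pressure) is spread across all tasks.
function roundRobin<T>(tasks: readonly T[], rounds: number): T[] {
  const order: T[] = [];
  for (let r = 0; r < rounds; r++) order.push(...tasks);
  return order;
}

// Summarize per-round timings for one task.
function summarize(samplesMs: readonly number[]): Summary {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  // Population variance (divide by n); an assumption for this sketch.
  const variance =
    sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / sorted.length;
  return {
    median,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    stddev: Math.sqrt(variance),
  };
}
```

The median is reported rather than the mean because a single slow round (such as a cold cache) shifts the mean but barely moves the median.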

Representative results on Apple M1 / Bun 1.3.14 (20 rounds, opencv + canvas-native) at the shipped defaults (PP-OCRv6 tiny, "auto" sizing, minimumConfidence: 0.5, decode refinements):

task                                   median      +/-stddev        min        max
--------------------------------------------------------------------------------
[per-box][opencv][noCache]             139.7 ms      18.3 ms   135.3 ms   194.3 ms
[per-line][opencv][noCache]            138.8 ms      10.9 ms   132.9 ms   171.3 ms
[cross-line][opencv][noCache]          139.8 ms       8.0 ms   135.5 ms   168.6 ms
[per-box][canvas-native][noCache]      160.2 ms       7.9 ms   156.6 ms   192.4 ms
[per-line][canvas-native][noCache]     154.6 ms      10.3 ms   150.2 ms   188.0 ms
[cross-line][canvas-native][noCache]   161.9 ms      13.1 ms   156.2 ms   211.4 ms

=== Accuracy on receipt.jpg (ground truth: 383 chars) ===
  [opencv]        per-box=99.48%  per-line=99.48%  cross-line=94.26%
  [canvas-native] per-box=99.48%  per-line=99.22%  cross-line=94.78%
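The benchmark output does not state how accuracy is computed. One common choice, assumed here purely for illustration, is character accuracy based on edit distance: 1 - distance / groundTruthLength. The helpers below are hypothetical and self-contained.

```typescript
// Classic dynamic-programming Levenshtein distance
// (insert, delete, and substitute each cost 1), compared per UTF-16 unit.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed metric: share of the ground truth not lost to edits, in percent.
function charAccuracy(recognized: string, truth: string): number {
  if (truth.length === 0) return recognized.length === 0 ? 100 : 0;
  return Math.max(0, 1 - levenshtein(recognized, truth) / truth.length) * 100;
}

// Arithmetic check against the 383-character ground truth:
// 2 edits -> 99.48, 22 edits -> 94.26
const twoEdits = ((1 - 2 / 383) * 100).toFixed(2);
const twentyTwoEdits = ((1 - 22 / 383) * 100).toFixed(2);
console.log(twoEdits, twentyTwoEdits);
```

Under that assumption, the reported 99.48% is consistent with 2 character edits out of 383, and 94.26% with 22.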

Batch vs. concurrent `recognize()`

bench/batch.bench.ts compares the ways to OCR many images, tracking peak RSS alongside time. Captured on the previous v5 default (the relative comparison between sequential / Promise.all / batchRecognize is model-independent), median over 7 rounds of 16 images each, Apple M1 / Bun 1.3.14, opencv, noCache:

task                          median      +/-stddev        min        max   peak RSS
----------------------------------------------------------------------------------
sequential for-loop          3802.5 ms     300.6 ms  3169.4 ms  3979.7 ms    1059 MB
Promise.all(map(recognize))  3543.5 ms     254.0 ms  3030.0 ms  3768.0 ms    1428 MB
batchRecognize (auto)        3676.1 ms     200.9 ms  3217.1 ms  3761.3 ms    1096 MB
batchRecognize (c=4)         3653.8 ms     239.1 ms  3170.1 ms  3804.1 ms    1027 MB
batchRecognize (c=8)         3605.7 ms     187.6 ms  3202.1 ms  3786.6 ms    1096 MB

On CPU, throughput is bound by ONNX Runtime's native thread pool (which already saturates all cores per inference), so every parallel approach lands within ~4% on time; JS-level concurrency cannot add cores that are already busy.

The real difference is memory: unbounded Promise.all peaks at ~1430 MB and grows with batch size, while batchRecognize stays bounded at ~1030-1100 MB regardless of N.

So batchRecognize matches the fastest approach at lower, bounded peak memory, and the throughput win from concurrency shows up on GPU (overlapping host<->device) or I/O-bound inputs. Tune BATCH_N / ROUNDS via env.

Ecosystem

ppu-paddle-ocr is part of a family of document-processing libraries for JavaScript runtimes, all from PT Perkasa Pilar Utama:

| Library | What it does |
| :--- | :--- |
| ppu-ocv | Chainable image processing on OpenCV.js, plus canvas utilities that run in Node, Bun, browsers, and extensions. |
| ppu-pdf | PDF text extraction (digital and scanned) with coordinates, line grouping, and page-to-canvas/PNG rendering. |
| ppu-doclayout | Document layout analysis with PaddlePaddle PP-DocLayout (tables, figures, text regions). |
| ppu-doc-correction | Document image correction: page orientation, geometric unwarping (UVDoc), and text-line orientation. |
| ppu-uniface | Face detection, recognition, verification, alignment, and anti-spoofing (a port of Python's Uniface). |
| ppu-yolo-onnx-inference | YOLOv11 object detection in Bun/Node and browsers; no Python or PyTorch required. |

Contributing

See CONTRIBUTING.md for setup instructions, code-quality requirements, and the pull request process.

License

MIT. See LICENSE.

Support

Open an issue or join our Slack community.

Scripts

Recommended development environment is Linux-based. Library template: https://github.com/aquapi/lib-template

To run a specific benchmark file:

bun task bench index     # Run bench/index.bench.ts
bun task bench --node    # Run all benchmarks with Node.js
