# paperkit
Cross-runtime document processing toolkit. Take a photo of a document → get a clean image, OCR text, structured document tree, searchable PDF, or JSON. Same API on web, React Native (Expo), and Node.js.
```ts
import { parseDocument, toMarkdown, backend } from "paperkit";

const doc = await parseDocument(photo, backend, {
  layout: { model: { url: "/models/doclayout.onnx" }, classNames: DEFAULT_DOCLAYOUT_CLASS_NAMES },
  text: { model: { url: "/models/ppocr-rec.onnx" }, charset: [...ppocrKeys, " "] },
  formula: formulaRecognizer, // optional — math → LaTeX
  table: tableRecognizer,     // optional — tables → HTML
});

console.log(toMarkdown(doc));
```

Or use one-shot OCR when you don't need the layout tree:

```ts
import { ocr, backend } from "paperkit";

const result = await ocr(photo, backend, {
  detection: { model: { url: "/models/ppocr-det.onnx" } },
  recognition: { model: { url: "/models/ppocr-rec.onnx" }, charset: [...ppocrKeys, " "] },
});
```

## Why paperkit
- One API, three runtimes. Import from `"paperkit"` anywhere — the bundler picks the right build via conditional exports. No separate packages to keep in sync.
- Bring your own model. We never bundle weights. Point `model.url` / `model.path` at ONNX files you host. Recommended models and direct download URLs are listed below and in each per-feature doc.
- Pay only for what you use. ONNX Runtime is an optional peer dep. If you only use classical features (page detection, binarization, perspective, deskew, blur, keyword classification, script detection, rule KIE), your bundle stays tiny.
- Tree-shakeable. Importing `denoise` doesn't pull in OCR code.
- TypeScript-first. Full types, discriminated unions, no `any`.
- 100% statement / line coverage on the library core across 257 tests.
## Install
```bash
npm install paperkit

# + whichever ONNX runtime you need (optional peer dep)
npm install onnxruntime-web               # browsers
npx expo install onnxruntime-react-native # Expo / React Native
npm install onnxruntime-node              # Node.js / Electron main

# Optional peer deps for specific features:
npm install sharp                      # Node image I/O (decode / encode)
npm install pdfjs-dist @napi-rs/canvas # rasterizePDF on Node
npm install pdfjs-dist                 # rasterizePDF on the web
```

### Expo specifics

`onnxruntime-react-native` is a native module — you cannot use it in Expo Go. Use a development build or EAS Build:

```bash
npx expo install expo-dev-client
npx expo prebuild
```

## What's in the box
Every feature below ships in this release. Features marked "No" in the Model? column run with zero external downloads — pure TypeScript.
### Geometry — camera-photo clean-up
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| applyExifRotation / exifOrientationFromBytes / rotateByExif | Apply JPEG EXIF Orientation | No | geometry.md |
| detectPage | Find page corners (Otsu + largest connected component + convex hull + diagonal-extremes) | No | geometry.md |
| correctPerspective | Warp a 4-corner quad to a rectangle (DLT homography + bilinear) | No | geometry.md |
| deskew / estimateSkewAngle | Remove rotational skew via projection-profile variance | No | geometry.md |
| dewarp / createDewarper | Flatten curved / folded pages via UV-grid model | Yes — UVDoc | geometry.md |
### Appearance — pixel clean-up
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| denoise / denoiseRaw / createDenoiser | Tiled ML denoising over any same-shape ONNX (Restormer, NAFNet, Swin2SR, …) | Yes — NAFNet / others | appearance.md |
| binarize | Adaptive threshold (Gaussian-mean or Sauvola) via integral image | No | appearance.md |
| removeShadow | Divide-by-blurred-background illumination correction | No | appearance.md |
### OCR — printed text, handwriting, formulas, tables
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| detectText / createDetector | DB-based text-region detection with unclip postprocess | Yes — PP-OCRv4 det | ocr.md |
| recognizeText / createRecognizer / ctcGreedyDecode | CRNN recognition with auto pre/post-softmax CTC decode | Yes — PP-OCRv4 rec | ocr.md |
| ocr / createOcrPipeline | Full image → text with reading-order sort + per-region progress | Yes — PP-OCRv4 | ocr.md |
| recognizeHandwriting / createHandwritingRecognizer | English handwriting via TrOCR vision-encoder-decoder | Yes — TrOCR | ocr.md |
| recognizeFormula / createFormulaRecognizer | Math → LaTeX via TexTeller vision-encoder-decoder | Yes — TexTeller | ocr.md |
| recognizeTable / createTableRecognizer | Tables → HTML with cell text + colspan / rowspan | Yes — SLANet-plus | ocr.md |
| recognizeVisionEncoderDecoder / createVisionEncoderDecoderRecognizer | Generic VisionEncoderDecoder runner — any encoder-decoder ONNX | Yes — user-supplied | ocr.md |
| createTokenDecoder / createUnigramMetaspaceDecoder / createByteBpeDecoder | Tokenizer decoders for HF tokenizer.json; auto-dispatch on model.type | No | ocr.md |
### Layout — typed-region dispatcher
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| analyzeLayout / createLayoutAnalyzer | Typed regions (title, text, list, table, formula, figure) with bboxes + confidence | Yes — DocLayout-YOLO | layout.md |
| parseDocument / createDocumentPipeline | Full document parse — layout + per-region recognition + reading-order sort | Yes — layout + recognizers | layout.md |
| DEFAULT_DOCLAYOUT_CLASS_NAMES / DEFAULT_DOCLAYOUT_MAPPING | 10 raw labels + their canonical mapping | — | layout.md |
### Input — PDF rasterization
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| rasterizePDF / rasterizePdfWith / installRasterizePDF | Render each PDF page to RawImage via pdfjs-dist + canvas | No (peer deps: pdfjs-dist + @napi-rs/canvas on Node) | input.md |
React Native isn't supported — pdfjs has DOM / worker assumptions. Use `react-native-pdf` or a native PDF lib to rasterize off-JS, then feed the resulting `RawImage[]` into the rest of paperkit, as sketched below.
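For instance, a minimal sketch of that hand-off — `rasterizeWithNativeLib` is a hypothetical helper around whatever native renderer you pick, and the `{ data, width, height }` RGBA shape is an assumption about `RawImage` (check the exported types):

```ts
import { createOcrPipeline, backend } from "paperkit";

// Hypothetical helper: rasterize pages with a native module
// (react-native-pdf, a PDFKit wrapper, …) — not part of paperkit.
declare function rasterizeWithNativeLib(
  path: string,
): Promise<{ data: Uint8Array; width: number; height: number }[]>;

const pages = await rasterizeWithNativeLib("scan.pdf");

const pipe = await createOcrPipeline(backend, { detection, recognition });
for (const page of pages) {
  // Pre-rasterized pages drop straight into the raw-image entry point.
  const result = await pipe.runRaw(page);
  console.log(result);
}
await pipe.dispose();
```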
### Export — turn results into files
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| toSearchablePDF | Multi-page (or single-page) searchable PDF with invisible text overlay | No | export.md |
| toMarkdown | DocumentResult → structured Markdown (headings, lists, $$math$$, inline <table>, figure captions) or flat line-per-region from OcrResult | No | export.md |
| toJSON | OcrResult → stable persistence-ready schema | No | export.md |
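`toMarkdown` is shown at the top of this README; for `toJSON`, a minimal persistence sketch — this assumes the returned schema is a plain JSON-serializable object and that `result` is an `OcrResult` (see export.md for the exact shape):

```ts
import { toJSON } from "paperkit";
import { promises as fs } from "node:fs";

// `result` is an OcrResult from ocr() / createOcrPipeline().
await fs.writeFile("scan.json", JSON.stringify(toJSON(result), null, 2), "utf8");
```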
### Quality — pre-flight checks
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| estimateBlur | Variance-of-Laplacian focus score | No | quality.md |
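Because the score is a plain variance-of-Laplacian number, it works well as a cheap gate before the expensive ML steps. A sketch — the synchronous call shape is an assumption (see quality.md), and the threshold is application-specific, not a library constant:

```ts
import { estimateBlur } from "paperkit";

const score = estimateBlur(image); // higher = sharper (variance of Laplacian)
if (score < 50) {                  // 50 is an arbitrary example threshold
  throw new Error("Photo looks out of focus — ask for a retake before running OCR");
}
```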
### Classification — what kind of document is this?
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| classifyByKeywords | Zero-model OCR-text classifier (8 default categories, customizable) | No | classify.md |
| classifyDocument / createDocumentClassifier | Image classifier over any ONNX [1, 3, H, W] → [1, C] model | Yes — DiT-RVLCDIP | classify.md |
| DEFAULT_RVLCDIP_LABELS | 16-class RVL-CDIP taxonomy | — | classify.md |
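A sketch of the zero-model path — the call shape here is an assumption (`classifyByKeywords` operating on OCR text; the exact signature and the customization hook are in classify.md):

```ts
import { classifyByKeywords } from "paperkit";

// `ocrText` is the concatenated text of an OcrResult.
const category = classifyByKeywords(ocrText); // one of the 8 default categories
```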
### Text analysis
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| detectScript | Dominant Unicode script (11 scripts: latin, han, hiragana, katakana, hangul, cyrillic, arabic, hebrew, thai, devanagari, greek) | No | text.md |
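A sketch — assuming `detectScript` takes a string and returns the dominant script label (see text.md for the exact return shape):

```ts
import { detectScript } from "paperkit";

const script = detectScript("Привет, мир"); // → "cyrillic" (dominant script)
```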
### KIE — key-information extraction
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| extractByRules / createRuleBasedExtractor | Pattern + keyword extractor with word-boundary matching and typed coercion | No | kie.md |
| INVOICE_SCHEMA / RECEIPT_SCHEMA / ID_CARD_SCHEMA | Ready-to-use schemas for common document types | — | kie.md |
| KieExtractor interface | Contract that rule-based, VLM, and future LayoutLM backends all satisfy | — (user-supplied when VLM) | kie.md |
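A sketch of the rule path — the call shape (`extractByRules` over OCR text plus a schema) is an assumption to check against kie.md, and the output fields shown are purely illustrative:

```ts
import { extractByRules, INVOICE_SCHEMA } from "paperkit";

// `ocrText` is the recognized text of an invoice photo.
const fields = extractByRules(ocrText, INVOICE_SCHEMA);
// → e.g. { invoiceNumber: "…", date: "…", total: … } — illustrative, not the real schema
```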
### Batch / progress / workers
| API | Does what | Model? | Docs |
|---|---|:---:|---|
| batchMap | Generic concurrency-limited mapper (order-preserving, fail-fast) | No | batch.md |
| onProgress | Accepted by denoise, ocr, parseDocument, recognizeTable, rasterizePDF, batchMap | No | batch.md |
| Worker patterns (browser + Node) | Result types are plain-data POD; all handles stay inside the worker that created them | No — documented patterns, no RPC wrapper | workers.md |
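A browser-flavored sketch of the documented pattern — the file name and message shape are illustrative, not a paperkit API; the point is that the pipeline handle never leaves the worker and only plain-data results cross the boundary:

```ts
// ocr.worker.ts — the pipeline stays inside this worker.
import { createOcrPipeline, backend } from "paperkit";

const pipePromise = createOcrPipeline(backend, { detection, recognition });

self.onmessage = async (e: MessageEvent<{ id: number; bytes: Uint8Array }>) => {
  const pipe = await pipePromise;
  const result = await pipe.run(e.data.bytes); // OcrResult is plain data — postable
  self.postMessage({ id: e.data.id, result });
};
```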
## Recommended models
Every feature that needs ML takes a `model.url` (web) or `model.path` (Node / RN). Here are the tested ONNX weights that paperkit validates against — point at these to get the same behavior as the smoke scripts.

Models are never bundled with paperkit. You download them once from the canonical source (usually HuggingFace or ModelScope), host or ship them with your app, and pass the URL / path to the relevant factory.
### PP-OCRv4 — printed-text OCR
```bash
mkdir -p models/ppocr

# Text detection (multilingual, ~4.7 MB)
curl -L -o models/ppocr/det.onnx \
  "https://huggingface.co/SWHL/RapidOCR/resolve/main/PP-OCRv4/ch_PP-OCRv4_det_infer.onnx"

# Text recognition (Chinese + English, ~10 MB)
curl -L -o models/ppocr/rec.onnx \
  "https://huggingface.co/SWHL/RapidOCR/resolve/main/PP-OCRv4/ch_PP-OCRv4_rec_infer.onnx"

# Character dictionary (~6,600 chars; paperkit appends a trailing space)
curl -L -o models/ppocr/keys.txt \
  "https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.7/ppocr/utils/ppocr_keys_v1.txt"
```

Usage:

```ts
import { ocr, backend } from "paperkit";
import { promises as fs } from "node:fs";

const keys = (await fs.readFile("models/ppocr/keys.txt", "utf8")).split("\n").filter(Boolean);
const charset = [...keys, " "]; // IMPORTANT — the trailing space isn't in the downloaded file

const result = await ocr(photoBytes, backend, {
  detection: { model: { path: "models/ppocr/det.onnx" } },
  recognition: { model: { path: "models/ppocr/rec.onnx" }, charset },
});
```

Higher-accuracy server variants and language-specific alternatives (English, Korean, Arabic, Hindi, etc.) are listed in docs/features/ocr.md.
### DocLayout-YOLO — layout (75 MB)
```bash
mkdir -p models/layout
curl -L -o models/layout/doclayout.onnx \
  "https://huggingface.co/wybxc/DocLayout-YOLO-DocStructBench-onnx/resolve/main/doclayout_yolo_docstructbench_imgsz1024.onnx"
```

```ts
import { parseDocument, DEFAULT_DOCLAYOUT_CLASS_NAMES, backend } from "paperkit";

const doc = await parseDocument(photoBytes, backend, {
  layout: { model: { path: "models/layout/doclayout.onnx" }, classNames: DEFAULT_DOCLAYOUT_CLASS_NAMES },
  text: { model: { path: "models/ppocr/rec.onnx" }, charset },
});
```

### TrOCR — handwriting (~64 MB, int8)
```bash
mkdir -p models/handwriting
BASE="https://huggingface.co/Xenova/trocr-small-handwritten/resolve/main"
curl -L -o models/handwriting/encoder.onnx "$BASE/onnx/encoder_model_quantized.onnx" # ~22 MB
curl -L -o models/handwriting/decoder.onnx "$BASE/onnx/decoder_model_quantized.onnx" # ~38 MB
curl -L -o models/handwriting/tokenizer.json "$BASE/tokenizer.json"                  # ~4 MB
```

```ts
import { createHandwritingRecognizer, createTokenDecoder, backend } from "paperkit";
import { promises as fs } from "node:fs";

const tokenizerJson = JSON.parse(await fs.readFile("models/handwriting/tokenizer.json", "utf8"));
const handwriting = await createHandwritingRecognizer(backend, {
  encoder: { path: "models/handwriting/encoder.onnx" },
  decoder: { path: "models/handwriting/decoder.onnx" },
  decodeTokens: createTokenDecoder(tokenizerJson),
  maxLength: 64,
});

// Plug into parseDocument:
await parseDocument(bytes, backend, { layout, text: handwriting });
```

English only. Larger fp16 / fp32 variants (higher quality, bigger files) are listed in docs/features/ocr.md.

### TexTeller — formula recognition (~303 MB int8)
```bash
mkdir -p models/formula
BASE="https://huggingface.co/onnx-community/TexTeller-ONNX/resolve/main"
curl -L -o models/formula/encoder.onnx "$BASE/onnx/encoder_model_int8.onnx" # ~84 MB
curl -L -o models/formula/decoder.onnx "$BASE/onnx/decoder_model_int8.onnx" # ~218 MB
curl -L -o models/formula/tokenizer.json "$BASE/tokenizer.json"             # ~1.3 MB
```

```ts
import { createFormulaRecognizer, createTokenDecoder, backend } from "paperkit";
import { promises as fs } from "node:fs";

const tokenizerJson = JSON.parse(await fs.readFile("models/formula/tokenizer.json", "utf8"));
const formulaRec = await createFormulaRecognizer(backend, {
  encoder: { path: "models/formula/encoder.onnx" },
  decoder: { path: "models/formula/decoder.onnx" },
  decodeTokens: createTokenDecoder(tokenizerJson),
});

await parseDocument(bytes, backend, { layout, text, formula: formulaRec });
```

Smaller q4f16 (~200 MB) and larger fp16 / fp32 variants in docs/features/ocr.md.

### SLANet-plus — table recognition (~7.4 MB)
```bash
mkdir -p models/table
curl -L -o models/table/slanet-plus.onnx \
  "https://www.modelscope.cn/models/RapidAI/RapidTable/resolve/v2.0.0/slanet-plus.onnx"
```

```ts
import { createTableRecognizer, createRecognizer, backend } from "paperkit";

const cellTextRecognizer = await createRecognizer(backend, {
  model: { path: "models/ppocr/rec.onnx" },
  charset,
});
const tableRec = await createTableRecognizer(backend, {
  model: { path: "models/table/slanet-plus.onnx" },
  cellTextRecognizer,
});

await parseDocument(bytes, backend, { layout, text: cellTextRecognizer, table: tableRec });
```

### UVDoc — dewarping (~30 MB)
```bash
mkdir -p models/dewarp
BASE="https://huggingface.co/fredcallagan/uvdoc-grid-onnx/resolve/main"
# Both files required — the .onnx references the .onnx.data externally.
curl -L -o models/dewarp/UVDoc_grid.onnx "$BASE/UVDoc_grid.onnx"           # 237 KB
curl -L -o models/dewarp/UVDoc_grid.onnx.data "$BASE/UVDoc_grid.onnx.data" # 30 MB
```

```ts
import { dewarp, backend } from "paperkit";

const flat = await dewarp(photoImage, backend, {
  model: { path: "models/dewarp/UVDoc_grid.onnx" },
});
```

### NAFNet — denoise / deblur (~91 MB)
```bash
mkdir -p models/denoise
curl -L -o models/denoise/nafnet.onnx \
  "https://huggingface.co/opencv/deblurring_nafnet/resolve/main/deblurring_nafnet_2025may.onnx"
```

```ts
import { denoise, backend } from "paperkit";

const clean = await denoise(photoBytes, backend, {
  model: { path: "models/denoise/nafnet.onnx" },
  inputName: "lq",      // NAFNet input tensor
  outputName: "output",
  tileSize: 512,        // NAFNet SCA module requires ≥ 384 per side
  overlap: 64,
  normalize: { scale: 1 / 255 },
});
```

Other denoisers (Restormer, Swin2SR, NAFNet SIDD) and PyTorch → ONNX export instructions in docs/features/appearance.md.

### DiT — document classification (~83 MB int8)
```bash
mkdir -p models/classify
curl -L -o models/classify/dit-rvlcdip.onnx \
  "https://huggingface.co/Xenova/dit-base-finetuned-rvlcdip/resolve/main/onnx/model_quantized.onnx"
```

```ts
import { classifyDocument, DEFAULT_RVLCDIP_LABELS, backend } from "paperkit";

const { category, confidence } = await classifyDocument(image, backend, {
  model: { path: "models/classify/dit-rvlcdip.onnx" },
  labels: DEFAULT_RVLCDIP_LABELS,
  topK: 3,
});
```

Size / precision variants (fp16, q4f16, fp32) in docs/features/classify.md.
## Usage patterns
### One-shot helpers
Easiest for small scripts — the model loads and disposes per call:

```ts
import { denoise, backend } from "paperkit";

const clean = await denoise(file, backend, { model: { url: "/models/nafnet.onnx" } });
```

### Reusable pipelines (keep models loaded)
Preferred in apps where you process many images:

```ts
import { createDenoiser, createOcrPipeline, backend } from "paperkit";

const denoiser = await createDenoiser(backend, { model: { url: "/models/nafnet.onnx" } });
const ocrPipe = await createOcrPipeline(backend, { detection, recognition });

for (const photo of photos) {
  const clean = await denoiser.denoise(photo);
  const result = await ocrPipe.runRaw(clean);
}

await denoiser.dispose();
await ocrPipe.dispose();
```

### Full phone-photo pipeline — no ML required for the classical part
```ts
import {
  applyExifRotation, detectPage, correctPerspective, deskew,
  binarize, removeShadow,
  createOcrPipeline, toSearchablePDF,
  backend,
} from "paperkit";

const raw = await applyExifRotation(photo, backend);
const quad = detectPage(raw);
const flat = quad ? correctPerspective(raw, quad) : raw;
const upright = deskew(flat);
const lit = removeShadow(upright);
const bw = binarize(lit);

const pipe = await createOcrPipeline(backend, { detection, recognition });
const result = await pipe.runRaw(bw);

const jpeg = await backend.encodeImage(bw, "jpeg", 85);
const pdfBytes = await toSearchablePDF({
  imageBytes: jpeg, imageWidth: bw.width, imageHeight: bw.height, ocr: result,
});
```

See examples/node/scan.ts for the runnable version of this pipeline (zero models needed).
### Full document parse (layout + per-region recognition)
```ts
import {
  parseDocument, toMarkdown,
  createFormulaRecognizer, createTableRecognizer,
  createHandwritingRecognizer, createTokenDecoder,
  DEFAULT_DOCLAYOUT_CLASS_NAMES,
  backend,
} from "paperkit";

const formulaRec = await createFormulaRecognizer(backend, { encoder, decoder, decodeTokens: createTokenDecoder(formulaTokenizer) });
const tableRec = await createTableRecognizer(backend, { model: { path: "models/table/slanet-plus.onnx" }, cellTextRecognizer });

const doc = await parseDocument(photoBytes, backend, {
  layout: { model: { path: "models/layout/doclayout.onnx" }, classNames: DEFAULT_DOCLAYOUT_CLASS_NAMES },
  text: { model: { path: "models/ppocr/rec.onnx" }, charset: [...keys, " "] },
  formula: formulaRec,
  table: tableRec,
});

console.log(toMarkdown(doc));
// # Title
//
// Paragraph body text.
//
// $$ \psi_0(M) = \int … $$
//
// <table><tr><td colspan="2">A</td>…</table>
```

### Batch over many images
```ts
import { batchMap, createOcrPipeline, backend } from "paperkit";

const pipe = await createOcrPipeline(backend, { detection, recognition });
const results = await batchMap(
  photos,
  (bytes) => pipe.run(bytes),
  { concurrency: 2, onProgress: (e) => console.log(`${e.current}/${e.total}`) },
);
await pipe.dispose();
```

## Runtime-specific notes
Web:

```ts
import { denoise, backend } from "paperkit";

async function handleFile(file: File) {
  const clean = await denoise(file, backend, { model: { url: "/models/nafnet.onnx" } });
  const bytes = await backend.encodeImage(clean, "png");
  return new Blob([bytes], { type: "image/png" });
}
```

Node:

```ts
import { promises as fs } from "node:fs";
import { denoise, backend } from "paperkit";

const clean = await denoise(await fs.readFile("photo.jpg"), backend, {
  model: { path: "./models/nafnet.onnx" },
});
```

React Native (Expo development build):

```ts
import * as FileSystem from "expo-file-system";
import { denoiseRaw, backend } from "paperkit";

// Decode the image in your app — the native adapter doesn't bundle an image codec.
const raw = /* your decode-to-RGBA helper */;

const modelPath = `${FileSystem.documentDirectory}models/nafnet.onnx`;
// Download the ONNX model once on first launch; cache locally.
const clean = await denoiseRaw(raw, backend, { model: { path: modelPath } });
```

Consumers typically use `expo-image-manipulator` plus a small RGBA decoder for input, and `expo-file-system` to manage model downloads.
## Architecture
```
your app ── imports "paperkit" ──► paperkit entry file (web / native / node)
                                         │
                                         │ wires core + runtime adapter
                                         ▼
                                   paperkit core
                                         │
    ┌────────────┬──────────────┬────────┴─────────┬───────────┬──────────┐
    ▼            ▼              ▼                  ▼           ▼          ▼
 geometry    appearance        ocr              layout       input     export
(classical) (ML + classical)   (ML)         (ML dispatcher)  (peer)    (pure)
    │            │              │                  │           │          │
 quality     classify         text              batch         kie     workers
(classical)   (both)       (classical)       (pure code)  (pure+VLM) (patterns)
```

## Extending paperkit
Add a new feature module:

- Create `src/modules/<area>/<feature>.ts`. Export functions that take a `Backend` and any options.
- If the feature needs ML, call `backend.loadModel(...)` and `session.run(...)`. Core tensor helpers (`imageToTensor`, `tensorToImage`, tiling, homography) live in `src/core/`.
- Add an index file for the module and re-export from `src/entries/shared.ts`.
- Done — your feature works on every runtime automatically.

Add a new runtime (Deno, Bun, Electron renderer, …):

- Implement the `Backend` interface in `src/adapters/<runtime>.ts`.
- Create `src/entries/<runtime>.ts` (follow the pattern of existing entries).
- Add the entry to `tsup.config.ts` and the `exports` map in `package.json`.
Add an alternate recognizer:
Implement the `Recognizer` interface from `src/modules/ocr/types.ts` — `recognize(image) → { text, confidence }` plus `dispose()`. Pass your `Recognizer` instance directly as `options.recognition` (for `createOcrPipeline`) or as `options.text` / `options.formula` / `options.table` (for `parseDocument`). No pipeline changes needed.
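A minimal sketch of such a recognizer — whether `Recognizer` and `RawImage` are re-exported from `"paperkit"` is an assumption here (the canonical home is `src/modules/ocr/types.ts`), and `callMyCloudOcr` is a hypothetical client for a remote OCR service:

```ts
import type { Recognizer, RawImage } from "paperkit"; // assumed re-export — see src/modules/ocr/types.ts

// Hypothetical remote OCR client — not part of paperkit.
declare function callMyCloudOcr(image: RawImage): Promise<string>;

const cloudRecognizer: Recognizer = {
  async recognize(image: RawImage) {
    const text = await callMyCloudOcr(image);
    return { text, confidence: 1 }; // confidence semantics are up to your backend
  },
  async dispose() {
    // nothing to release for a remote backend
  },
};

// Use it anywhere a recognizer is accepted, e.g.:
// await parseDocument(bytes, backend, { layout, text: cloudRecognizer });
```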
## Validation
Every ML feature has been smoke-tested end-to-end against the recommended weights:
| Script | Covers |
|---|---|
| scripts/smoke-ocr.ts | PP-OCRv4 detection + recognition on a mixed-CH/EN book page |
| scripts/smoke-denoise.ts | OpenCV NAFNet deblurring on a blurred document |
| scripts/smoke-handwriting.ts | TrOCR small (int8) on an IAM handwriting line |
| scripts/smoke-formula.ts | TexTeller (int8) on a display equation |
| scripts/smoke-table.ts | SLANet-plus + PP-OCRv4 cell text on a multi-row table with colspan="4" |
| scripts/smoke-dewarp.ts | UVDoc grid-sample on a scanned book page |
| scripts/smoke-layout.ts | DocLayout-YOLO on a two-column paper with display formulas |
| scripts/smoke-classify.ts | DiT RVL-CDIP (int8) on scientific paper / form / book |
| scripts/smoke-pdf-roundtrip.ts | rasterizePDF → OCR → toSearchablePDF multi-page roundtrip |
Run any smoke script with `npx tsx scripts/<name>.ts`. Each expects the weights to live under `models/<feature>/` (see the model download section).
Unit tests: 257 passing, 100% statement / line / function coverage on `src/core/**` and `src/modules/**`.
## Documentation
Per-feature guides in `docs/features/`:
| Module | Purpose | Models |
|---|---|---|
| geometry.md | EXIF, page detection, perspective, deskew, dewarp | UVDoc (dewarp only) |
| appearance.md | Denoise, binarize, shadow removal | NAFNet / Restormer / Swin2SR (denoise only) |
| ocr.md | Text detection + recognition + handwriting + formula + table | PP-OCR / TrOCR / TexTeller / SLANet |
| layout.md | Typed-region dispatcher + parseDocument | DocLayout-YOLO + any recognizer |
| input.md | PDF rasterization | None (peer deps: pdfjs-dist + canvas) |
| export.md | Searchable PDF + Markdown + JSON | None |
| quality.md | Blur detection | None |
| classify.md | Keyword + image classification | DiT (image path only) |
| text.md | Script detection | None |
| kie.md | Rule-based + VLM integration pattern | None (rule path) |
| batch.md | onProgress + batchMap | None |
| workers.md | Browser + Node worker patterns | None |
Runnable examples live under `examples/`:

- `examples/node/scan.ts` — phone photo → clean image (no ML)
- `examples/node/denoise.ts` — denoise a single image with any ONNX model
- `examples/node/ocr.ts` — OCR a single image with PP-OCR
- `examples/node-worker/` — full OCR pipeline running in `worker_threads`
## License
MIT
