@markitdownjs/image-ocr
v0.1.2
Image OCR converter for MarkItDownJS. Extracts text from images using Tesseract.js.
Install
npm install @markitdownjs/image-ocr @markitdownjs/core
# Peer dependency (optional, but required at runtime)
npm install tesseract.js
Peer dependency: tesseract.js >= 5.0.0 must be installed separately. The converter will throw at runtime if it is missing.
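Because a missing peer dependency only surfaces when the converter runs, it can help to check for it at startup. The following is a minimal sketch, not part of this package: `isPeerInstalled` is a hypothetical helper, and the resolver is passed in so the same function works from CommonJS and ESM.

```typescript
// Hypothetical helper (not exported by @markitdownjs/image-ocr).
// Reports whether a module id can be resolved, without loading it.
// `resolve` is injected by the caller:
//   CommonJS: require.resolve
//   ESM:      createRequire(import.meta.url).resolve  (from "node:module")
function isPeerInstalled(
  resolve: (id: string) => string,
  id: string = "tesseract.js",
): boolean {
  try {
    resolve(id); // throws if the module cannot be found
    return true;
  } catch {
    return false;
  }
}

// Example startup check in a CommonJS entry point:
//   if (!isPeerInstalled(require.resolve)) {
//     console.error("tesseract.js is missing; run: npm install tesseract.js");
//   }
```

This only confirms that the module resolves. It does not check that the installed version satisfies the >= 5.0.0 requirement.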
Usage
import { MarkItDown } from "@markitdownjs/core";
import { OcrConverter } from "@markitdownjs/image-ocr";
const parser = new MarkItDown();
parser.registerConverter(new OcrConverter({ lang: "eng" }));
const result = await parser.convert({ source: imageBuffer, mimeType: "image/png" });
API
OcrConverter
Implements the IConverter interface from @markitdownjs/core.
Constructor options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| lang | string | "eng" | Tesseract language code |
| workerPath | string | — | Custom path to Tesseract worker |
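The options table can be read as the following type. This is a sketch under assumptions: `OcrOptionsSketch` and `withDefaults` are hypothetical names used for illustration, and the type actually exported by the package may be named or shaped differently. Only the `lang` default of "eng" comes from the table above.

```typescript
// Hypothetical mirror of the documented constructor options.
interface OcrOptionsSketch {
  lang?: string;       // Tesseract language code, e.g. "eng" or "deu"
  workerPath?: string; // custom path to the Tesseract worker
}

// Apply the documented default ("eng") when `lang` is omitted.
// `workerPath` has no documented default, so it is left as given.
function withDefaults(
  options: OcrOptionsSketch = {},
): OcrOptionsSketch & { lang: string } {
  return { ...options, lang: options.lang ?? "eng" };
}

// Example: withDefaults({ workerPath: "/static/worker.min.js" })
// yields lang "eng" plus the given workerPath (the path is illustrative).
```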
Methods:
| Method | Description |
|--------|-------------|
| convert(input) | Runs OCR on the image buffer and returns extracted text as document nodes |
| canHandle(mimeType) | Returns true for image/png, image/jpeg, image/webp, image/tiff |
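To decide up front whether an input is worth sending to OCR, the MIME types listed for `canHandle` can be mirrored in a small pre-check. This is a sketch, not the library's implementation: `isOcrCandidate` is a hypothetical helper, and lowercasing the value and stripping parameters such as `; charset=...` are assumptions of this example that `canHandle` itself may not make.

```typescript
// MIME types the table above lists for canHandle().
const OCR_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/tiff",
]);

// Hypothetical pre-check mirroring the documented canHandle() behaviour.
function isOcrCandidate(mimeType: string): boolean {
  // Keep only the part before any ";" parameter (assumption of this sketch).
  const [base = ""] = mimeType.split(";");
  return OCR_MIME_TYPES.has(base.trim().toLowerCase());
}
```

For example, `isOcrCandidate("image/png")` is true and `isOcrCandidate("image/gif")` is false, since GIF is not in the documented list.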
