web-mrz-reader
v1.0.1
Browser-based MRZ reader supporting TD1, TD2, and TD3 formats. Client-side OCR with custom Tesseract model — no server, no CDN.
MRZ Reader
A browser-based MRZ (Machine Readable Zone) reader that uses webcam capture and OCR to extract document data. Supports passports, ID cards, and travel documents. All processing happens client-side for privacy.
Doc - Story
https://eringen.com/blog/browser-based-passport-mrz-reader-with-tesseract-js
Try it
https://eringen.com/workbench/web-mrz-reader/
npm
https://www.npmjs.com/package/web-mrz-reader
NPM User Guide
1. Copy Static Assets
Copy the trained model and Tesseract runtime files into your project's public/static directory:
mkdir -p public/model public/tesseract
# MRZ-trained OCR model
cp node_modules/web-mrz-reader/public/model/mrz.traineddata.gz public/model/
# Tesseract.js worker and WASM cores
cp node_modules/tesseract.js/dist/worker.min.js public/tesseract/
cp node_modules/tesseract.js-core/tesseract-core-simd-lstm.wasm.js public/tesseract/
cp node_modules/tesseract.js-core/tesseract-core-simd.wasm.js public/tesseract/
cp node_modules/tesseract.js-core/tesseract-core-lstm.wasm.js public/tesseract/
cp node_modules/tesseract.js-core/tesseract-core.wasm.js public/tesseract/
2. Add Required HTML Elements
The script expects these specific element IDs to be present in the DOM:
<div style="position: relative">
  <video id="camera" autoplay width="888" height="500"></video>
  <canvas id="canvas" width="888" height="500"
          style="position: absolute; top: 0; left: 0"></canvas>
</div>
<button id="cbutton" onclick="captureAndPerformOCR()">
  Capture & Read MRZ
</button>
<p id="mrzOutput" style="font-weight: bold; font-family: monospace;"></p>
<p id="output" style="font-weight: bold; font-family: monospace;"></p>
3. Include the Script
Copy index.js from the package into your project and load it as an ES module:
cp node_modules/web-mrz-reader/index.js src/mrz-reader.js
<script type="module" src="./mrz-reader.js"></script>
4. Adjust Paths (if needed)
If your static assets are served from a different directory, update the Tesseract paths inside the copied JS file:
Tesseract.recognize(blob, 'mrz', {
  workerPath: './tesseract/worker.min.js', // adjust these
  corePath: './tesseract/',                // to match your
  langPath: './model/',                    // asset paths
})
Features
- Real-time webcam capture
- Custom-trained Tesseract model optimized for MRZ recognition
- Supports TD1 (ID cards), TD2 (travel documents), and TD3 (passports)
- Check digit validation for all formats
- Structured data extraction (name, document number, dates, etc.)
- Visual bounding box feedback on recognized text
- Fully client-side processing (no data leaves the browser)
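The check digit validation listed above follows the scheme defined in ICAO Doc 9303: each character is mapped to a number (digits keep their value, A to Z map to 10 to 35, the filler `<` counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below illustrates that calculation. The function names are hypothetical and are not taken from the package's `checkdigit.ts`, and special cases for empty optional fields are not handled.

```typescript
// Sketch of the ICAO 9303 check digit. Names are illustrative only.
const WEIGHTS = [7, 3, 1];

// Map one MRZ character to its numeric value.
function charValue(ch: string): number {
  if (ch === "<") return 0; // filler counts as zero
  if (ch >= "0" && ch <= "9") return ch.charCodeAt(0) - 48;
  if (ch >= "A" && ch <= "Z") return ch.charCodeAt(0) - 55; // A=10 ... Z=35
  throw new Error(`Invalid MRZ character: ${ch}`);
}

// Weighted sum of all characters, modulo 10.
function computeCheckDigit(field: string): number {
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    sum += charValue(field.charAt(i)) * WEIGHTS[i % 3];
  }
  return sum % 10;
}

// Assumption: a check character that is not a digit is treated as invalid.
function isCheckDigitValid(field: string, checkChar: string): boolean {
  return /^[0-9]$/.test(checkChar) && computeCheckDigit(field) === Number(checkChar);
}

// Values from the ICAO 9303 specimen passport:
computeCheckDigit("L898902C3"); // 6 (document number)
computeCheckDigit("740812");    // 2 (date of birth)
computeCheckDigit("120415");    // 9 (expiration date)
```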
Tech Stack
- TypeScript - Strict mode, modular architecture
- Vite - Dev server and production bundler (handles TS natively)
- Tesseract.js v5 - JavaScript OCR engine with WebAssembly (installed via npm)
- Custom MRZ Model - Trained specifically for MRZ text recognition
- Web APIs - MediaDevices, Canvas, Blob
Project Structure
web-mrz-reader/
├── index.html              # Main HTML page
├── src/
│   ├── main.ts             # Entry point: camera, DOM, OCR orchestration
│   ├── types.ts            # Interfaces: TD1/TD2/TD3 results, validation
│   ├── checkdigit.ts       # Check digit calculation and validation
│   └── parsers.ts          # MRZ parsing, extraction, format detection
├── tsconfig.json           # TypeScript configuration (strict mode)
├── vite.config.ts          # Vite configuration
├── package.json            # Dependencies and scripts
├── public/
│   └── model/
│       └── mrz.traineddata.gz  # Custom Tesseract model for MRZ
└── model_training.md       # Guide for training an improved model
Supported Formats
| Format | Document Type    | Structure          |
|--------|------------------|--------------------|
| TD1    | ID cards         | 3 lines x 30 chars |
| TD2    | Travel documents | 2 lines x 36 chars |
| TD3    | Passports        | 2 lines x 44 chars |
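The line counts and lengths in the table are enough to tell the three formats apart. The following is a minimal sketch of such a detection step, assuming the OCR output arrives as a single newline-separated string. `detectFormat` and `FORMAT_SHAPES` are hypothetical names, and the package's own `parsers.ts` may work differently (for example, by tolerating lines that are slightly too short or too long).

```typescript
type MrzFormat = "TD1" | "TD2" | "TD3";

// Line count and line length per format, as listed in the table.
const FORMAT_SHAPES: Record<MrzFormat, { lines: number; length: number }> = {
  TD1: { lines: 3, length: 30 },
  TD2: { lines: 2, length: 36 },
  TD3: { lines: 2, length: 44 },
};

function detectFormat(ocrText: string): MrzFormat | null {
  // Assumption: whitespace inside a line is OCR noise and can be dropped.
  const lines = ocrText
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, "").toUpperCase())
    .filter((line) => line.length > 0);

  for (const format of Object.keys(FORMAT_SHAPES) as MrzFormat[]) {
    const shape = FORMAT_SHAPES[format];
    const matches =
      lines.length === shape.lines &&
      lines.every((line) => line.length === shape.length);
    if (matches) return format;
  }
  return null; // no exact match; the caller decides how to recover
}
```

Returning `null` instead of throwing leaves the choice to the caller, which can prompt for another capture when the text does not match any known shape.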
Extracted Data Fields
All formats: Nationality, Surname, Given Names, Document Number, Issuing Country, Date of Birth, Gender, Expiration Date, Validation (check digits)
TD1 additionally: Document Type, Optional Data 1, Optional Data 2
TD2 additionally: Document Type, Optional Data
TD3 additionally: Passport Number, Personal Number
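To illustrate how these fields map onto the MRZ text, here is a sketch that slices a TD3 (passport) MRZ at the fixed character positions defined in ICAO Doc 9303. The `Td3Fields` interface and `parseTd3` function are hypothetical; the field names in the package's `types.ts` may differ, and check digit validation is left out to keep the example short. Dates are returned as the raw YYMMDD strings.

```typescript
// Hypothetical result shape; the package's own types may differ.
interface Td3Fields {
  documentType: string;
  issuingCountry: string;
  surname: string;
  givenNames: string;
  passportNumber: string;
  nationality: string;
  dateOfBirth: string;    // YYMMDD as printed
  gender: string;
  expirationDate: string; // YYMMDD as printed
  personalNumber: string;
}

// Replace filler characters with spaces and trim the result.
const clean = (s: string): string => s.replace(/</g, " ").trim();

function parseTd3(line1: string, line2: string): Td3Fields {
  if (line1.length !== 44 || line2.length !== 44) {
    throw new Error("TD3 requires two lines of 44 characters");
  }
  const names = line1.slice(5);
  const sep = names.indexOf("<<"); // double filler separates surname from given names
  return {
    documentType: clean(line1.slice(0, 2)),
    issuingCountry: clean(line1.slice(2, 5)),
    surname: clean(sep === -1 ? names : names.slice(0, sep)),
    givenNames: clean(sep === -1 ? "" : names.slice(sep + 2)),
    passportNumber: clean(line2.slice(0, 9)),
    nationality: clean(line2.slice(10, 13)),
    dateOfBirth: line2.slice(13, 19),
    gender: line2.slice(20, 21),
    expirationDate: line2.slice(21, 27),
    personalNumber: clean(line2.slice(28, 42)),
  };
}

// ICAO 9303 specimen data, padded to 44 characters per line.
const line1 = "P<UTOERIKSSON<<ANNA<MARIA".padEnd(44, "<");
const line2 = "L898902C36UTO7408122F1204159" + "ZE184226B".padEnd(14, "<") + "10";
const fields = parseTd3(line1, line2);
console.log(JSON.stringify(fields, null, 2));
```

The positions skipped between slices (9, 19, 27, 42, and 43 on the second line) hold the check digits.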
Usage
npm install
npm run dev
- Open the local URL shown by Vite
- Allow camera access when prompted
- Position MRZ area within camera view
- Click "Capture Image and Extract Text"
- View extracted data in JSON format
Type Check
npm run typecheck
Production Build
npm run build
The output in dist/ can be deployed to any static host.
Requirements
- Node.js (for build tooling)
- Modern browser with WebAssembly support
- Camera access permission
- HTTPS or localhost (required for camera API)
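The browser-side requirements above can be checked at startup before the camera is requested. The sketch below is one way to do that and is not part of the package: `missingRequirements` and `BrowserLike` are hypothetical names, and the environment is passed in as an argument so the check can also run outside a browser.

```typescript
// Minimal view of the globals we inspect; passing it in keeps the check testable.
interface BrowserLike {
  isSecureContext?: boolean;
  WebAssembly?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns a human-readable list of unmet requirements (empty when all are met).
function missingRequirements(env: BrowserLike): string[] {
  const missing: string[] = [];
  if (typeof env.WebAssembly !== "object" || env.WebAssembly === null) {
    missing.push("WebAssembly support");
  }
  if (env.isSecureContext !== true) {
    missing.push("secure context (HTTPS or localhost)");
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("camera API (MediaDevices.getUserMedia)");
  }
  return missing;
}

// In a page, call missingRequirements(window) and show the result to the user
// instead of failing later when the camera or OCR worker starts.
```

This only detects whether the APIs are available. Camera permission itself is requested by the browser when `getUserMedia` is called and can still be denied by the user.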
License
Tesseract.js is licensed under Apache-2.0.
