# OCR Server
A mobile web app that captures photos and extracts text using a local llama.cpp LLM server. Features a queued processing system, real-time status updates, and an accordion-style UI for viewing results.
## Features
- 📱 Mobile-optimized camera interface with live preview
- ⏳ Queued processing - capture multiple images, process one at a time
- 🔄 Real-time status updates via polling
- ✅ Visual status indicators: pending, processing, complete, error
- 📂 Expandable accordion UI for viewing OCR results
- 🔒 Self-signed HTTPS (required for mobile camera access)
- ⚡ CLI options for configuration
## Requirements

- Node.js with npm
- A running llama.cpp server with a vision-capable model (see setup below)
- A mobile browser with camera support
## Installation
### Option 1: Install as a global command

```bash
npm install -g ocr-server

# Then run from anywhere:
ocr-server --help
```

### Option 2: Clone and run manually
```bash
# Clone the repository
git clone <repo-url>
cd ocr-server

# Install dependencies
npm install
```

## Usage
### Set up the llama.cpp server

Download the NuMarkdown model files from Hugging Face:

```bash
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.f16.gguf
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.mmproj-f16.gguf
```

Then start the llama.cpp server with NuMarkdown:

```bash
llama-server -m NuMarkdown-8B-Thinking.f16.gguf --mmproj NuMarkdown-8B-Thinking.mmproj-f16.gguf --port 8080
```

**Note:** You can use any compatible vision model with llama.cpp; simply replace the model paths with your own.
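Before launching the OCR app, it can help to confirm the llama.cpp server is reachable. A minimal sketch, assuming llama.cpp's built-in `/health` endpoint and Node 18+ for the global `fetch`:

```ts
// check-llama.ts - verify the llama.cpp server is up before starting ocr-server.
// Adjust host/port to match the --llama-host/--llama-port values you plan to use.
const host = process.env.LLAMA_HOST ?? "localhost";
const port = Number(process.env.LLAMA_PORT ?? 8080);

async function main(): Promise<void> {
  try {
    const res = await fetch(`http://${host}:${port}/health`);
    console.log(res.ok ? "llama.cpp is ready" : `llama.cpp responded with HTTP ${res.status}`);
  } catch {
    console.error(`No llama.cpp server reachable at ${host}:${port}`);
    process.exit(1);
  }
}

main();
```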
### Start the OCR app
```bash
# If installed globally, run directly:
ocr-server --help

# Default settings (port 5666, connects to llama.cpp on localhost:8080)
ocr-server

# Custom port
ocr-server --port 3000

# Connect to a remote llama.cpp server
ocr-server --llama-host 192.168.1.100 --llama-port 8080

# Bind to a specific host
ocr-server --host 127.0.0.1 --port 3000
```

### CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--host` | `-h` | Host to bind this server to | `0.0.0.0` |
| `--port` | `-p` | Port for this HTTPS server | `5666` |
| `--llama-host` | - | Host of the llama.cpp server | `localhost` |
| `--llama-port` | - | Port of the llama.cpp server | `8080` |
| `--help` | - | Show help message | - |
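For reference, the table above maps naturally onto Node's built-in `node:util` argument parser. A hypothetical sketch of the option handling; the package's actual CLI code is not shown here and may differ:

```ts
import { parseArgs } from "node:util";

// Illustrative only: same flags and defaults as the table above.
const { values } = parseArgs({
  options: {
    host: { type: "string", short: "h", default: "0.0.0.0" },
    port: { type: "string", short: "p", default: "5666" },
    "llama-host": { type: "string", default: "localhost" },
    "llama-port": { type: "string", default: "8080" },
    help: { type: "boolean", default: false },
  },
});

console.log(`HTTPS server on ${values.host}:${values.port}`);
console.log(`Forwarding OCR jobs to ${values["llama-host"]}:${values["llama-port"]}`);
```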
### Browser Interface
1. Open your mobile browser to `https://<your-server-ip>:5666`
2. Accept the self-signed certificate warning (required for camera access)
3. Grant camera permissions when prompted
4. Point the camera at text and tap "Capture & Process" (see the sketch below)
5. Watch the queue status update in real time
6. Tap completed jobs to expand and view the OCR results
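Behind the "Capture & Process" button, the flow boils down to grabbing a camera frame, uploading it, and polling for the result. A browser-side sketch in TypeScript; the upload field (`image`), the `{ id }` response, and the job objects returned by `GET /api/jobs` are assumptions, not the documented API:

```ts
// Illustrative capture-and-poll flow; payload and response shapes are assumed.
async function startCamera(video: HTMLVideoElement): Promise<void> {
  // Rear camera preferred on phones; requires HTTPS (hence the self-signed cert)
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
  });
  video.srcObject = stream;
  await video.play();
}

async function captureAndProcess(video: HTMLVideoElement): Promise<string> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  // Encode the current frame as JPEG and upload it
  const blob = await new Promise<Blob>((resolve) =>
    canvas.toBlob((b) => resolve(b!), "image/jpeg", 0.9)
  );
  const form = new FormData();
  form.append("image", blob, "capture.jpg");

  const res = await fetch("/api/ocr", { method: "POST", body: form });
  const { id } = await res.json(); // assumed response shape: { id: string }
  return id;
}

// Poll /api/jobs every 2 seconds until this job finishes
async function waitForResult(id: string): Promise<string> {
  for (;;) {
    const jobs: Array<{ id: string; status: string; result?: string }> =
      await (await fetch("/api/jobs")).json();
    const job = jobs.find((j) => j.id === id);
    if (job?.status === "complete") return job.result ?? "";
    if (job?.status === "error") throw new Error("OCR failed");
    await new Promise((r) => setTimeout(r, 2000));
  }
}
```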
## Output
OCR results are saved to the current directory as:

```
ocr_YYYY-MM-DD_HH-mm-ss.md
```
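For illustration, a filename in that format could be produced like this (a sketch, not necessarily the package's own code):

```ts
// Build a filename such as ocr_2025-01-31_14-05-09.md
function outputFilename(d: Date = new Date()): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return (
    `ocr_${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}` +
    `_${pad(d.getHours())}-${pad(d.getMinutes())}-${pad(d.getSeconds())}.md`
  );
}
```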
## How It Works

- **Frontend:** captures camera frames and sends them to the backend via `POST /api/ocr`
- **Backend:** adds each image to an in-memory queue and returns a job ID immediately
- **Worker:** a background processor handles one image at a time via the llama.cpp API (sketched below)
- **Real-time updates:** the frontend polls `GET /api/jobs` every 2 seconds
- **Save:** results are written to timestamped `.md` files with `<answer>` tags stripped
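The queue-and-worker pattern could look roughly like the sketch below. The `Job` shape, the `enqueue`/`work` helpers, and the llama.cpp request are illustrative assumptions; llama.cpp does offer an OpenAI-compatible `/v1/chat/completions` endpoint, though image input depends on running with a multimodal projector (`--mmproj`), as shown above:

```ts
import { randomUUID } from "node:crypto";

// In-memory queue with a single background worker (illustrative sketch).
type Job = {
  id: string;
  image: Buffer;
  status: "pending" | "processing" | "complete" | "error";
  result?: string;
};

const jobs: Job[] = [];
let working = false;

// A POST /api/ocr handler would call this and return the id immediately
function enqueue(image: Buffer): string {
  const job: Job = { id: randomUUID(), image, status: "pending" };
  jobs.push(job);
  void work(); // kick the worker; it drains the queue one job at a time
  return job.id;
}

async function work(): Promise<void> {
  if (working) return; // ensure only one image is processed at a time
  working = true;
  let job: Job | undefined;
  while ((job = jobs.find((j) => j.status === "pending"))) {
    job.status = "processing";
    try {
      job.result = await runOcr(job.image);
      job.status = "complete";
    } catch {
      job.status = "error";
    }
  }
  working = false;
}

// Hypothetical llama.cpp call; the exact prompt and payload are assumptions.
async function runOcr(image: Buffer): Promise<string> {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "Extract all text from this image." },
            {
              type: "image_url",
              image_url: { url: `data:image/jpeg;base64,${image.toString("base64")}` },
            },
          ],
        },
      ],
    }),
  });
  const data = await res.json();
  // Drop the model's <answer> wrapper tags, as described above
  return String(data.choices[0].message.content).replace(/<\/?answer>/g, "").trim();
}
```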
## Development
```bash
# Run type checking
tsc --noEmit server.ts

# Start with hot reload
npm run dev
```

## Publishing to npm
```bash
# Build the package
npm run build

# Publish to npm
npm publish
```

## License
MIT
