# OCR Server
A mobile web app that captures photos and extracts text using a local llama.cpp LLM server. Features a queued processing system, real-time status updates, and an accordion-style UI for viewing results.
## Features
- 📱 Mobile-optimized camera interface with live preview
- ⏳ Queued processing - capture multiple images, process one at a time
- 🔄 Real-time status updates via polling
- ✅ Visual status indicators: pending, processing, complete, error
- 📂 Expandable accordion UI for viewing OCR results
- 🔒 Self-signed HTTPS (required for mobile camera access)
- ⚡ CLI options for configuration
## Requirements

- Node.js with npm
- A running llama.cpp server with a vision-capable model (see setup below)
- A mobile browser with camera support
## Installation
### Option 1: Install as a global command

```bash
npm install -g ocr-server

# Then run from anywhere:
ocr-server --help
```

### Option 2: Clone and run manually
```bash
# Clone the repository
git clone <repo-url>
cd ocr-server

# Install dependencies
npm install
```

## Usage
### Set up the llama.cpp server

Download the NuMarkdown model files from Hugging Face:

```bash
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.f16.gguf
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.mmproj-f16.gguf
```

Then start the llama.cpp server with NuMarkdown:

```bash
llama-server -m NuMarkdown-8B-Thinking.f16.gguf --mmproj NuMarkdown-8B-Thinking.mmproj-f16.gguf --port 8080
```

**Note:** You can use any compatible vision model with llama.cpp; simply replace the model paths with your own.
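Before launching the OCR app, it can help to confirm the llama.cpp server is reachable. A minimal sketch, assuming llama.cpp's built-in `/health` endpoint and Node 18+ for the global `fetch`:

```ts
// check-llama.ts - verify the llama.cpp server is up before starting ocr-server.
// Adjust host/port to match the --llama-host/--llama-port values you plan to use.
const host = process.env.LLAMA_HOST ?? "localhost";
const port = Number(process.env.LLAMA_PORT ?? 8080);

async function main(): Promise<void> {
  try {
    const res = await fetch(`http://${host}:${port}/health`);
    console.log(res.ok ? "llama.cpp is ready" : `llama.cpp responded with HTTP ${res.status}`);
  } catch {
    console.error(`No llama.cpp server reachable at ${host}:${port}`);
    process.exit(1);
  }
}

main();
```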
### Start the OCR app
```bash
# If installed globally, run directly:
ocr-server --help

# Default settings (port 5666, connects to llama.cpp on localhost:8080)
ocr-server

# Custom port
ocr-server --port 3000

# Connect to a remote llama.cpp server
ocr-server --llama-host 192.168.1.100 --llama-port 8080

# Bind to a specific host
ocr-server --host 127.0.0.1 --port 3000
```

### CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--host` | `-h` | Host to bind this server to | `0.0.0.0` |
| `--port` | `-p` | Port for this HTTPS server | `5666` |
| `--llama-host` | - | Host of the llama.cpp server | `localhost` |
| `--llama-port` | - | Port of the llama.cpp server | `8080` |
| `--help` | - | Show help message | - |
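For reference, the table above maps naturally onto Node's built-in `node:util` argument parser. A hypothetical sketch of the option handling; the package's actual CLI code is not shown here and may differ:

```ts
import { parseArgs } from "node:util";

// Illustrative only: same flags and defaults as the table above.
const { values } = parseArgs({
  options: {
    host: { type: "string", short: "h", default: "0.0.0.0" },
    port: { type: "string", short: "p", default: "5666" },
    "llama-host": { type: "string", default: "localhost" },
    "llama-port": { type: "string", default: "8080" },
    help: { type: "boolean", default: false },
  },
});

console.log(`HTTPS server on ${values.host}:${values.port}`);
console.log(`Forwarding OCR jobs to ${values["llama-host"]}:${values["llama-port"]}`);
```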
### Browser Interface
1. Open your mobile browser to `https://<your-server-ip>:5666`
2. Accept the self-signed certificate warning (required for camera access)
3. Grant camera permissions when prompted
4. Point the camera at text and tap "Capture & Process" (see the sketch below)
5. Watch the queue status update in real time
6. Tap completed jobs to expand and view the OCR results
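Behind the "Capture & Process" button, the flow boils down to grabbing a camera frame, uploading it, and polling for the result. A browser-side sketch in TypeScript; the upload field (`image`), the `{ id }` response, and the job objects returned by `GET /api/jobs` are assumptions, not the documented API:

```ts
// Illustrative capture-and-poll flow; payload and response shapes are assumed.
async function startCamera(video: HTMLVideoElement): Promise<void> {
  // Rear camera preferred on phones; requires HTTPS (hence the self-signed cert)
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
  });
  video.srcObject = stream;
  await video.play();
}

async function captureAndProcess(video: HTMLVideoElement): Promise<string> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  // Encode the current frame as JPEG and upload it
  const blob = await new Promise<Blob>((resolve) =>
    canvas.toBlob((b) => resolve(b!), "image/jpeg", 0.9)
  );
  const form = new FormData();
  form.append("image", blob, "capture.jpg");

  const res = await fetch("/api/ocr", { method: "POST", body: form });
  const { id } = await res.json(); // assumed response shape: { id: string }
  return id;
}

// Poll /api/jobs every 2 seconds until this job finishes
async function waitForResult(id: string): Promise<string> {
  for (;;) {
    const jobs: Array<{ id: string; status: string; result?: string }> =
      await (await fetch("/api/jobs")).json();
    const job = jobs.find((j) => j.id === id);
    if (job?.status === "complete") return job.result ?? "";
    if (job?.status === "error") throw new Error("OCR failed");
    await new Promise((r) => setTimeout(r, 2000));
  }
}
```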
## Output
OCR results are saved to the current directory as:

```
ocr_YYYY-MM-DD_HH-mm-ss.md
```
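For illustration, a filename in that format could be produced like this (a sketch, not necessarily the package's own code):

```ts
// Build a filename such as ocr_2025-01-31_14-05-09.md
function outputFilename(d: Date = new Date()): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return (
    `ocr_${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}` +
    `_${pad(d.getHours())}-${pad(d.getMinutes())}-${pad(d.getSeconds())}.md`
  );
}
```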
## How It Works

- **Frontend:** captures camera frames and sends them to the backend via `POST /api/ocr`
- **Backend:** adds each image to an in-memory queue and returns a job ID immediately
- **Worker:** a background processor handles one image at a time via the llama.cpp API (sketched below)
- **Real-time updates:** the frontend polls `GET /api/jobs` every 2 seconds
- **Save:** results are written to timestamped `.md` files with `<answer>` tags stripped
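The queue-and-worker pattern could look roughly like the sketch below. The `Job` shape, the `enqueue`/`work` helpers, and the llama.cpp request are illustrative assumptions; llama.cpp does offer an OpenAI-compatible `/v1/chat/completions` endpoint, though image input depends on running with a multimodal projector (`--mmproj`), as shown above:

```ts
import { randomUUID } from "node:crypto";

// In-memory queue with a single background worker (illustrative sketch).
type Job = {
  id: string;
  image: Buffer;
  status: "pending" | "processing" | "complete" | "error";
  result?: string;
};

const jobs: Job[] = [];
let working = false;

// A POST /api/ocr handler would call this and return the id immediately
function enqueue(image: Buffer): string {
  const job: Job = { id: randomUUID(), image, status: "pending" };
  jobs.push(job);
  void work(); // kick the worker; it drains the queue one job at a time
  return job.id;
}

async function work(): Promise<void> {
  if (working) return; // ensure only one image is processed at a time
  working = true;
  let job: Job | undefined;
  while ((job = jobs.find((j) => j.status === "pending"))) {
    job.status = "processing";
    try {
      job.result = await runOcr(job.image);
      job.status = "complete";
    } catch {
      job.status = "error";
    }
  }
  working = false;
}

// Hypothetical llama.cpp call; the exact prompt and payload are assumptions.
async function runOcr(image: Buffer): Promise<string> {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "Extract all text from this image." },
            {
              type: "image_url",
              image_url: { url: `data:image/jpeg;base64,${image.toString("base64")}` },
            },
          ],
        },
      ],
    }),
  });
  const data = await res.json();
  // Drop the model's <answer> wrapper tags, as described above
  return String(data.choices[0].message.content).replace(/<\/?answer>/g, "").trim();
}
```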
## Development
```bash
# Run type checking
tsc --noEmit server.ts

# Start with hot reload
npm run dev
```

## Publishing to npm
```bash
# Build the package
npm run build

# Publish to npm
npm publish
```

## License
MIT
