ink-on
Handwritten math expression recognition running entirely in the browser. Framework-agnostic ONNX inference engine + Vue 3 canvas component, powered by CoMER (ECCV 2022).
Features
- 100% client-side — Runs entirely in the browser via ONNX Runtime Web (WASM/WebGPU). No server, no API key.
- Tiny models — INT8 quantized encoder (3.4 MB) + decoder (4.0 MB), total 7.2 MB.
- Web Worker inference — ONNX inference runs off the main thread, keeping the UI responsive during recognition.
- LaTeX auto-repair — Automatic brace balancing and `\frac`/`\sqrt` argument fixing with KaTeX runtime validation.
- Recognition modes — Auto, Number (digits + basic operators), and Expression mode with vocabulary masking.
- Framework-agnostic core — Use `InferenceEngine` standalone from React, Svelte, vanilla JS, or any framework.
- Vue 3 component — Drop-in `<MathCanvas>` with mouse + touch support, smooth Bézier strokes, and responsive sizing.
- Beam search decoding — Adaptive beam width based on device capability for quality/speed balance.
- IndexedDB caching — Models are cached locally after first download; instant reload on revisit.
- PWA ready — Installable as a standalone app with Web App Manifest.
- Tested — 51 unit tests across 4 test suites with Vitest. CI/CD via GitHub Actions.
Architecture
┌──────────────────────────────────────────────────────────────┐
│ User draws on <MathCanvas> (or your own canvas) │
│ Stroke[] = [{ points: [{x, y}, ...], lineWidth: 3 }, ...] │
└───────────────────┬──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ preprocessStrokes(strokes) [Main Thread]│
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 1. Resample stroke points at uniform 3px intervals │ │
│ │ 2. Render with Bézier curves (white on black canvas) │ │
│ │ 3. Scale to height=256, dynamic width (64px-aligned) │ │
│ │ 4. Convert to grayscale Float32 tensor + padding mask │ │
│ └────────────────────────────────────────────────────────┘ │
│ → PreprocessResult { tensor, mask, height, width } │
└───────────────────┬──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Web Worker (inference.worker.ts) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ ENCODER (CoMER — DenseNet + Transformer) │ │
│ │ Input: pixel_values [1,1,H,W] + pixel_mask [1,H,W] │ │
│ │ Output: encoder_features + encoder_mask │ │
│ ├────────────────────────────────────────────────────────┤ │
│ │ DECODER (Autoregressive Transformer) │ │
│ │ Greedy (beam=1) or Beam search (beam=2-3) │ │
│ │ Mode-based vocabulary masking (auto/number/expression) │ │
│ │ Repeat detection + length normalization │ │
│ └────────────────────────────────────────────────────────┘ │
│ → candidates: [{ ids, logProb }, ...] │
└───────────────────┬──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Post-processing + Validation [Main Thread]│
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 1. decodeToTokenArray(ids, vocab) │ │
│ │ 2. repairLatex(tokens) — brace balancing, arg fixing │ │
│ │ 3. isCompleteExpression(tokens) — completeness check │ │
│ │ 4. KaTeX validation — katex.renderToString(throwOnError)│ │
│ │ 5. Select first valid candidate from beam results │ │
│ └────────────────────────────────────────────────────────┘ │
│ → RecognitionResult { latex, tokenIds, encoderMs, ... } │
└───────────────────┬──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Render LaTeX with KaTeX, MathJax, or your choice │
└──────────────────────────────────────────────────────────────┘

Key design insights
- Off-main-thread inference — ONNX encoder + decoder run in a Web Worker. The main thread stays responsive for drawing and UI updates during the 1-2 second inference.
- Beam search → validation pipeline — The Worker returns top-N beam candidates. The main thread runs each through `repairLatex()` → `isCompleteExpression()` → KaTeX validation, selecting the first valid result.
- Dynamic input width — The encoder tensor width adapts to the drawn content (aligned to 64px multiples). A simple "2" creates a 256×128 tensor instead of 256×1024, reducing encoder computation by up to 80%.
- Stroke resampling + Bézier — Raw touch points are resampled at uniform 3px intervals, then rendered with quadratic Bézier curves for smooth, consistent strokes matching CROHME training data.
- CROHME convention — Preprocessing follows the CROHME handwriting dataset format (white strokes on black background, top-left alignment) that the CoMER model was trained on.
- LaTeX auto-repair — `repairLatex()` fixes common decoder errors: unbalanced braces, extra `\frac`/`\sqrt` arguments. Deterministic, no model dependency.
- Vocabulary masking — Number mode restricts decoder output to digits, basic operators, and structural tokens, eliminating impossible symbols from beam search.
- Repeat detection — Both greedy and beam decoders detect and stop on repeated tokens, preventing garbage output like "EEEEE..." from ambiguous input.
- IndexedDB model cache — `fetchWithCache()` stores downloaded ONNX models in IndexedDB. Subsequent page loads skip the 7.2 MB download entirely.
- Multi-threaded WASM — Uses `SharedArrayBuffer` with COOP/COEP headers for parallel WASM execution across multiple CPU cores.
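The dynamic-width rule above can be sketched as a small helper. This is an illustrative sketch only, not the library's actual preprocessing code; the function name is an assumption, and the constants come from the pipeline description (height 256, 64px alignment):

```typescript
// Illustrative sketch (not ink-on internals): scale the drawn bounding
// box to a fixed height of 256, then round the width up to the next
// 64px multiple so the encoder sees minimal padding.
function alignedTensorWidth(drawnWidth: number, drawnHeight: number): number {
  const TARGET_HEIGHT = 256; // encoder input height per the pipeline above
  const ALIGN = 64;          // width alignment per the pipeline above
  const scaledWidth = (drawnWidth / drawnHeight) * TARGET_HEIGHT;
  return Math.max(ALIGN, Math.ceil(scaledWidth / ALIGN) * ALIGN);
}
```

A narrow "2" drawn at 100×200px yields a 128-wide tensor, while an 800×200px expression yields 1024, which is where the computation savings come from.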
Quick Start
Installation
npm install ink-on

You also need the peer dependencies:

npm install vue onnxruntime-web

Download Models
The ONNX models are not included in the npm package. Download and place them in your app's public/ directory:
| File | Size | Description |
| ------------------- | ------ | ------------------------------------------------------ |
| encoder_int8.onnx | 3.4 MB | CoMER encoder (DenseNet + Transformer), INT8 quantized |
| decoder_int8.onnx | 4.0 MB | CoMER autoregressive decoder, INT8 quantized |
| vocab.json | 4 KB | Token vocabulary (245 symbols) |
Place them at public/models/comer/ or any path you choose. You can download them from the GitHub repository releases.
Vue 3 Usage
<script setup lang="ts">
import { ref, onMounted, onUnmounted } from "vue";
import {
MathCanvas,
InferenceEngine,
preprocessStrokes,
isStrokeMeaningful,
loadVocab,
} from "ink-on";
import type { Stroke, RecognitionResult, Vocab } from "ink-on";
const canvasRef = ref<InstanceType<typeof MathCanvas> | null>(null);
const result = ref<RecognitionResult | null>(null);
let engine: InferenceEngine;
let vocab: Vocab;
onMounted(async () => {
vocab = await loadVocab("/models/comer/vocab.json");
engine = new InferenceEngine({
encoderUrl: "/models/comer/encoder_int8.onnx",
decoderUrl: "/models/comer/decoder_int8.onnx",
beamWidth: 3,
});
await engine.init();
});
onUnmounted(() => {
engine?.dispose();
});
async function onStrokesChange(strokes: Stroke[]) {
if (!isStrokeMeaningful(strokes)) return;
const input = preprocessStrokes(strokes);
result.value = await engine.recognize(input, vocab);
}
</script>
<template>
<MathCanvas
ref="canvasRef"
:width="700"
:height="300"
:line-width="3"
@strokes-change="onStrokesChange"
/>
<pre v-if="result">{{ result.latex }}</pre>
</template>

Framework-Agnostic Usage (React, Svelte, Vanilla JS)
Import from ink-on/core to avoid the Vue dependency:
import {
InferenceEngine,
preprocessStrokes,
isStrokeMeaningful,
loadVocab,
} from "ink-on/core";
import type { Stroke } from "ink-on/core";
// Initialize once
const vocab = await loadVocab("/models/comer/vocab.json");
const engine = new InferenceEngine({
encoderUrl: "/models/comer/encoder_int8.onnx",
decoderUrl: "/models/comer/decoder_int8.onnx",
beamWidth: 3,
executionProvider: "wasm",
});
await engine.init();
// Recognize from your own canvas/stroke capture
const strokes: Stroke[] = [
{
points: [
{ x: 10, y: 20 },
{ x: 30, y: 50 } /* ... */,
],
lineWidth: 3,
},
];
if (isStrokeMeaningful(strokes)) {
const input = preprocessStrokes(strokes);
const result = await engine.recognize(input, vocab);
console.log(result.latex); // "x ^ { 2 } + y ^ { 2 }"
console.log(result.totalMs); // 450
}
// Cleanup when done
engine.dispose();

Model Hosting
Required Server Headers
ONNX Runtime Web uses multi-threaded WASM via SharedArrayBuffer, which requires these HTTP headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Vite example (vite.config.ts):
export default defineConfig({
server: {
headers: {
"Cross-Origin-Opener-Policy": "same-origin",
"Cross-Origin-Embedder-Policy": "require-corp",
},
},
});

Without these headers, inference still works but falls back to single-threaded execution (slower).
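To confirm at runtime that the headers took effect, you can check the standard `crossOriginIsolated` browser global. The helper below is a hypothetical sketch, not part of ink-on:

```typescript
// Hypothetical helper: crossOriginIsolated is true only when the page was
// served with the COOP/COEP headers above, which is exactly the condition
// under which SharedArrayBuffer (and multi-threaded WASM) is usable.
function wasmThreadsAvailable(): boolean {
  const isolated = (globalThis as { crossOriginIsolated?: boolean }).crossOriginIsolated;
  if (typeof isolated === "boolean") return isolated; // browsers and workers
  return typeof SharedArrayBuffer !== "undefined";    // non-browser fallback
}
```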
Hosting Options
- Static files — Place models in your app's `public/models/comer/` directory
- CDN — Upload to Cloudflare R2, AWS S3, or any CDN with CORS support
- GitHub Releases — Attach as release assets
If models are served from a different origin, ensure the CDN sends appropriate Access-Control-Allow-Origin headers.
API Reference
<MathCanvas> Component
Vue 3 component for capturing handwritten strokes with mouse and touch support.
Props
| Prop | Type | Default | Description |
| ------------- | -------- | ----------- | --------------------------------- |
| width | number | 600 | Canvas resolution width (pixels) |
| height | number | 300 | Canvas resolution height (pixels) |
| lineWidth | number | 3 | Stroke line width |
| strokeColor | string | '#000000' | Stroke color |
Events
| Event | Payload | Description |
| ---------------- | ---------- | -------------------------------------------------------------- |
| strokes-change | Stroke[] | Emitted when a stroke ends (all strokes including the new one) |
Exposed Methods
| Method | Description |
| --------- | ------------------------------------ |
| clear() | Remove all strokes and clear canvas |
| undo() | Remove the last stroke |
| strokes | Reactive ref to current stroke array |
InferenceEngine
Main ONNX inference engine. Framework-agnostic.
Constructor
new InferenceEngine(options: InferenceEngineOptions)

| Option | Type | Default | Description |
| ------------------- | -------------------- | ---------- | -------------------------------------- |
| encoderUrl | string | required | URL to the encoder ONNX model |
| decoderUrl | string | required | URL to the decoder ONNX model |
| maxDecodeSteps | number | 50 | Maximum autoregressive decode steps |
| beamWidth | number | 3 | Beam search width (1 = greedy, faster) |
| executionProvider | 'wasm' \| 'webgpu' | 'wasm' | ONNX execution provider |
Methods
| Method | Returns | Description |
| ------------------------------- | ---------------------------- | ------------------------------------------------ |
| init() | Promise<void> | Load and initialize ONNX sessions (lazy, cached) |
| recognize(input, vocab, mode) | Promise<RecognitionResult> | Run encoder + decoder inference with mode |
| dispose() | void | Release ONNX sessions and free memory |
mode is an optional RecognitionMode parameter: 'auto' (default), 'number', or 'expression'.
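The effect of the `'number'` mode can be illustrated with a standalone sketch of logit masking. The names and the token list below are assumptions for illustration, not ink-on's internals:

```typescript
// Illustrative sketch of mode-based vocabulary masking: tokens outside the
// allowed set get a log-probability of -Infinity, so the beam search can
// never select them.
type Mode = "auto" | "number" | "expression";

// Hypothetical allow-list for Number mode (digits + basic operators).
const NUMBER_TOKENS = new Set([
  "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
  "+", "-", "=", ".", "\\times", "\\div",
]);

function maskLogits(logits: number[], tokens: string[], mode: Mode): number[] {
  if (mode !== "number") return logits.slice(); // auto/expression: full vocab
  return logits.map((logit, i) =>
    NUMBER_TOKENS.has(tokens[i]) ? logit : -Infinity
  );
}
```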
RecognitionResult
interface RecognitionResult {
latex: string; // Decoded LaTeX string (e.g., "x ^ { 2 } + 1")
tokenIds: number[]; // Raw token IDs from decoder
encoderMs: number; // Encoder inference time (ms)
decoderMs: number; // Decoder inference time (ms)
totalMs: number; // Total inference time (ms)
}

Stroke / StrokePoint
interface StrokePoint {
x: number;
y: number;
}
interface Stroke {
points: StrokePoint[];
lineWidth: number;
}

Preprocessing
// Convert strokes to a normalized tensor for the encoder
preprocessStrokes(strokes: Stroke[]): PreprocessResult
// Check if strokes have enough content for meaningful recognition
// Filters out dots, accidental taps, and trivial input
isStrokeMeaningful(strokes: Stroke[]): boolean

Tokenizer
// Load vocabulary from a JSON file (cached after first call)
loadVocab(url: string): Promise<Vocab>
// Convert token IDs to a LaTeX string
decodeTokenIds(ids: number[], vocab: Vocab): string
// Convert token IDs to an array of LaTeX tokens (filters special tokens)
decodeToTokenArray(ids: number[], vocab: Vocab): string[]

LaTeX Repair
// Auto-repair common decoder errors: unbalanced braces, extra \frac/\sqrt args
repairLatex(tokens: string[]): string[]
// Check if a token array forms a complete math expression
// Returns false for lone \frac, \sqrt, \sum without required arguments
isCompleteExpression(tokens: string[]): boolean

Model Cache (IndexedDB)
// Fetch a model with IndexedDB caching (recommended)
fetchWithCache(url: string): Promise<ArrayBuffer>
// Direct IndexedDB access
getCachedModel(url: string): Promise<ArrayBuffer | null>
cacheModel(url: string, data: ArrayBuffer): Promise<void>

Configuration Guide
Beam Width
| Beam Width | Speed | Quality | Recommended For |
| ----------- | -------- | ------- | ---------------------------------- |
| 1 (greedy) | Fastest | Good | Low-end devices, real-time preview |
| 2 | Fast | Better | Mid-range devices |
| 3 (default) | Moderate | Best | Desktop browsers |
Execution Provider
| Provider | Support | Performance | Notes |
| -------- | ---------------------- | ----------- | --------------------------- |
| wasm | All modern browsers | Good | Default, universal fallback |
| webgpu | Chrome 113+, Edge 113+ | 2-5x faster | Auto-falls back to WASM |
Adaptive Configuration
const cores = navigator.hardwareConcurrency || 2;
const mem = (navigator as any).deviceMemory || 4;
const engine = new InferenceEngine({
encoderUrl: "/models/comer/encoder_int8.onnx",
decoderUrl: "/models/comer/decoder_int8.onnx",
beamWidth: mem <= 2 ? 1 : cores <= 4 ? 2 : 3,
executionProvider: navigator.gpu ? "webgpu" : "wasm",
});

Performance Tips
- Web Worker — The demo app runs inference in a Web Worker to avoid blocking the UI. Use `inference.worker.ts` as a reference.
- Debounce recognition — Don't call `recognize()` on every stroke. Wait 1-2 seconds after the user stops drawing.
- Use `isStrokeMeaningful()` — Skip inference for dots and accidental taps.
- Number mode — Use `'number'` mode for digit-only input. Vocabulary masking significantly reduces decoder search space.
- Preload models — Call `engine.init()` early (on mount) to overlap loading with user interaction.
- Model caching — `fetchWithCache()` stores models in IndexedDB. First visit downloads 7.2 MB; revisits load instantly from cache.
- Call `dispose()` — Release ONNX sessions when unmounting to free WASM memory.
- COOP/COEP headers — Without these, WASM runs single-threaded. Multi-threading can be 2-4x faster on multi-core CPUs.
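The debounce advice above can be sketched as a small utility. This is a hypothetical helper for illustration; ink-on does not export it:

```typescript
// Hypothetical debounce helper: recognition fires only after the user has
// stopped drawing for `delayMs`, instead of on every stroke event.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    clearTimeout(timer);                            // cancel the pending call
    timer = setTimeout(() => fn(...args), delayMs); // reschedule from scratch
  };
}

// Usage sketch: recognize 1.2s after the last stroke.
// const onStrokesDebounced = debounce(onStrokesChange, 1200);
```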
Browser Requirements
| Feature | Minimum | Notes |
| ----------------- | ------------------------------------- | ------------------------- |
| WebAssembly | All modern browsers | Core requirement |
| WASM SIMD | Chrome 91+, Firefox 89+, Safari 16.4+ | Enabled by default |
| SharedArrayBuffer | Requires COOP/COEP headers | For multi-threaded WASM |
| IndexedDB | All modern browsers | For model caching |
| Canvas 2D | All modern browsers | For stroke preprocessing |
| WebGPU (optional) | Chrome 113+, Edge 113+ | Faster execution provider |
How It Works
This library runs CoMER (Coverage-guided Multi-scale Encoder-decoder Transformer, ECCV 2022) entirely in the browser using ONNX Runtime Web.
Model Pipeline
- Stroke capture — Canvas captures pointer/touch events as `Stroke[]` with coordinates and line width.
- Preprocessing — Strokes are rendered to an offscreen canvas following CROHME training conventions (white strokes on black background, top-left aligned). The image is scaled to 256px height with dynamic width aligned to 64px multiples, then converted to a grayscale float32 tensor with a binary padding mask.
- Encoder — DenseNet backbone extracts multi-scale visual features. A Transformer encoder produces contextual feature maps with position embeddings.
- Decoder — An autoregressive Transformer decoder with coverage attention generates token IDs one at a time. Beam search with length normalization selects the best hypothesis. Repeat detection halts degenerate sequences.
- Tokenizer — Token IDs are mapped to LaTeX symbols via `vocab.json` (245 symbols including digits, operators, Greek letters, and structural tokens like fractions and superscripts).
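The tokenizer step can be illustrated with a standalone sketch. The real API is ink-on's `decodeToTokenArray()`; the function and special-token names below are assumptions for illustration:

```typescript
// Illustrative sketch of token decoding: map ids through the vocabulary
// and drop special tokens (<sos>/<eos>/<pad>) before joining into LaTeX.
const SPECIAL_TOKENS = new Set(["<sos>", "<eos>", "<pad>"]);

function idsToTokens(ids: number[], idToToken: Map<number, string>): string[] {
  const out: string[] = [];
  for (const id of ids) {
    const token = idToToken.get(id);
    if (token !== undefined && !SPECIAL_TOKENS.has(token)) out.push(token);
  }
  return out;
}
```

Joining the resulting tokens with spaces gives the spaced LaTeX form seen in `RecognitionResult.latex`, e.g. `"x ^ { 2 }"`.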
Optimizations
- INT8 quantization — Models are quantized from FP32 to INT8, reducing size from ~100 MB to 7.2 MB with minimal accuracy loss.
- Web Worker inference — ONNX inference runs off the main thread via Web Worker, preventing UI blocking during 1-2s recognition.
- Dynamic input width — Tensor width adapts to content, avoiding wasted computation on padding. Simple expressions run up to 80% faster.
- Stroke resampling — Raw touch points are resampled at uniform 3px intervals and rendered with Bézier curves, matching CROHME training data quality.
- LaTeX auto-repair — `repairLatex()` fixes unbalanced braces and excess `\frac`/`\sqrt` arguments. KaTeX runtime validation selects the best beam candidate.
- Vocabulary masking — Number mode restricts decoder logits to digits + basic operators, eliminating impossible tokens from beam search.
- Multi-threaded WASM — `SharedArrayBuffer` enables parallel execution across CPU cores.
- IndexedDB caching — Models are downloaded once and cached locally for instant reload.
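The brace-balancing half of the auto-repair step can be sketched as follows. This is an illustrative reimplementation under assumptions, not ink-on's `repairLatex()` itself:

```typescript
// Illustrative sketch of brace balancing: drop unmatched closing braces
// and append closers for any groups the decoder left open.
function balanceBraces(tokens: string[]): string[] {
  const out: string[] = [];
  let depth = 0;
  for (const token of tokens) {
    if (token === "{") depth += 1;
    else if (token === "}") {
      if (depth === 0) continue; // unmatched "}": drop it
      depth -= 1;
    }
    out.push(token);
  }
  while (depth > 0) {            // unmatched "{": close the group
    out.push("}");
    depth -= 1;
  }
  return out;
}
```

Because the repair is a deterministic token-stream pass, it needs no model and always terminates, which is why it can run on every beam candidate before KaTeX validation.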
Benchmark
Evaluated on the CROHME 2014 test set (986 handwritten math expressions), end-to-end from InkML strokes through ink-on's full pipeline.
| Model | ExpRate | ≤1 edit | ≤2 edits | Size |
| ------------------------- | ------- | ------- | -------- | ------- |
| CoMER paper (FP32) | 59.33% | — | — | ~100 MB |
| ink-on (INT8, beam=3) | 36.41% | 53.25% | 65.82% | 7.2 MB |
The ExpRate gap reflects INT8 quantization (92.8% size reduction), preprocessing differences, and limited vocabulary (113 tokens). See benchmark/ for reproduction scripts.
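The "≤N edits" columns count predictions within N token-level edits of the ground truth. Assuming the standard Levenshtein distance over token sequences (a sketch of the metric, not the benchmark script itself):

```typescript
// Token-level Levenshtein distance: minimum number of insertions,
// deletions, and substitutions turning the prediction into the truth.
function editDistance(a: string[], b: string[]): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}
```

Under this reading, a prediction that confuses a single token (e.g. `2` vs `3` in an exponent) misses ExpRate but still counts toward the "≤1 edit" column.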
License
Apache License 2.0 — Copyright 2025 kimseungdae
Korean

Handwritten math expression recognition running entirely in the browser. Framework-agnostic ONNX inference engine + Vue 3 canvas component, powered by CoMER (ECCV 2022).
Features
- 100% client-side — Runs entirely in the browser via ONNX Runtime Web (WASM/WebGPU). No server, no API key.
- Tiny models — INT8 quantized encoder (3.4 MB) + decoder (4.0 MB), 7.2 MB total.
- Web Worker inference — ONNX inference runs on a separate Worker thread, so the UI never blocks during recognition.
- LaTeX auto-repair — Brace balancing and automatic `\frac`/`\sqrt` argument fixing, with KaTeX runtime validation.
- Recognition modes — Auto, Number (digits + basic operators), and Expression modes. Vocabulary masking shrinks the search space.
- Framework-agnostic core — Use `InferenceEngine` standalone from React, Svelte, vanilla JS, or any framework.
- Vue 3 component — Drop-in `<MathCanvas>` with mouse + touch support, smooth Bézier strokes, and responsive sizing.
- Beam search decoding — Adaptive beam width based on device capability balances quality and speed.
- IndexedDB caching — Models are cached locally after the first download; instant load on revisit.
- PWA ready — Installable as a standalone app via Web App Manifest.
- Tested — 51 Vitest unit tests, GitHub Actions CI/CD.
Quick Start
Installation
npm install ink-on

You also need the peer dependencies:

npm install vue onnxruntime-web

Download Models
The ONNX models are not included in the npm package. Download them and place them in your app's public/ directory:
| File | Size | Description |
| ------------------- | ------ | ------------------------------------------------------ |
| encoder_int8.onnx | 3.4 MB | CoMER encoder (DenseNet + Transformer), INT8 quantized |
| decoder_int8.onnx | 4.0 MB | CoMER autoregressive decoder, INT8 quantized |
| vocab.json | 4 KB | Token vocabulary (245 symbols) |
Place them at public/models/comer/ or any path you choose. They can be downloaded from the GitHub repository releases.
Vue 3 Usage
<script setup lang="ts">
import { ref, onMounted, onUnmounted } from "vue";
import {
MathCanvas,
InferenceEngine,
preprocessStrokes,
isStrokeMeaningful,
loadVocab,
} from "ink-on";
import type { Stroke, RecognitionResult, Vocab } from "ink-on";
const canvasRef = ref<InstanceType<typeof MathCanvas> | null>(null);
const result = ref<RecognitionResult | null>(null);
let engine: InferenceEngine;
let vocab: Vocab;
onMounted(async () => {
vocab = await loadVocab("/models/comer/vocab.json");
engine = new InferenceEngine({
encoderUrl: "/models/comer/encoder_int8.onnx",
decoderUrl: "/models/comer/decoder_int8.onnx",
beamWidth: 3,
});
await engine.init();
});
onUnmounted(() => {
engine?.dispose();
});
async function onStrokesChange(strokes: Stroke[]) {
if (!isStrokeMeaningful(strokes)) return;
const input = preprocessStrokes(strokes);
result.value = await engine.recognize(input, vocab);
}
</script>
<template>
<MathCanvas
ref="canvasRef"
:width="700"
:height="300"
:line-width="3"
@strokes-change="onStrokesChange"
/>
<pre v-if="result">{{ result.latex }}</pre>
</template>

Framework-Agnostic Usage (React, Svelte, Vanilla JS)
Import from ink-on/core without the Vue dependency:
import {
InferenceEngine,
preprocessStrokes,
isStrokeMeaningful,
loadVocab,
} from "ink-on/core";
import type { Stroke } from "ink-on/core";
const vocab = await loadVocab("/models/comer/vocab.json");
const engine = new InferenceEngine({
encoderUrl: "/models/comer/encoder_int8.onnx",
decoderUrl: "/models/comer/decoder_int8.onnx",
beamWidth: 3,
});
await engine.init();
// Recognize from your own canvas/stroke capture
const strokes: Stroke[] = [
{
points: [
{ x: 10, y: 20 },
{ x: 30, y: 50 },
],
lineWidth: 3,
},
];
if (isStrokeMeaningful(strokes)) {
const input = preprocessStrokes(strokes);
const result = await engine.recognize(input, vocab);
console.log(result.latex); // "x ^ { 2 } + y ^ { 2 }"
console.log(result.totalMs); // 450
}
engine.dispose();

Model Hosting
Required Server Headers
ONNX Runtime Web uses multi-threaded WASM via SharedArrayBuffer, which requires these HTTP headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Without these headers, inference still works but falls back to single-threaded execution (slower).
Hosting Options
- Static files — Place models in your app's `public/models/comer/` directory
- CDN — Upload to Cloudflare R2, AWS S3, or any CDN with CORS support
- GitHub Releases — Attach as release assets
API Reference
See the English API Reference section above. All interfaces and types are fully typed in TypeScript and can be explored via IDE autocompletion.
API Summary
| API | Description |
| ------------------------ | --------------------------------------------------- |
| MathCanvas | Vue 3 canvas component (mouse/touch input) |
| InferenceEngine | ONNX inference engine (encoder + decoder) |
| preprocessStrokes() | Convert strokes to a normalized tensor |
| isStrokeMeaningful() | Filter out meaningless input (dots/taps) |
| loadVocab() | Load the vocabulary JSON (cached) |
| decodeTokenIds() | Convert token IDs to a LaTeX string |
| decodeToTokenArray() | Convert token IDs to an array of LaTeX tokens |
| repairLatex() | Auto-repair LaTeX (brace balancing, argument fixes) |
| isCompleteExpression() | Check expression completeness |
| fetchWithCache() | Download models with IndexedDB caching |
How It Works
This library runs CoMER (Coverage-guided Multi-scale Encoder-decoder Transformer, ECCV 2022) entirely in the browser via ONNX Runtime Web.
Model Pipeline
- Stroke capture — The canvas collects pointer/touch events as `Stroke[]` with coordinates and line width.
- Preprocessing — Strokes are rendered to an offscreen canvas following CROHME training conventions (white strokes on a black background, top-left aligned). The image is scaled to 256px height with the width dynamically aligned to 64px multiples, then converted to a grayscale float32 tensor and a binary padding mask.
- Encoder — A DenseNet backbone extracts multi-scale visual features. A Transformer encoder produces contextual feature maps with position embeddings.
- Decoder — An autoregressive Transformer decoder with coverage attention generates token IDs one at a time. Beam search with length normalization selects the best hypothesis. Repeat detection halts degenerate sequences.
- Tokenizer — Token IDs are mapped to LaTeX symbols via `vocab.json` (245 symbols including digits, operators, Greek letters, and structural tokens such as fractions and superscripts).
Optimizations
- INT8 quantization — Quantizing from FP32 to INT8 shrinks the models from ~100 MB to 7.2 MB with minimal accuracy loss.
- Web Worker inference — ONNX inference runs on a separate Worker thread, preventing UI blocking during the 1-2s recognition.
- Dynamic input width — Tensor width adapts to the content, eliminating wasted computation on padding. Simple expressions run up to 80% faster.
- Stroke resampling — Raw touch points are resampled at uniform 3px intervals and rendered with Bézier curves, matching the quality of CROHME training data.
- LaTeX auto-repair — `repairLatex()` fixes unbalanced braces and excess `\frac`/`\sqrt` arguments. KaTeX runtime validation selects the best beam candidate.
- Vocabulary masking — Number mode restricts decoder logits to digits + basic operators, removing impossible tokens from beam search.
- Multi-threaded WASM — `SharedArrayBuffer` enables parallel execution across CPU cores.
- IndexedDB caching — Models are downloaded once and cached locally for instant reload.
Benchmark
Evaluated end-to-end on the CROHME 2014 test set (986 handwritten math expressions) through ink-on's full pipeline.
| Model | ExpRate | ≤1 edit | ≤2 edits | Size |
| ------------------------- | ------- | ------- | -------- | ------- |
| CoMER paper (FP32) | 59.33% | — | — | ~100 MB |
| ink-on (INT8, beam=3) | 36.41% | 53.25% | 65.82% | 7.2 MB |
The ExpRate gap is the combined effect of INT8 quantization (92.8% size reduction), preprocessing differences, and the limited vocabulary (113 tokens). See benchmark/ for reproduction scripts.
License
Apache License 2.0 — Copyright 2025 kimseungdae
