garu-ko

v0.6.2

Published

2 days ago

Ultra-lightweight Korean morphological analyzer for the web (1.7MB model, WASM, F1 93.7%)

Vulnerabilities: 0 High, 0 Medium, 0 Low

zerry_jin

korean morphological-analysis nlp wasm lightweight browser

garu-ko

Browser-native Korean morphological analyzer. No server required.

1.6MB model bundled in npm package (no CDN needed)
93KB WASM engine -- runs in any modern browser
F1 91.1% on human-verified gold testset (vs. Kiwi 89.7%)
< 1ms inference per sentence
Offline-ready -- works without network
Live Demo -- try it in your browser

Comparison

| | Kiwi | MeCab-ko | garu-ko |
|---|---|---|---|
| Model size | ~40MB | ~50MB | 1.6MB |
| npm package | No | No | Yes |
| F1 (gold testset) | 89.7% | — | 91.1% |
| F1 (NIKL MP) | 87.9% | ~85% | 93.7% |
| Browser support | Impractical | No | Yes |

Quick Start

npm install garu-ko

import { Garu } from 'garu-ko';

const garu = await Garu.load();

// Morphological analysis
const result = garu.analyze('배가 아파서 약을 먹었다');
console.log(result.tokens);
// [
//   { text: '배',   pos: 'NNG', start: 0, end: 2 },
//   { text: '가',   pos: 'JKS', start: 0, end: 2 },
//   { text: '아프', pos: 'VA',  start: 3, end: 6 },
//   { text: '어서', pos: 'EC',  start: 3, end: 6 },
//   { text: '약',   pos: 'NNG', start: 7, end: 9 },
//   { text: '을',   pos: 'JKO', start: 7, end: 9 },
//   { text: '먹',   pos: 'VV',  start: 10, end: 13 },
//   { text: '었',   pos: 'EP',  start: 10, end: 13 },
//   { text: '다',   pos: 'EF',  start: 10, end: 13 },
// ]

// Simple tokenization
const tokens = garu.tokenize('나는 학교에 간다');
// ['나', '는', '학교', '에', '간다']

garu.destroy(); // free WASM memory
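The token list above can be turned into the compact `morpheme/POS` notation often used for Korean morphological analysis. The following is a minimal sketch, not part of garu-ko: `formatAnalysis` is a hypothetical helper, and it assumes that morphemes belonging to the same eojeol share identical `start` and `end` values, as they do in the sample output above. The `pos` field is typed as a plain string here for simplicity.

```typescript
// Token shape as documented in the API section (pos simplified to string).
interface Token {
  text: string;
  pos: string;
  start: number;
  end: number;
}

// Hypothetical helper (not part of garu-ko): joins morphemes that share the
// same eojeol span with "+", and separates eojeols with a single space.
function formatAnalysis(tokens: Token[]): string {
  let out = '';
  let prevStart = -1;
  let prevEnd = -1;
  for (const token of tokens) {
    const sameEojeol = token.start === prevStart && token.end === prevEnd;
    if (out.length > 0) {
      out += sameEojeol ? '+' : ' ';
    }
    out += token.text + '/' + token.pos;
    prevStart = token.start;
    prevEnd = token.end;
  }
  return out;
}

// Tokens copied from the Quick Start output above.
const quickStartTokens: Token[] = [
  { text: '배', pos: 'NNG', start: 0, end: 2 },
  { text: '가', pos: 'JKS', start: 0, end: 2 },
  { text: '아프', pos: 'VA', start: 3, end: 6 },
  { text: '어서', pos: 'EC', start: 3, end: 6 },
  { text: '약', pos: 'NNG', start: 7, end: 9 },
  { text: '을', pos: 'JKO', start: 7, end: 9 },
  { text: '먹', pos: 'VV', start: 10, end: 13 },
  { text: '었', pos: 'EP', start: 10, end: 13 },
  { text: '다', pos: 'EF', start: 10, end: 13 },
];

console.log(formatAnalysis(quickStartTokens));
// 배/NNG+가/JKS 아프/VA+어서/EC 약/NNG+을/JKO 먹/VV+었/EP+다/EF
```

In real use, `quickStartTokens` would be replaced by `result.tokens` from `garu.analyze(...)`.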

Custom Model

// Option 1: load from a custom URL
const garuFromUrl = await Garu.load({ modelUrl: '/models/custom.gmdl' });

// Option 2: load from an ArrayBuffer you fetched yourself
const res = await fetch('/models/custom.gmdl');
const garuFromBuffer = await Garu.load({ modelData: await res.arrayBuffer() });

API

`Garu.load(options?): Promise<Garu>`

Initialize WASM and load model. Uses bundled model by default.

| Option | Type | Description |
|---|---|---|
| modelData | ArrayBuffer | Provide model bytes directly |
| modelUrl | string | Fetch model from URL |

`garu.analyze(text, options?): AnalyzeResult`

Returns morphological tokens with POS tags (Sejong tagset).

interface Token {
  text: string;   // surface form
  pos: POS;       // POS tag
  start: number;  // start offset of the containing eojeol (shared by all its morphemes)
  end: number;    // end offset of the containing eojeol (exclusive)
}

Set options.topN to a value greater than 1 to get N-best results as an array. Note: topN > 1 is not yet fully supported and may return fewer than N results.
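Because `start` and `end` refer to the eojeol rather than the individual morpheme, they can be used to recover the original word from the input and to group its morphemes. The sketch below is illustrative only: `groupByEojeol` is a hypothetical helper, the sample tokens are hand-written to follow the pattern of the Quick Start output rather than taken from a real run, and it assumes the offsets are indices compatible with `String.prototype.slice` (which matches the Quick Start sample for Hangul text).

```typescript
interface Token {
  text: string;
  pos: string; // simplified; the library uses a POS type
  start: number;
  end: number;
}

interface EojeolGroup {
  surface: string;   // the eojeol as it appears in the input text
  start: number;
  end: number;
  morphemes: Token[];
}

// Hypothetical helper (not part of garu-ko): groups consecutive tokens that
// share the same [start, end) span and slices the surface from the input.
function groupByEojeol(text: string, tokens: Token[]): EojeolGroup[] {
  const groups: EojeolGroup[] = [];
  let current: EojeolGroup | null = null;
  for (const token of tokens) {
    if (current === null || current.start !== token.start || current.end !== token.end) {
      current = {
        surface: text.slice(token.start, token.end),
        start: token.start,
        end: token.end,
        morphemes: [],
      };
      groups.push(current);
    }
    current.morphemes.push(token);
  }
  return groups;
}

// Hand-written sample in the documented Token shape (an assumption, not real output).
const sampleText = '약을 먹었다';
const sampleTokens: Token[] = [
  { text: '약', pos: 'NNG', start: 0, end: 2 },
  { text: '을', pos: 'JKO', start: 0, end: 2 },
  { text: '먹', pos: 'VV', start: 3, end: 6 },
  { text: '었', pos: 'EP', start: 3, end: 6 },
  { text: '다', pos: 'EF', start: 3, end: 6 },
];

const eojeolGroups = groupByEojeol(sampleText, sampleTokens);
console.log(eojeolGroups.map((g) => g.surface));
// [ '약을', '먹었다' ]
```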

`garu.nouns(text, options?): string[]`

Extract nouns (NNG, NNP) from text. Set options.includeSL to also include foreign tokens (SL) like "AI", "BM25".

garu.nouns('인공지능 기술이 발전했다');
// ["인공", "지능", "기술", "발전"]

garu.nouns('AI 기술이 발전했다', { includeSL: true });
// ["AI", "기술", "발전"]
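The documented behaviour of `nouns()` can be approximated as a POS filter over the output of `analyze()`. This is a sketch of that idea, not garu-ko's actual implementation: `nounsFromTokens` and `NounOptions` are hypothetical names, and the sample tokens (including the tags for 했다) are a hand-written assumption of what an analysis of the sentence might look like.

```typescript
interface Token {
  text: string;
  pos: string; // simplified; the library uses a POS type
  start: number;
  end: number;
}

interface NounOptions {
  includeSL?: boolean;
}

// Hypothetical helper (not part of garu-ko): keeps NNG/NNP tokens and,
// when includeSL is set, foreign-word tokens (SL) as well.
function nounsFromTokens(tokens: Token[], options: NounOptions = {}): string[] {
  const wanted = ['NNG', 'NNP'];
  if (options.includeSL === true) {
    wanted.push('SL');
  }
  return tokens
    .filter((token) => wanted.indexOf(token.pos) !== -1)
    .map((token) => token.text);
}

// Hand-written sample for 'AI 기술이 발전했다' (an assumption, not real output).
const aiTokens: Token[] = [
  { text: 'AI', pos: 'SL', start: 0, end: 2 },
  { text: '기술', pos: 'NNG', start: 3, end: 6 },
  { text: '이', pos: 'JKS', start: 3, end: 6 },
  { text: '발전', pos: 'NNG', start: 7, end: 11 },
  { text: '하', pos: 'XSV', start: 7, end: 11 },
  { text: '었', pos: 'EP', start: 7, end: 11 },
  { text: '다', pos: 'EF', start: 7, end: 11 },
];

console.log(nounsFromTokens(aiTokens));
// [ '기술', '발전' ]
console.log(nounsFromTokens(aiTokens, { includeSL: true }));
// [ 'AI', '기술', '발전' ]
```

The same pattern extends to other tag sets, for example filtering on VV and VA to collect predicate stems.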

`garu.tokenize(text): string[]`

Returns surface-form strings only. Lightweight alternative to analyze().

`garu.destroy(): void`

Free WASM memory. Instance is unusable after this call.
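Since the instance must be destroyed explicitly, it can help to wrap usage in `try`/`finally` so that `destroy()` runs even when the work throws. The following is a minimal sketch under stated assumptions: `withAnalyzer` and `Destroyable` are hypothetical names, and the fake analyzer below is a stand-in so the example runs without the package. The callback must be synchronous; if it returned a promise, `destroy()` would run before the promise settled.

```typescript
// Minimal structural type: anything exposing destroy(), such as a Garu instance.
interface Destroyable {
  destroy(): void;
}

// Hypothetical helper (not part of garu-ko): runs `use` and always calls
// destroy() afterwards, whether `use` returns normally or throws.
function withAnalyzer<A extends Destroyable, R>(analyzer: A, use: (analyzer: A) => R): R {
  try {
    return use(analyzer);
  } finally {
    analyzer.destroy();
  }
}

// Stand-in object for demonstration only; it splits on spaces and is NOT a
// morphological analyzer.
const fakeAnalyzer = {
  destroyed: false,
  tokenize(text: string): string[] {
    return text.split(' ');
  },
  destroy(): void {
    this.destroyed = true;
  },
};

const words = withAnalyzer(fakeAnalyzer, (a) => a.tokenize('나는 학교에 간다'));
console.log(words.length, fakeAnalyzer.destroyed);
// 3 true
```

With the real library, the first argument would be the instance returned by `await Garu.load()`.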

Acknowledgments

The morphological analysis model is trained on the NIKL Morpheme-Tagged Corpus (v1.1) provided by the National Institute of Korean Language (국립국어원). The model contains only derived frequency statistics, not original text.

License

MIT
