varak-chunker
v1.0.1
Thai legal document processing — chunking, paragraph extraction, varak segmentation
varak-chunker
Thai legal document chunking & OCR pipeline for RAG systems.
Install
npm install github:YOUR_ORG/varak-chunker-dist
Peer dependency
npm install pdfjs-dist
Usage
import {
  extractPdfText,
  chunkBySections,
  segmentVarakByRules,
  classifyPdf,
} from 'varak-chunker';

// pdfBuffer is the PDF's contents, loaded by the caller (for example, read from disk).
const text = await extractPdfText(pdfBuffer);
const chunks = await chunkBySections(text);

See index.d.ts for full API.
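For readers new to Thai statutes, the sketch below illustrates the general idea behind section-based chunking: splitting text wherever a line begins with a มาตรา (section) heading. It is not varak-chunker's implementation and says nothing about what chunkBySections returns. The SectionChunk type, the splitBySectionHeadings function, the heading pattern, and the sample text are all hypothetical, chosen only for illustration.

```typescript
// Illustrative only: a hypothetical splitter, NOT varak-chunker's actual logic.
interface SectionChunk {
  heading: string; // e.g. "มาตรา ๑"
  body: string;    // text of the section, continuation lines joined with "\n"
}

// Assumption: a section starts on its own line with "มาตรา" followed by
// Arabic (0-9) or Thai (๐-๙) digits.
const SECTION_HEADING = /^มาตรา\s*[0-9๐-๙]+/;

function splitBySectionHeadings(text: string): SectionChunk[] {
  const chunks: SectionChunk[] = [];
  let current: SectionChunk | null = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines

    const match = SECTION_HEADING.exec(line);
    if (match) {
      // A new heading opens a new chunk; the rest of the line starts its body.
      current = { heading: match[0], body: line.slice(match[0].length).trim() };
      chunks.push(current);
    } else if (current) {
      // Any other line continues the current section.
      current.body = current.body ? `${current.body}\n${line}` : line;
    }
    // Lines before the first heading (titles, preambles) are dropped in this sketch.
  }
  return chunks;
}

// Made-up sample text in the shape of a Thai act.
const sample = [
  "พระราชบัญญัติตัวอย่าง",
  "มาตรา ๑ พระราชบัญญัตินี้เรียกว่า พระราชบัญญัติตัวอย่าง",
  "มาตรา ๒ พระราชบัญญัตินี้ให้ใช้บังคับตั้งแต่วันถัดจากวันประกาศ",
  "ข้อความวรรคสองของมาตรา ๒",
].join("\n");

const sections = splitBySectionHeadings(sample);
console.log(sections.map((s) => s.heading));
```

Real documents are messier than this sample (OCR noise, headings broken across lines, references to other sections at the start of a line), which is why a dedicated pipeline is useful; treat the pattern above as a starting point only.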
