@cyberlangke/tokkit-pleias
v1.11.0
PleIAs tokenizer families for tokkit.
The tokkit subpackage for PleIAs' official text models.
Official mainline models currently included:
- PleIAs/Pleias-350m-Preview
- PleIAs/Pleias-1.2b-Preview
- PleIAs/Pleias-3b-Preview
- PleIAs/Pleias-Pico
- PleIAs/Baguettotron
- PleIAs/Monad
Currently excluded:
- PleIAs/Pleias-RAG-350M
- PleIAs/Pleias-RAG-1B
- PleIAs/OCRonos
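Notes:
- All 6 official mainline models currently included publish a public `tokenizer.json`, and all of them follow the standard BPE + Split + ByteLevel pipeline.
- The `Pleias-RAG-*` model pages currently carry an explicit `base_model:finetune:*` tag, so these models are excluded.
- `OCRonos` is a specialized OCR / document line rather than part of the current mainline, so it is excluded.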
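To make the BPE + ByteLevel part of that pipeline concrete, here is a minimal TypeScript sketch. It assumes the ByteLevel stage uses the GPT-2 byte-to-unicode table (as Hugging Face `tokenizers` does) and shows only the byte mapping and a ranked merge loop. The Split step (regex pre-tokenization) is omitted because its exact pattern is defined in each model's `tokenizer.json`. The helper names (`bytesToUnicode`, `utf8Bytes`, `toByteLevel`, `bpe`), the `"left right"` rank key format, and the two-entry merge table are hypothetical illustrations, not tokkit's implementation.

```typescript
// Minimal sketch (hypothetical helpers, not tokkit's code): GPT-2-style
// byte-to-unicode table used by ByteLevel, plus a ranked BPE merge loop.

// Build the 256-entry byte -> visible character table.
function bytesToUnicode(): Map<number, string> {
  const bs: number[] = []
  for (let b = 0x21; b <= 0x7e; b++) bs.push(b) // '!'..'~'
  for (let b = 0xa1; b <= 0xac; b++) bs.push(b) // '¡'..'¬'
  for (let b = 0xae; b <= 0xff; b++) bs.push(b) // '®'..'ÿ'
  const printable = new Set(bs)
  const cs = [...bs]
  let n = 0
  // Every other byte (controls, space, 0x7f-0xa0, 0xad) is shifted to U+0100 + n.
  for (let b = 0; b < 256; b++) {
    if (!printable.has(b)) {
      bs.push(b)
      cs.push(256 + n)
      n++
    }
  }
  const table = new Map<number, string>()
  bs.forEach((b, i) => table.set(b, String.fromCodePoint(cs[i])))
  return table
}

const BYTE_TO_CHAR = bytesToUnicode()

// Encode a well-formed string as UTF-8 bytes (no TextEncoder needed).
function utf8Bytes(text: string): number[] {
  const out: number[] = []
  for (const ch of text) {
    const cp = ch.codePointAt(0)!
    if (cp < 0x80) out.push(cp)
    else if (cp < 0x800) out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f))
    else if (cp < 0x10000)
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f))
    else
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      )
  }
  return out
}

// ByteLevel: one visible symbol per UTF-8 byte.
function toByteLevel(text: string): string[] {
  return utf8Bytes(text).map((b) => BYTE_TO_CHAR.get(b)!)
}

// BPE: repeatedly merge the leftmost adjacent pair with the lowest rank.
// Ranks are keyed as "left right" here (a hypothetical key format).
function bpe(symbols: string[], ranks: Map<string, number>): string[] {
  let parts = [...symbols]
  while (parts.length > 1) {
    let best = -1
    let bestRank = Infinity
    for (let i = 0; i < parts.length - 1; i++) {
      const r = ranks.get(`${parts[i]} ${parts[i + 1]}`)
      if (r !== undefined && r < bestRank) {
        bestRank = r
        best = i
      }
    }
    if (best < 0) break // no mergeable pair left
    parts = [...parts.slice(0, best), parts[best] + parts[best + 1], ...parts.slice(best + 2)]
  }
  return parts
}

// Usage with a toy merge table (hypothetical ranks).
const ranks = new Map([
  ["h i", 0],
  ["\u0120 hi", 1],
])
console.log(bpe(toByteLevel(" hi"), ranks)) // [ 'Ġhi' ]
```

With these two toy merges, `" hi"` first becomes the ByteLevel symbols `Ġ`, `h`, `i` (the space byte maps to `Ġ`, U+0120) and then merges into the single token `Ġhi`. A real tokenizer takes its merge ranks from the BPE merges stored in `tokenizer.json`.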
Usage
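```bash
npm install @cyberlangke/tokkit-pleias
```

```ts
import { getEncoding } from "@cyberlangke/tokkit-pleias"

// Top-level await: run this as an ES module.
// Load each encoding by its Hugging Face model ID.
const pleias = await getEncoding("PleIAs/Pleias-350m-Preview")
const monad = await getEncoding("PleIAs/Monad")

// Encode text with each model's tokenizer.
console.log(pleias.encode("Hello, PleIAs"))
console.log(monad.encode("Hello, Monad"))
```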