@cyberlangke/tokkit-h2oai
v1.11.0
H2O.ai tokenizer families for tokkit.
A tokkit subpackage for H2O.ai's official Danube tokenizers.
Currently built-in families:
- danube
  - Covers h2oai/h2o-danube-1.8b-base
  - Also aggregates h2oai/h2o-danube-1.8b-chat, which is currently confirmed to reuse exactly the same tokenizer
- danube2
  - Covers h2oai/h2o-danube2-1.8b-base
  - Also aggregates the models currently confirmed to reuse exactly the same tokenizer:
    - h2oai/h2o-danube2-1.8b-chat
    - h2oai/h2o-danube3-500m-base
    - h2oai/h2o-danube3-4b-base
- danube3-500m-chat
  - Covers h2oai/h2o-danube3-500m-chat
- danube3-4b-chat
  - Covers h2oai/h2o-danube3-4b-chat
- danube3.1-4b-chat
  - Covers h2oai/h2o-danube3.1-4b-chat
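The family list above amounts to a mapping from Hugging Face model IDs to family names. The following is a minimal sketch of that mapping, not the package's actual internals: `FAMILY_BY_MODEL` and `resolveFamily` are hypothetical names introduced here for illustration only.

```typescript
// Hypothetical lookup table transcribed from the family list above.
// It is an illustration, not the package's internal implementation.
const FAMILY_BY_MODEL = new Map<string, string>([
  ["h2oai/h2o-danube-1.8b-base", "danube"],
  ["h2oai/h2o-danube-1.8b-chat", "danube"],
  ["h2oai/h2o-danube2-1.8b-base", "danube2"],
  ["h2oai/h2o-danube2-1.8b-chat", "danube2"],
  ["h2oai/h2o-danube3-500m-base", "danube2"],
  ["h2oai/h2o-danube3-4b-base", "danube2"],
  ["h2oai/h2o-danube3-500m-chat", "danube3-500m-chat"],
  ["h2oai/h2o-danube3-4b-chat", "danube3-4b-chat"],
  ["h2oai/h2o-danube3.1-4b-chat", "danube3.1-4b-chat"],
]);

// Accepts either a family name or a full model ID and returns the family
// name, or undefined when the input is not covered by the list.
function resolveFamily(nameOrModelId: string): string | undefined {
  const families = new Set(FAMILY_BY_MODEL.values());
  if (families.has(nameOrModelId)) {
    return nameOrModelId;
  }
  return FAMILY_BY_MODEL.get(nameOrModelId);
}
```

Under this sketch, the danube3 base models resolve to danube2 because they share its tokenizer, while each danube3 chat model keeps its own family.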
Currently not included:
- h2oai/h2o-danube-1.8b-sft
- h2oai/h2o-danube2-1.8b-sft
- Exported artifacts such as GGUF / AWQ / ONNX
- danube2-singlish-finetuned
- h2ogpt-*, h2ovl-*, and other legacy fine-tuned / multimodal lines
Usage
npm install @cyberlangke/tokkit-h2oai

import { getTokenizer } from "@cyberlangke/tokkit-h2oai"
const base = await getTokenizer("danube2")
const chat = await getTokenizer("h2oai/h2o-danube3-4b-chat")
console.log(base.encode("Hello, Danube!"))
console.log(chat.encode("Hello, Danube!"))
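When comparing families, it can be handy to look at token counts rather than raw encodings. The helper below is a rough sketch under one assumption: that `encode` returns an array-like value with a `length`, which the snippet above does not confirm. `EncoderLike` and `tokenCounts` are hypothetical names, not part of the package.

```typescript
// Assumed shape: only the encode(text) call used in the usage snippet.
// The array-like return type is an assumption, not a documented contract.
interface EncoderLike {
  encode(text: string): ArrayLike<number>;
}

// Returns the number of tokens each labelled tokenizer produces for `text`.
function tokenCounts(
  text: string,
  tokenizers: Record<string, EncoderLike>,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const [label, tokenizer] of Object.entries(tokenizers)) {
    counts[label] = tokenizer.encode(text).length;
  }
  return counts;
}
```

With the tokenizers loaded above, a call such as `tokenCounts("Hello, Danube!", { base, chat })` would give one count per label, provided the assumption about `encode` holds.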