@cyberlangke/tokkit-cyberagent
v1.11.0
cyberagent tokenizer families for tokkit.
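The tokkit subpackage for CyberAgent's official text tokenizers.
This is a standalone subpackage with special licensing and is not included in the @cyberlangke/tokkit umbrella package.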
Currently built-in families:
- open-calm
  - covers cyberagent/open-calm-1b
  - covers cyberagent/open-calm-small
  - covers cyberagent/open-calm-medium
  - covers cyberagent/open-calm-3b
  - covers cyberagent/open-calm-large
  - covers cyberagent/open-calm-7b
- calm2
  - covers cyberagent/calm2-7b
  - covers cyberagent/calm2-7b-chat
- calm3
  - covers cyberagent/calm3-22b-chat
Currently not included:
- cyberagent/Mistral-Nemo-Japanese-Instruct-2408
- cyberagent/DeepSeek-R1-Distill-Qwen-*
- cyberagent/calm2-7b-chat-dpo-experimental
- cyberagent/markupdm
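Notes:
- open-calm-* models currently share a single tokenizer.
- calm2-7b and calm2-7b-chat currently share a single tokenizer.
- calm3-22b-chat currently uses its own separate tokenizer.
- Because this subpackage distributes tokenizer assets under both Apache-2.0 and CC-BY-SA-4.0, it is not included in the @cyberlangke/tokkit umbrella package.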
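The family coverage listed above can be pictured as a lookup from model ID to family name. The sketch below is an illustration only: `FAMILY_MODELS` and `familyOf` are hypothetical names, not part of this package's API, and the package itself may resolve names differently.

```javascript
// Hypothetical lookup table transcribed from the coverage list in this README.
// It is not the package's internal data structure.
const FAMILY_MODELS = {
  "open-calm": [
    "cyberagent/open-calm-1b",
    "cyberagent/open-calm-small",
    "cyberagent/open-calm-medium",
    "cyberagent/open-calm-3b",
    "cyberagent/open-calm-large",
    "cyberagent/open-calm-7b",
  ],
  "calm2": ["cyberagent/calm2-7b", "cyberagent/calm2-7b-chat"],
  "calm3": ["cyberagent/calm3-22b-chat"],
};

// Return the family for a family name or a model ID,
// or null when the name is not covered by any family.
function familyOf(name) {
  if (Object.keys(FAMILY_MODELS).includes(name)) return name;
  for (const [family, models] of Object.entries(FAMILY_MODELS)) {
    if (models.includes(name)) return family;
  }
  return null;
}

console.log(familyOf("cyberagent/open-calm-3b"));  // open-calm
console.log(familyOf("cyberagent/calm2-7b-chat")); // calm2
console.log(familyOf("cyberagent/markupdm"));      // null (not included)
```

Models on the "not included" list fall through to `null` here simply because they are absent from the table.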
Usage
npm install @cyberlangke/tokkit-cyberagent

import { getTokenizer } from "@cyberlangke/tokkit-cyberagent"

const openCalm = await getTokenizer("open-calm")
const calm2 = await getTokenizer("cyberagent/calm2-7b-chat")

console.log(openCalm === calm2)
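Since models within a family share a tokenizer, application code that loads by model ID may want to avoid loading the same family twice. The following is a minimal sketch of one way to do that in your own code, under stated assumptions: `createFamilyCache` is a hypothetical helper and not part of this package, `load` stands for any async loader (such as `getTokenizer`), and `keyOf` is a caller-supplied function that maps a name to its family. Whether `getTokenizer` already caches internally is not stated in this README. The demo uses a stub loader so it runs without the package installed.

```javascript
// Hypothetical helper: memoize an async tokenizer loader by family key.
// `load` is any async function taking a name; `keyOf` maps a name to a cache key.
function createFamilyCache(load, keyOf) {
  const cache = new Map();
  return function getCached(name) {
    const key = keyOf(name);
    if (!cache.has(key)) {
      // Store the promise itself so concurrent callers share one load.
      cache.set(key, load(name));
    }
    return cache.get(key);
  };
}

// Stub loader standing in for the real package (assumption for the demo).
let loadCount = 0;
async function stubLoad(name) {
  loadCount += 1;
  return { loadedFrom: name };
}

// Demo key function: the two calm2 models from this README map to one key;
// any other name is used as its own key.
const demoFamilies = {
  "cyberagent/calm2-7b": "calm2",
  "cyberagent/calm2-7b-chat": "calm2",
};
const demoKeyOf = (name) => demoFamilies[name] ?? name;

const getCached = createFamilyCache(stubLoad, demoKeyOf);

async function demo() {
  const a = await getCached("cyberagent/calm2-7b");
  const b = await getCached("cyberagent/calm2-7b-chat");
  return { same: a === b, loads: loadCount };
}

demo().then((result) => console.log(result)); // { same: true, loads: 1 }
```

In this sketch a rejected load stays in the cache; real code would likely evict the entry on failure so that a later call can retry.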