@cyberlangke/tokkit-snowflake
v1.11.0
Published
Snowflake tokenizer families for tokkit.
Downloads
69
Readme
@cyberlangke/tokkit-snowflake
Snowflake 官方 Arctic 文本 tokenizer 的 tokkit 子包。
当前内置 family:
snowflake-arctic-base- 覆盖
Snowflake/snowflake-arctic-base
- 覆盖
snowflake-arctic-instruct- 覆盖
Snowflake/snowflake-arctic-instruct
- 覆盖
当前不纳入:
Snowflake/snowflake-arctic-instruct-vllm
说明:
- 当前纳入范围只包含
Snowflake官方组织下公开可下载tokenizer.json的 Arctic 文本主线。 snowflake-arctic-base与snowflake-arctic-instruct的merges、normalizer、pre_tokenizer、decoder相同,但vocab与added_tokens不同,因此必须保留为两个独立 family。
使用方法
npm install @cyberlangke/tokkit-snowflakeimport { getTokenizer } from "@cyberlangke/tokkit-snowflake"
const base = await getTokenizer("snowflake-arctic-base")
const instruct = await getTokenizer("Snowflake/snowflake-arctic-instruct")
console.log(base === instruct)