mistral-tokenizer-ts
v2.2.1
TS tokenizer for Mistral-based LLMs
🌬️ mistral-tokenizer-ts 🌬️
TypeScript tokenizer for Mistral models.
Supported models
Generalist models
mistral-large-latest (points to mistral-large-2407)
mistral-large-2402
mistral-large-2407
mistral-small-latest (points to mistral-small-2402)
mistral-small-2402
open-mistral-nemo (points to open-mistral-nemo-2407)
open-mistral-nemo-2407
Specialized models
codestral-latest (points to codestral-2405)
codestral-2405
mistral-embed (points to mistral-embed-2312)
mistral-embed-2312
Research models
open-mistral-7b (points to open-mistral-7b-v0.3)
open-mistral-7b-v0.1
open-mistral-7b-v0.2
open-mistral-7b-v0.3
open-mixtral-8x7b (points to open-mixtral-8x7b-v0.1)
open-mixtral-8x7b-v0.1
open-mixtral-8x22b (points to open-mixtral-8x22b-v0.1)
open-mixtral-8x22b-v0.1
open-codestral-mamba (points to open-codestral-mamba-v0.1)
open-codestral-mamba-v0.1
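The "points to" notes in the list above amount to an alias table. As a minimal sketch, resolving an alias to its pinned model name could look like the following. `MODEL_ALIASES` and `resolveModelName` are hypothetical helpers, not exports of mistral-tokenizer-ts; the table is transcribed by hand from the list above and would need updating if the aliases change.

```typescript
// Hypothetical alias table transcribed from the "points to" notes in the
// Supported models list. NOT part of the mistral-tokenizer-ts API.
const MODEL_ALIASES: ReadonlyMap<string, string> = new Map([
  ['mistral-large-latest', 'mistral-large-2407'],
  ['mistral-small-latest', 'mistral-small-2402'],
  ['open-mistral-nemo', 'open-mistral-nemo-2407'],
  ['codestral-latest', 'codestral-2405'],
  ['mistral-embed', 'mistral-embed-2312'],
  ['open-mistral-7b', 'open-mistral-7b-v0.3'],
  ['open-mixtral-8x7b', 'open-mixtral-8x7b-v0.1'],
  ['open-mixtral-8x22b', 'open-mixtral-8x22b-v0.1'],
  ['open-codestral-mamba', 'open-codestral-mamba-v0.1'],
])

// Returns the pinned model name for an alias. Names that are already
// pinned (or not in the table) are returned unchanged.
function resolveModelName(name: string): string {
  return MODEL_ALIASES.get(name) ?? name
}
```

For example, `resolveModelName('open-mistral-7b')` returns `'open-mistral-7b-v0.3'`, while an already-pinned name such as `mistral-large-2402` is returned as-is. A `Map` is used rather than a plain object so that inherited keys like `toString` are never mistaken for aliases.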
Install
npm install mistral-tokenizer-ts
Usage
import { getTokenizerForModel } from 'mistral-tokenizer-ts'
const tokenizer = getTokenizerForModel('open-mistral-7b')
// Encode.
const encoded = tokenizer.encode('Hello world!')
// Decode.
const decoded = tokenizer.decode([1, 22557, 1526])
Tests
npm run test
Credit
@imoneoi for the initial implementation
@dqbd for the tiktoken JS port
@mistralai for the Python tokenizers
