@halo-sdk/tokenizer
v2.0.0
Published
Pluggable tokenizers for Halo AI SDK — accurate token counts for truncation budgets and cache/cost accounting
Downloads
212
Maintainers
Readme
@halo-sdk/tokenizer
Pluggable tokenizers for Halo AI SDK. Replaces the built-in chars / 4 heuristic behind estimateTokens with accurate counts — needed for honest truncation budgets and cache/cost accounting.
Installation
npm install @halo-sdk/tokenizerRequires @halo-sdk/core. gpt-tokenizer is an optional peer dependency, only needed for createGptTokenizer().
Usage
import { ApproxTokenizer, installTokenizer } from "@halo-sdk/tokenizer";
// Install once at startup; `estimateTokens` everywhere now uses it.
installTokenizer(new ApproxTokenizer());ApproxTokenizer is dependency-free and deterministic — a BPE approximation that costs punctuation, digits, and CJK far more realistically than chars / 4.
For exact GPT counts, install gpt-tokenizer and use the lazy factory:
import { createGptTokenizer, installTokenizer } from "@halo-sdk/tokenizer";
installTokenizer(await createGptTokenizer());Custom tokenizers
Implement Tokenizer (count(text): number) and pass it to installTokenizer, or wire a counter directly via setTokenCounter from @halo-sdk/core.
Call resetTokenizer() to revert to the heuristic.
