bpe-openai-wasm
A WebAssembly replacement for OpenAI's tiktoken that is roughly 4x faster and handles inputs with long runs of repeated characters, based on GitHub's bpe implementation.
Use this as a replacement for tiktoken in your JS/TS projects.
Features
- Fast BPE tokenization using Rust and WebAssembly
- Multiple tokenizer models: cl100k_base, o200k_base, voyage3_base
- TypeScript support with full type definitions
- Async API with proper WASM initialization
- Comprehensive methods: encode, decode, and count tokens
Installation
npm install --save bpe-openai-wasm
Usage
import { Tokenizer, TOKENIZER_MODELS } from 'bpe-openai-wasm';
// Create a tokenizer instance
const tokenizer = new Tokenizer("o200k_base");
// Encode text to tokens
const tokens = tokenizer.encode("Hello, world!");
console.log(tokens); // Uint32Array
// Count tokens without encoding
const count = tokenizer.count("Hello, world!");
console.log(count); // number
// Decode tokens back to text
const text = tokenizer.decode(tokens);
console.log(text); // "Hello, world!"Usage in Next.js Projects
Cry
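Loading a WASM-backed package in Next.js usually takes some extra bundler configuration. The snippet below is a minimal sketch, assuming a webpack-based Next.js build where webpack's `asyncWebAssembly` experiment is used to bundle the shipped .wasm file; it is an illustration of that general approach, not this package's documented setup.

```ts
// next.config.js — a minimal sketch for a webpack-based Next.js build
// (assumption for illustration; not taken from this package's documentation).
/** @type {import('next').NextConfig} */
const nextConfig = {
  webpack: (config) => {
    // Let webpack bundle the .wasm binary that ships with bpe-openai-wasm.
    config.experiments = { ...config.experiments, asyncWebAssembly: true };
    return config;
  },
};

module.exports = nextConfig;
```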
Available Models
- cl100k_base - Used by GPT-4, GPT-3.5-turbo
- o200k_base - Used by GPT-4o, oX models
- voyage3_base - Used by Voyage AI models
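For illustration, the same input can be tokenized against each bundled model. The sketch below assumes the Tokenizer constructor accepts the model name strings listed above, as in the usage example.

```ts
import { Tokenizer } from 'bpe-openai-wasm';

// Compare token counts across the bundled models for one input string.
const sample = "Hello, world!";
for (const model of ["cl100k_base", "o200k_base", "voyage3_base"]) {
  const tokenizer = new Tokenizer(model);
  console.log(`${model}: ${tokenizer.count(sample)} tokens`);
}
```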
Architecture
The project consists of three layers:
- Core Tokenizers (rust-gems/crates/bpe-openai/): Fast BPE implementations with pre-built tokenizer data
- WASM Bindings (bpe-openai-wasm/src/lib.rs): Rust-to-WASM bridge using wasm-bindgen
- TypeScript API (bpe-openai-wasm/src/index.ts): High-level async API for JavaScript/TypeScript
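As a rough picture of how the top two layers fit together, the sketch below shows how a wasm-bindgen-generated module is typically wrapped by a small async TypeScript layer. The `init` entry point, the `./pkg/bpe_openai_wasm` path, and the `createTokenizer` helper are illustrative assumptions, not this package's actual exports.

```ts
// Sketch of a thin async wrapper over a wasm-bindgen-generated module.
// Names here (init, ./pkg/bpe_openai_wasm, createTokenizer) are assumed
// for illustration and may not match this package's real exports.
import init, { Tokenizer as WasmTokenizer } from './pkg/bpe_openai_wasm';

let wasmReady: Promise<unknown> | null = null;

// Initialize the WASM module exactly once, even if several callers race.
function ensureInit(): Promise<unknown> {
  if (!wasmReady) wasmReady = init();
  return wasmReady;
}

// High-level async factory: resolves once the WASM module is ready, then
// hands back the wasm-bindgen Tokenizer for the requested model.
export async function createTokenizer(model: string): Promise<WasmTokenizer> {
  await ensureInit();
  return new WasmTokenizer(model);
}
```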
License
MIT
