tokstream
中文 | English
A token streaming simulator powered by Hugging Face tokenizers. It downloads a tokenizer from HF Hub and generates tokens at a target rate, with live stats for target vs actual throughput.
Highlights
- Rust CLI with high‑precision pacing (sleep + spin)
- Web demo (WASM) and npx executable
- Random English / Chinese generation and text replay
- Configurable filtering strategy
- Target vs actual tokens/sec stats
- Workspace layout with reusable core
Project Layout
.
├── crates
│ ├── tokstream-core # tokenizer engine
│ ├── tokstream-cli # Rust CLI
│ └── tokstream-wasm # wasm-bindgen bindings
├── npm # npx CLI + web demo
├── bin # npm bin entry
├── Cargo.toml # workspace
├── justfile
├── package.json
├── README.md
└── README_ZH.md

Rust CLI
Quick Start
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 8
cargo run -p tokstream-cli -- --model gpt2 --mode chinese --rate 8
cargo run -p tokstream-cli -- --model gpt2 --mode text --text "Hello" --repeat 3

Install from crates.io
cargo install tokstream-cli
# or
cargo binstall tokstream-cli

Notes:
- The binary name after installation is `tokstream`.
- `cargo binstall` will compile from source unless you provide prebuilt release assets and set `repository` in the crate metadata.
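Once installed, the binary can be invoked directly under its own name, for example:

# Run the installed binary (same flags as the cargo run examples above)
tokstream --model gpt2 --mode english --rate 8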
Model & Auth
- `--model <id>`: HF Hub model id (default: `gpt2`)
- `--revision <rev>`: HF revision (default: `main`)
- `--hf-token <token>`: access token for private models
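For example, to pin a revision and authenticate against a private model (the model id and token below are placeholders):

# Placeholder model id and token
cargo run -p tokstream-cli -- --model your-org/private-tokenizer --revision main --hf-token hf_xxx --mode english --rate 8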
Modes
- `--mode <english|chinese|text>`
- `--text <text>`: text mode input
- `--text-file <path>`: text mode input from file
- `--loop-text`: loop text forever
- `--repeat <n>`: repeat text n times
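For instance, replaying a file endlessly (assuming `--text-file` and `--loop-text` combine as expected):

cargo run -p tokstream-cli -- --model gpt2 --mode text --text-file ./sample.txt --loop-text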
Rate Control
- `--rate <n>`: target tokens/sec
- `--rate-min <n>`: min rate for random range
- `--rate-max <n>`: max rate for random range
- `--rate-sample-interval <n>`: sampling interval for the rate range (seconds, default: 1)
- `--batch <n>`: tokens emitted per batch
- `--max-tokens <n>`: stop after n tokens
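A sketch combining batching with a stop condition, assuming the flags compose freely:

# ~100 tok/s emitted in batches of 10, stopping after 500 tokens
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 100 --batch 10 --max-tokens 500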
Pacing & Throughput
- `--pace <strict|sleep>`: pacing mode (default: `strict`)
- `--spin-threshold-us <n>`: busy-spin threshold for `strict` mode
- `--no-throttle`: disable pacing (measure max throughput)
- `--no-output`: disable stdout output (closer to the tokenizer's upper bound)
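For example, to trade pacing precision for lower CPU usage, or to tune the busy-spin window (the threshold value below is illustrative):

# Sleep-only pacing (lower CPU, less precise)
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 50 --pace sleep
# Strict pacing with a 200 µs busy-spin threshold (illustrative value)
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 50 --pace strict --spin-threshold-us 200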
Stats
- `--no-stats`: disable stats output (stderr)
- `--stats-interval <n>`: stats interval in seconds (default: 1)
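Because stats go to stderr, they can be captured separately from the token stream:

# Tokens to stdout, stats every 5 s to a log file
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 8 --stats-interval 5 2> stats.log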
Random Output Filters
- `--no-skip-special`: do not skip special tokens
- `--allow-digits`, `--allow-punct`, `--allow-space`, `--allow-non-ascii`
- `--no-require-letter`, `--no-require-cjk`
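For example, loosening the English filter to also allow digits and punctuation:

cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 8 --allow-digits --allow-punct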
Seed
- `--seed <n>`: random seed
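Two runs with the same seed should produce the same token sequence, which is handy when comparing pacing settings:

# Reproducible stream
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 8 --seed 42 --max-tokens 100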
Examples
# Random rate range sampled every 2 seconds
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate-min 6 --rate-max 12 --rate-sample-interval 2
# Text mode from file, repeat 5 times
cargo run -p tokstream-cli -- --model gpt2 --mode text --text-file ./sample.txt --repeat 5
# Infinite loop text
cargo run -p tokstream-cli -- --model gpt2 --mode text --text "Hello" --loop-text
# Throughput upper bound (no throttle, no output)
cargo run -p tokstream-cli -- --model gpt2 --mode english --no-throttle --no-output

npx CLI
Quick Start
npx tokstream@latest --model gpt2 --mode english --rate 8
npx tokstream@latest --web --port 8787

For local development in this repo:
npx . --model gpt2 --mode english --rate 8

Supported Flags (npx)
- `--model <id>`, `--revision <rev>`
- `--hf-token <token>` (or env `HF_TOKEN` / `HUGGINGFACE_HUB_TOKEN`)
- `--mode <english|chinese|text>`, `--text <text>`
- `--loop` (loop text forever), `--repeat <n>`
- `--rate <n>`, `--rate-min <n>` / `--rate-max <n>`, `--rate-sample-interval <n>`
- `--seed <n>`, `--max-tokens <n>`
- `--no-skip-special`, `--allow-digits` / `--allow-punct` / `--allow-space` / `--allow-non-ascii`, `--no-require-letter` / `--no-require-cjk`
- `--no-stats` / `--stats-interval <n>`
- `--no-throttle` / `--no-output`
- `--web --port <n>`
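For example, supplying the token via the environment instead of the flag (the token value is a placeholder):

HF_TOKEN=hf_xxx npx tokstream@latest --model gpt2 --mode english --rate 8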
Notes:
- `--loop-text`, `--text-file`, `--batch`, `--pace`, and `--spin-threshold-us` are Rust-CLI only.
Web Demo
npx tokstream@latest --web --port 8787
# open http://localhost:8787

While running, you can drag the rate slider or enable a random rate range. The page shows target and actual throughput. The output pane is fixed-height and scrolls independently.
Accuracy Notes
- Rust CLI: `strict` pacing uses sleep plus a short busy-spin for high precision.
- Web / npx pacing is best-effort due to event-loop and I/O limits.
- If actual throughput stops rising as you raise the target rate, you have likely hit the tokenizer's throughput limit.
- For maximum-throughput testing, use the Rust CLI with `--no-output --no-throttle`.
Build WASM (optional refresh)
npm run build:wasm

WASM artifacts are committed and included in the npm package.
just Recipes
just

Tests
cargo clippy --workspace
cargo nextest run --workspace

License
MIT
