@sharkcreep/tokensmith — v1.0.2
# TokenSmith
Save Claude token usage with persistent memory, reusable skills, smart context injection, automatic session compression, and rich token analytics.
TokenSmith is a Claude Code plugin and a standalone CLI. It remembers the important context across sessions so Claude never has to re-read your project to answer the same question twice, compresses long conversations into concise summaries, and reports exactly how many tokens it saved you.
```
$ tokensmith --help
Usage: tokensmith [options] [command]

TokenSmith — persistent memory, reusable skills, and smart context for Claude Code.

Options:
  -v, --version        Show version
  --db <path>          override database path
  --namespace <name>   override namespace
  --no-color           disable color output
  --log-level <level>  silent | error | warn | info | debug
  -h, --help           display help for command

Commands:
  init                 Create a token-smith.config.json and .tokensmith DB
  memory               Persistent, namespaced memories saved to SQLite
  skill                Reusable prompt templates ('skills')
  context <query...>   Build a compressed, relevance-ranked context bundle
  compress             Compress long session/project history into summaries
  tokens               Token usage analytics and reports
  session              Inspect and record Claude Code session messages
  plugin               Claude Code plugin hooks (machine-invoked)
  completion <shell>   Print shell completion script (bash | zsh | fish)
```

## Features
| Capability | What it does | Why it saves tokens |
|---|---|---|
| Persistent memory | Tagged, prioritised key/value store backed by SQLite. | Long-term facts never need to be re-described each turn. |
| Skill library | Reusable, templated prompts (debug-laravel, build-rest-api, …). | One short invocation replaces a long, bespoke prompt. |
| Smart context engine | Ranks stored memories, skills, and summaries by keyword + recency + priority. | Only the relevant context is injected, inside a strict token budget. |
| Auto compression | Turns 15k-token sessions into ~800-token summaries. | Brings the conversation window back down to a small, stable size. |
| Token analytics | Per-day, per-kind tracking of raw vs. effective tokens. | You can prove the savings. |
| Plugin + CLI | Ships as a Claude Code plugin and a tokensmith binary. | Works the same way whether you're inside or outside Claude Code. |
## Installation

```bash
# As an npm dependency (library + CLI)
npm install -g @sharkcreep/tokensmith

# Verify
tokensmith --version
tokensmith init   # creates token-smith.config.json + .tokensmith/tokensmith.db
```

The npm package is scoped (`@sharkcreep/tokensmith`), but the installed binary is still called `tokensmith` (plus the `ts-smith` alias). You never have to type the scope after install.
### Installing as a Claude Code plugin

```
# From inside Claude Code
/plugin install tokensmith
```

Or, if you're developing locally, register this repo as a marketplace:

```
/plugin marketplace add /path/to/tokensmith
/plugin install tokensmith@tokensmith-marketplace
```

Once installed, slash commands become available inside Claude Code:

```
/memory save <key> <content>
/memory get <key>
/skill run <name> var=value
/context <query>
/compress session <sessionId>
/tokens stats
```

and the hooks in `hooks/hooks.json` automatically inject relevant context and auto-compress long sessions.
## Usage

### Memories

```bash
tokensmith memory save api-base "Use https://api.example.com/v2" -t api -t urls -p critical
tokensmith memory list
tokensmith memory get api-base
tokensmith memory delete api-base
tokensmith memory clean          # removes archived memories
tokensmith memory clean --all    # wipes the namespace
```

```
$ tokensmith memory list
╭──────────────┬──────────┬────────┬───────────┬─────────────────────╮
│ key          │ priority │ tokens │ tags      │ updated             │
├──────────────┼──────────┼────────┼───────────┼─────────────────────┤
│ deploy-rules │ critical │ 4      │ policy    │ 2026-04-23 16:54:04 │
├──────────────┼──────────┼────────┼───────────┼─────────────────────┤
│ payments     │ normal   │ 8      │ payments  │ 2026-04-23 16:54:04 │
├──────────────┼──────────┼────────┼───────────┼─────────────────────┤
│ api-base     │ critical │ 10     │ api, urls │ 2026-04-23 16:54:04 │
╰──────────────┴──────────┴────────┴───────────┴─────────────────────╯
```

### Skills
```bash
tokensmith skill add debug-laravel --file ./skills/debug-laravel.md -d "Laravel debugging"
tokensmith skill run debug-laravel version=11 php=8.3 env=prod symptom="500 on /orders"
tokensmith skill list
tokensmith skill delete debug-laravel
```

### Smart context
```bash
tokensmith context "payment integration"
# ✔ Context bundle
# 🧠 Relevant context loaded (3 items, 820 tokens)
# ⚡ 12.4k tokens avoided (94%)
```

Pipe into anything:

```bash
tokensmith context "payment" --json | jq .renderedText
```

### Compression
```bash
tokensmith compress session s-2026-04-23-01
tokensmith compress project
```

Before: 15,000 tokens · After: 800 tokens · ✅ summary stored in the DB and surfaced automatically by the context engine.
### Token analytics

```bash
tokensmith tokens stats
tokensmith tokens report        # last 14 days
tokensmith tokens recent --limit 50
```

```
$ tokensmith tokens stats
Token stats — my-project
────────────────────────
Overall: raw 44 → effective 22 (50.0% reduction)
ℹ 4 events · 0 sessions · model claude-sonnet-4-6
╭───────────────┬─────┬───────────┬───────┬───────────┬────────╮
│ kind          │ raw │ effective │ saved │ reduction │ events │
├───────────────┼─────┼───────────┼───────┼───────────┼────────┤
│ context_build │ 22  │ 22        │ 0     │ 0.0%      │ 1      │
├───────────────┼─────┼───────────┼───────┼───────────┼────────┤
│ memory_save   │ 22  │ 0         │ 22    │ 100.0%    │ 3      │
╰───────────────┴─────┴───────────┴───────┴───────────┴────────╯
```

### Export / import

```bash
tokensmith memory export ./backup.json
tokensmith memory import ./backup.json --overwrite
```

## Commands
```text
tokensmith init                          # scaffold config + DB

tokensmith memory save <key> [content]   # save / update memory
tokensmith memory get <key>
tokensmith memory list [-q keyword]
tokensmith memory delete <key>
tokensmith memory clean [--all]
tokensmith memory export <path>
tokensmith memory import <path> [--overwrite]

tokensmith skill add <name> --file | --template | --stdin
tokensmith skill run <name> key=value ...
tokensmith skill list
tokensmith skill delete <name>

tokensmith context <query...> [--budget N] [--no-memories] [--no-skills] [--no-summaries] [--json]

tokensmith compress session <sessionId> [--ratio 0.2] [--keep 4]
tokensmith compress project [--ratio 0.2]

tokensmith tokens stats [--json]
tokensmith tokens report
tokensmith tokens recent [--limit N]

tokensmith session append <sessionId> [-r role] [-m text]
tokensmith session list

tokensmith plugin hook <event>           # invoked by Claude Code hooks
tokensmith plugin status

tokensmith completion bash | zsh | fish
```

## How token savings work
TokenSmith tracks two quantities per event:
- raw tokens — what you would have paid for if Claude had to process the full underlying information (e.g. the original conversation, a full memory set, or a whole skill template).
- effective tokens — what actually gets sent to the model (the ranked + trimmed context bundle, or the compressed summary).
`saved = raw − effective`, `reductionPct = saved / raw`.
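The arithmetic can be sketched in a few lines of TypeScript (an illustration of the formula above, not TokenSmith's actual source; the `UsageEvent` shape is assumed):

```typescript
// Illustrative sketch of the savings arithmetic — not TokenSmith's real code.
interface UsageEvent {
  raw: number;       // tokens the full underlying information would have cost
  effective: number; // tokens actually sent to the model
}

function savings(e: UsageEvent): { saved: number; reductionPct: number } {
  const saved = e.raw - e.effective;
  // reductionPct is a fraction (0.5 renders as "50.0%"); guard raw = 0.
  const reductionPct = e.raw > 0 ? saved / e.raw : 0;
  return { saved, reductionPct };
}

// Matches the "raw 44 → effective 22 (50.0% reduction)" stats example.
const overall = savings({ raw: 44, effective: 22 });
```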
Every memory save, memory load, skill run, context build, compress, and injection event is recorded in the `usage_events` table so you can audit exactly where the savings came from.

Token counting uses `gpt-tokenizer` (pure JS, no native deps). Counts are approximate for Anthropic models (±5% for typical English/code) but stable across runs, which is what matters for budgeting and trend analytics.
## Configuration

Create a `token-smith.config.json` at the project root (or copy the one in `token-smith.config.example.json`):

```json
{
  "dbPath": ".tokensmith/tokensmith.db",
  "namespace": "auto",
  "tokenModel": "o200k_base",
  "preferredModel": "claude-sonnet-4-6",
  "compression": { "threshold": 8000, "targetRatio": 0.2, "keepLastMessages": 4 },
  "context": { "maxTokens": 4000, "keywordWeight": 0.6, "recencyWeight": 0.25, "priorityWeight": 0.15 },
  "analytics": { "warningTokens": 80000 },
  "autoSaveMemories": false,
  "logLevel": "info"
}
```

All keys can also be overridden with env vars — see `.env.example`.
### Per-project namespaces

By default the namespace is auto-detected from the git repo name (`.git` directory). Every memory, skill, summary, and usage event is scoped to that namespace, so multiple projects can share a single database safely.
## Development

```bash
git clone https://github.com/sharkcreep87/tokensmith.git
cd tokensmith
npm install

npm run dev -- memory list   # run the CLI through tsx
npm run build                # emit dist/
npm test                     # run Vitest
npm run test:coverage        # + v8 coverage report
```

### Releasing

Releases are cut automatically by GitHub Actions on any `v*` tag:

```bash
npm version patch --message "Release v%s"   # bumps + commits + tags
git push --follow-tags                      # pushes tag → workflow runs → npm publish
```

The `release.yml` workflow uses npm's trusted publishing (OIDC), so no `NPM_TOKEN` secret is required and every tarball gets a signed provenance statement on npmjs.com.
## Project layout

```text
src/
  cli/            # CLI entrypoint + TTY helpers
  commands/       # commander subcommands (memory, skill, context, …)
  services/       # domain logic (pure, injectable)
  repositories/   # SQLite persistence (one per aggregate)
  db/             # schema + connection bootstrap
  plugin/         # Claude Code hook handlers
  utils/          # logger, errors, tokens, text, git, …
  types/          # shared domain types + Zod schemas
  config/         # cosmiconfig + env var resolution
tests/            # vitest specs
.claude-plugin/   # plugin manifest + marketplace manifest
commands/         # /slash command docs for Claude Code
hooks/            # hook wiring for Claude Code
skills/           # ready-to-use skill templates
```

## Architecture highlights

- Repository + service + command — each layer only knows about the one below it, so swapping storage or rendering stays local.
- Dependency injection via a tiny composition root in `src/container.ts` (no decorators, no reflect-metadata).
- Zod at every boundary — CLI args, config files, and plugin payloads are all validated before they reach the service layer.
- SQL via prepared statements — prevents injection by construction.
- Deterministic extractive summariser by default; a pluggable LLM-backed summariser can be supplied via `CompressionService`'s constructor.
## Performance guarantees
TokenSmith runs inside the critical path of every Claude Code turn, so its hooks are engineered to be effectively invisible to the user:
| Hook | Internal budget | Outer Claude Code timeout | Fail-open? |
|---|---|---|---|
| SessionStart | 150 ms | 2000 ms | ✅ empty response |
| UserPromptSubmit | 250 ms | 3000 ms | ✅ empty response, no injection |
| PreToolUse | 100 ms | 1500 ms | ✅ empty response |
| PostToolUse | 100 ms | 1500 ms | ✅ empty response |
| SessionEnd | 1000 ms | 5000 ms | ✅ empty response |
What this means in practice:
- Hard deadlines. Every hook is wrapped in `withTimeout()`. If the work isn't finished within the internal budget, TokenSmith returns a valid empty response with `perf.timedOut: true` and Claude Code proceeds normally. No turn is ever blocked waiting for TokenSmith.
- Never throws. Every hook path catches exceptions and returns a structured `{ ok: false }` JSON response. Claude Code never sees a non-zero exit code or stack trace from us.
- Lazy container. The SQLite DB, tokenizer, and config are opened lazily when the first command that needs them runs; irrelevant hooks stay close to zero-cost.
- Tuned SQLite. WAL journalling, `NORMAL` synchronous mode, a 20 MB page cache, a 64 MB mmap window, and a `MEMORY` temp store keep every query under a few milliseconds for the sub-megabyte stores TokenSmith produces.
- Global kill switch. Set `TOKENSMITH_DISABLED=1` (env var) or `performance.disabled: true` (config) and every hook returns in under 1 ms with zero side effects. Nothing else changes.
- Telemetry on every response. Each hook attaches `perf: { elapsedMs, timedOut }` so the operator can alert on regressions.
Measured on a 2026 laptop with ~20 memories in the store:
```text
run=1 wall=211ms internal=7.3ms injections=1 contextTokens=417
run=2 wall=228ms internal=8.2ms injections=1 contextTokens=417
run=3 wall=198ms internal=8.8ms injections=1 contextTokens=417
```

Most of the wall time is Node startup (shared with every other plugin hook); TokenSmith's own work is single-digit milliseconds per turn.
## Anti-hallucination guarantees
Injecting retrieved context into a prompt is a two-sided trade: it gives the model authoritative-looking text, which means bad retrievals can cause hallucinations. TokenSmith mitigates that risk with several defence layers, all on by default:
- Confidence floor. Non-critical memories need at least one keyword overlap with the user's query AND a blended score ≥ `minInjectionScore` (0.12 by default) to be eligible. Weak matches are dropped, not "mostly shown" — there is no partial credit.
- Strict grounding mode. When the bundle is empty, the plugin injects nothing. Silence is always preferred to irrelevant context.
- Grounding preamble. Every non-empty bundle starts with an explicit system message that tells Claude:
  - to cite items by id rather than paraphrase them,
  - to prefer live conversation over stored memory on conflicts,
  - that lower-priority items may be stale,
  - that summaries are lossy and must be verified before action.
- Provenance on every item. Each memory/skill/summary carries its id, priority, score, and last-updated timestamp — so Claude can cite (`memory:abc123`) instead of inventing.
- Verbatim critical memories. Items with `priority: critical` are rendered inside fenced code blocks. They are never truncated, reformatted, or summarised by TokenSmith, and the grounding header instructs Claude not to paraphrase them either.
- Labelled summaries. Every compressed summary is prefixed with `[COMPRESSED SUMMARY — lossy extract of prior conversation. Verify before acting.]` so the model treats it as auxiliary, not authoritative.
- Fidelity guard. If compression would produce a "summary" larger than the original, we refuse to commit it — preventing paraphrased bloat from masquerading as source material.
- No LLM summariser by default. The default summariser is a deterministic extractive ranker over the original sentences; it cannot invent content, only drop it. An LLM-backed summariser can be swapped in via `new CompressionService(repos, tokens, config, customFn)` if you explicitly want that trade-off.
- Per-namespace isolation. Memories are scoped to the current git repo by default, so a memory from one project can never leak into another.
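The confidence-floor behaviour can be pictured with a short sketch. The weight and threshold names come from the `context` config block; the exact scoring function here is an assumption for illustration, not TokenSmith's actual code:

```typescript
// Illustrative blended-score sketch — the formula is an assumption, only the
// weights (keywordWeight 0.6, recencyWeight 0.25, priorityWeight 0.15) and
// the minInjectionScore floor (0.12) come from the documented config.
interface Candidate {
  keywordScore: number;  // 0..1 — keyword overlap with the query
  recencyScore: number;  // 0..1 — newer items score higher
  priorityScore: number; // 0..1 — e.g. critical = 1, normal = 0.5
}

const WEIGHTS = { keyword: 0.6, recency: 0.25, priority: 0.15 };
const MIN_INJECTION_SCORE = 0.12; // config: context.minInjectionScore

function blendedScore(c: Candidate): number {
  return (
    WEIGHTS.keyword * c.keywordScore +
    WEIGHTS.recency * c.recencyScore +
    WEIGHTS.priority * c.priorityScore
  );
}

// Eligible only with at least one keyword overlap AND score ≥ the floor.
function eligible(c: Candidate): boolean {
  return c.keywordScore > 0 && blendedScore(c) >= MIN_INJECTION_SCORE;
}
```

Note that a candidate with zero keyword overlap is dropped outright, however recent or high-priority it is — that is the "no partial credit" rule above.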
You can tune or disable any of these in `token-smith.config.json`:

```jsonc
{
  "context": { "minInjectionScore": 0.12 },
  "grounding": {
    "mode": "strict", // "strict" | "normal" | "off"
    "verbatimCritical": true,
    "includeHeader": true,
    "citeSources": true
  }
}
```

Setting `grounding.mode: "off"` disables all context injection entirely — useful when debugging or when you want Claude to rely solely on live conversation.
## Security

- All SQL is parameterised (`better-sqlite3` prepared statements).
- User-supplied identifiers are restricted to `[A-Za-z0-9._:-]` by Zod.
- File paths supplied at the CLI are resolved with `safeResolveWithin` to block path traversal.
- No `eval`, `Function`, or other dynamic code execution is used anywhere in the package.
- Skill templates are purely substitutional — `{{var}}` placeholders only, no expressions.
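To make "purely substitutional" concrete, here is a minimal sketch of `{{var}}`-only rendering (an illustration, not TokenSmith's actual renderer; the word-character placeholder pattern is an assumption). Unknown placeholders are left intact and nothing is ever evaluated:

```typescript
// Minimal sketch of {{var}}-only substitution — illustration only, not
// TokenSmith's real renderer. There is no expression syntax to evaluate,
// so user input can never trigger code execution.
function renderSkill(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    // Substitute known variables; leave unknown placeholders untouched.
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match
  );
}
```

For example, `renderSkill("Debug Laravel {{version}} on PHP {{php}}", { version: "11", php: "8.3" })` yields `"Debug Laravel 11 on PHP 8.3"`.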
## License
MIT © TokenSmith Contributors
