myshell-tools
v3.103.0
Published
**Hierarchical, multi-provider AI orchestration for your shell — over the CLIs you already use.**
Maintainers
Readme
myshell-tools
Hierarchical, multi-provider AI orchestration for your shell — over the CLIs you already use.
myshell-tools routes each task to the cheapest model likely to succeed, runs it on your real codebase, optionally has a different vendor review the result, and shows you exactly what it did and how many real tokens it used — with no fabricated data, ever.
Status:
3.2.0— honest, tested, and real. Claude, Codex, and opencode (experimental) all work, provider auth is detected for real, and the header always shows whether you're on the latest version ((latest)or→ x.y.z available).
Quickstart — zero install
npx myshell-toolsThat's it. npx fetches the package, runs the interactive setup, and:
- Detects which provider CLIs (Claude Code / Codex) are already installed.
- Offers to install any missing ones — one keypress (
Enterto install,nto skip). It never installs anything silently; consent is always required. - Offers to sign you in to providers that are installed but not yet authenticated.
- Drops you into the control-panel menu, ready to use.
On subsequent runs npx myshell-tools goes straight to the menu (setup only runs once).
Install once (optional)
If you prefer a permanent global install:
npm install -g myshell-tools
myshell-toolsWhy it exists
Using one frontier model for everything is wasteful (renaming a variable doesn't need Opus) and single‑model output has blind spots. myshell-tools addresses both, honestly:
- Cost‑aware routing — trivial work goes to the cheap tier (Haiku / GPT‑5 mini), real implementation to the mid tier, hard calls to the flagship. The efficiency shows up as a billing‑agnostic ratio (how many flagship tokens you avoided), not a dollar figure — because you're on a subscription, not metered API billing.
- One quality knob —
Efficient · Balanced · Max— sets how readily routing reaches the strongest (Opus / GPT‑5.5‑class) model. Because you're on a flat‑rate subscription, the scarce resource is quota/rate‑limit headroom, not dollars — so flagship access is an adaptive per‑turn decision, not a fixed price ceiling.Efficientnever auto‑opens the flagship;Balancedearns a single flagship pass on a turn that proves it needs one (high/critical risk, low confidence, or a reviewer escalation) — and holds back on an observedfreeplan to preserve quota;Maxopens it whenever a turn asks. The default auto-detects from your subscription (a Max plan auto-selectsMax; smaller plans default toBalanced) — no configuration, no interrogation. Change it on the home screen (m), in-chat (/mode), or in Settings; it's one global knob, changeable anywhere.Efficient— stays on the lighter models, escalating among them only when a turn needs it; reviews cross-vendor on critical-risk tasks only. Won't auto-open the flagship.Balanced— earns one pass at the strongest model on a turn that proves it needs it (high-risk / low-confidence / reviewer escalation); otherwise stays mid-tier. Vetoed on an observed free plan.Max— opens and reaches for the strongest model and reviews high/critical work; best answers, slower, never capped.
- Keeps going until it's done — ask for something big and the chat just does it; if a job is too large for a single turn, it offers to keep working autonomously, step by step, until it's done — you say "yes", no command to learn (or start it explicitly with
/goal <text>). Bounded by a turn ceiling and Esc, and it survives per-turn timeouts so a multi-file build actually completes across turns. - A real advisor, not an order-taker — for a decision (language, library, design) it forms an opinion and recommends a clear winner, surfaces a strong option you may not have considered, and asks only the one or two questions that actually change the answer.
- Cross‑vendor adversarial review — a different vendor checks the first model's output (Codex reviewing Claude, or vice‑versa). Different families, different blind spots. Review gating depends on mode; see the quality knob above.
- Multi-turn context continuity — follow-up messages carry real context. Prior conversation turns are compacted into a bounded history block (~6 k chars, most recent 12 turns) and replayed to the model, so it actually knows what was said earlier. Confidence envelopes are stripped before replay to save tokens.
- Native session continuity (experimental, opt-in, default off) — when enabled, a conversation that stays on one provider reuses that provider's native session instead of replaying the history block, for better fidelity and less re-sent context. Claude uses a session id we choose (the conversation id); Codex has its generated thread id captured (from
thread.started), persisted on the turn, and resumed viacodex exec resume. Falls back to history replay automatically when a turn routes to a provider without an active session. Because the live behavior depends on your CLI, it ships off with a gated verification test covering both providers —MYSHELL_NATIVE_SESSION_E2E=1 npm run test:integration— and you enable it (Settings → Native sessions) only after it passes on your setup. - Parallel Subscription Panel (experimental, opt-in, default off) — on hard (high/critical-risk) turns, instead of running one model then maybe another, it runs a concurrent panel of your signed-in providers — each answers independently — then a cross-vendor synthesizer reconciles their answers into one (the synthesizer is admitted to the strongest model on those hard turns). This is uniquely a subscription-first move: on a flat-rate plan the extra concurrent runs cost $0 in dollars (the cost is quota + latency), so you buy genuinely independent cross-vendor judgment — which catches single-model confident-but-wrong answers — something an api-key / per-token tool would never afford. EXPERIMENTAL, default OFF: it needs ≥2 signed-in providers to do anything, spends 2–4 provider runs per hard turn (quota + wall-clock), and only triggers on turns the classifier rates high/critical risk. Enable via Settings →
[7] Panel (experimental). - Learned Routing (Local Outcome Learner) (experimental, opt-in, default off) — learns from your own ledger which provider tends to finish your work best per tier (ranked by observed success rate, tie-broken by latency) and prefers it. It's observed-only — it uses only recorded success + duration, never plan/quota/token inference. With no per-token dollar cost the honest routing question isn't "cheapest token" but "which of my subscriptions actually does my work best, fastest" — and only your own history answers that. EXPERIMENTAL, default OFF: it needs real history before it changes anything (≥3 runs per provider and ≥2 qualifying providers per tier, else it does nothing), and it only reorders among reachable signed-in providers — never stranding or inventing one. Enable via Settings →
[8] Learned routing (experimental). - Container / SSH sign-in —
myshell-tools loginauto-detects headless and cloud-IDE environments (Replit, Codespaces, Gitpod, SSH sessions) and switches to a no-localhost sign-in flow automatically. Force either flow with--codeor--browser. For Claude it runsclaude auth login(claude saves its own credential — nothing for us to store); if the browser shows a localhost / "can't be reached" error after you click Authorize, copy the full URL from the address bar (it contains the sign-in code) and paste it back when claude asks. For Codex, a device-code flow is used. - opencode provider (experimental) — auto-detected; works instantly with free hosted models (no keys). As a subscription/free provider it uses whatever model you've configured in opencode (a free opencode-zen model, or a premium one you've added — e.g. Kimi K2); myshell-tools doesn't pin a model for it, so "just use whatever opencode has" works out of the box.
[o]is always visible in the Auth section: it installsopencode-ai(with consent) if missing, then runsopencode auth login. Route to it explicitly or let the policy fall back to it automatically. - Routing prefers advertised models — detection passes each provider's actual model list to
route(), which picks the cheapest model the CLI actually has, not just the cheapest in the pricing table. Falls back gracefully if the advertised list doesn't match any pricing entry. - Subscription, not metering — it drives the Claude Code, Codex, and opencode CLIs you already use. No API keys, no per‑token bill for the free path.
- Honest by construction — every number on screen traces to a real measurement. A suite of architecture tests makes fabricated/mock output literally unmergeable.
Requirements
- Node.js ≥ 20 for the compiled CLI (
dist/). Node ≥ 22 is required to run the test suite (see Development below). - At least one provider CLI.
npx myshell-toolswill offer to install them for you on first run — or you can install manually:- Claude Code —
npm install -g @anthropic-ai/claude-code, then sign in when prompted. In containers or over SSH, runmyshell-tools login claude --code: it runsclaude auth loginand shows a sign-in link; if the page errors on a localhost redirect after you authorize, copy the full address-bar URL (it carries the code) and paste it back when claude asks. Claude persists its own credential — myshell-tools stores nothing. - Codex —
npm install -g @openai/codex. In containers or over SSH, runmyshell-tools login codex --codefor a device-code flow (no localhost callback needed). - opencode (experimental, optional) — auto-detected when the
opencodeCLI is installed. Works immediately with free hosted models (no keys).[o]is always available in the control-panel Auth section: if opencode is not yet installed, selecting it asks for consent and runsnpm install -g opencode-ai, then runsopencode auth loginto add a premium provider or subscription — myshell-tools never handles the credentials. Appears in the control-panel header and raw-session picker only when installed.
- Claude Code —
You need one to start; install both claude and codex to unlock cross‑vendor review.
Install options
| Method | Command | Notes |
|--------|---------|-------|
| Zero-install (one-time) | npx myshell-tools | Fetches and runs; first-run setup included |
| Global install (recommended for regular use) | npm install -g myshell-tools then myshell-tools | Fastest; gets the update notifier |
| From source | See below | For development |
npx vs. npm install -g — which to choose?
npx myshell-tools is convenient for a one-off run but caches the downloaded version — subsequent invocations reuse the cache and will not pick up new releases automatically. The tool detects when it's running under npx and, if a newer version exists, tells you exactly that instead of pretending to self-update (a global install run from an npx process is ignored by the next npx invocation). If you're stuck on an old version via npx, clear the cache (rm -rf ~/.npm/_npx) or — better — install globally:
For day-to-day use, a global install is recommended:
npm install -g myshell-tools
myshell-toolsThe globally-installed CLI includes the update notifier: it checks the npm registry once per 24 hours (cached, non-blocking) and shows a banner in the control panel when a newer version is available:
▲ Update available: 2.9.0 → 3.0.0 (press u)Press u to install the update in-place (npm install -g myshell-tools@latest). No relaunch is forced — restart the CLI when you're ready.
You can also enable auto-update so the CLI updates and relaunches itself silently at startup. Auto-update is on by default — the first-run prompt asks Keep myshell-tools up to date automatically? (Y/n) and defaults to yes.
To opt out:
- During first-run setup: answer
nto theKeep myshell-tools up to date automatically? (Y/n)prompt. - In the control panel:
[s] Settings → [3] Auto-update: on → off. - Or set
"autoUpdate": falsein~/.myshell-tools/config.json. - Or set
MYSHELL_NO_UPDATE=1in your environment to disable auto-update permanently without changing config.
To update manually at any time:
npm install -g myshell-tools@latestFrom source
git clone <this-repo>
cd myshell-tools
npm install
npm run build
node dist/cli.js --help
# optional: make `myshell-tools` available globally
npm linkUsage
myshell-tools [command] [options]
Commands:
(none) Open the interactive control panel (default)
run <task...> Run a one-shot task and exit
repl Plain line REPL (no menu)
login [prov] Sign in to a provider (claude or codex) via its own OAuth
cost Show real spend + the cost-routing counterfactual
Options:
-h, --help Show help
-v, --version Print versionA real run
These are actual, unedited outputs (your costs/timings will differ):
$ myshell-tools run "what is 2 plus 2"
▶ WORKER (claude/claude-haiku-4-5) attempt 1
2 plus 2 equals 4.
✓ tier done — confidence: 100%, 312 tokens, duration: 5648ms
Success — tier: worker, 312 tokens, attempts: 1, session: 0dbfe2e3-…The confidence (100%) is parsed from the model's own structured reply, not invented. The token count is the CLI's own reported usage — real and measured. Because myshell-tools drives your subscription CLIs (not metered API keys), the hot path shows tokens, not dollars; a per-task dollar figure wouldn't map to flat subscription billing. If you want a rough API-equivalent cost estimate, it lives in myshell-tools cost.
Health — automatic, no command needed
The control panel checks its own environment on every launch (Node version, whether the state directory is writable, pricing freshness) and shows a short, actionable warning only when something is actually wrong. When everything is fine, it stays quiet — you never run a "doctor" command.
If you want the full report explicitly (handy for support threads or CI), it's still there as a hidden, scriptable command — status, check, or doctor:
$ myshell-tools status
myshell-tools — environment health
Platform: linux
Node: v22.19.0
...
Providers
✓ claude — installed, version: 2.1.157 (Claude Code)
auth: signed in (pro)
✓ codex — installed, version: codex-cli 0.135.0
auth: signed in
Ready — at least one provider is available.Usage & efficiency (cost)
This is a subscription tool, so the everyday UI shows real, measured tokens — never per-task dollars, which wouldn't map to flat subscription billing. The on-demand cost view leads with tokens and a billing-agnostic routing-efficiency ratio, then offers a clearly-captioned API-equivalent dollar estimate for anyone who wants the magnitude:
$ myshell-tools cost
myshell-tools — usage & efficiency
Tasks run: 3
Tokens used: 12.4k (real, measured)
Per-model usage
claude-haiku-4-5: 2 tasks, 4.1k tokens
claude-sonnet-4-6: 1 task, 8.3k tokens
Routing efficiency
Routing picked cheaper-tier models where it could — ~6.3× less than sending every task to the flagship (claude-opus-4-7).
Estimated cost — API-equivalent (list price), not your subscription bill
Routed: ~$0.0020 · always-flagship: ~$0.0126The efficiency ratio is honest under a subscription (it compares flagship tokens you avoided, not dollars you were charged). The dollar figures are explicitly labeled an API-equivalent estimate — not your actual bill — and both use the same basis (list price × tokens), so "routed vs always-flagship" is apples-to-apples and internally consistent.
How it works
classify ─▶ route(cheapest tier) ─▶ run ─▶ assess
│
high-risk IC work ────────────┘──▶ cross-vendor review (other vendor)
approve → accept
revise → retry with feedback
escalate→ manager tier
low confidence / failure ─────────▶ escalate to a higher tier- Tiers map to stable model aliases (
haiku/sonnet/opus, or the Codex tiers), so when a vendor ships a newer model the alias resolves to it automatically — no myshell-tools update needed. - Cost prefers the provider CLI's own reported figure (Claude does this); otherwise it estimates from real token counts and a dated, staleness‑warned price seed.
- Every run is recorded to an append‑only session log and cost ledger under
.myshell-tools/.
The honesty contract
This is a ground‑up rebuild whose first principle is: the tool never shows fabricated, mocked, or randomized data as if it were real. It's enforced, not promised:
- Architecture guard tests fail the build if the UI/command layers contain hardcoded "AI responses", fake metrics, or a digit‑then‑
%literal; if the orchestration core touches the filesystem, clock, or RNG directly; or if any module other than the entry point can terminate the process. - An extensive unit + architecture-guard suite (plus contract tests with parsers pinned to recorded real transcripts), with
tsc --strict, ESLint, and a cleannpm packchecked in CI across Windows / macOS / Linux.
Architecture
Hexagonal / ports‑and‑adapters:
src/core/— pure orchestration (classify, route, assess, review, escalate). No I/O; everything injected. 100% testable with fakes.src/providers/— theProviderport + Claude/Codex adapters (viaexeca, prompt over stdin, cancelable, streaming).src/infra/— atomic session/ledger persistence, clock, pricing seed.src/interface/+src/ui/— REPL, one‑shot runner, streaming renderer, theme.src/commands/—doctor,cost.
One runtime dependency (execa, for correct cross‑platform process handling — including Windows process‑tree cancellation). Everything else is the Node standard library.
Development
Node ≥ 22 is required to run the test suite. npm test uses
node --experimental-strip-types (native TypeScript stripping, available from
Node 22+). The compiled runtime (dist/) supports Node ≥ 20, so
package.json engines is left at >=20.
npm run typecheck # tsc --strict, 0 errors
npm run lint # ESLint (typescript-eslint strict)
npm test # unit + architecture tests (requires Node ≥ 22)
npm run test:contract # parser contract tests vs recorded transcripts
npm run build # tsc → dist/License
MIT — see LICENSE.
