Bandit — Agent CLI
Local-first AI coding agent for your terminal.
Your code never leaves your machine. Works with any Ollama model.
Prefer an IDE? The sibling Bandit Stealth extension for VS Code / Cursor ships the same runtime, skills, and tool-use loop — install from the VS Code Marketplace or Open VSX.
Install
Install Ollama and pull a model:
```bash
brew install ollama            # or download the installer
ollama pull qwen2.5-coder:7b   # fast, tool-calling, ~4.7 GB
```

Install the CLI globally:

```bash
npm i -g @burtson-labs/bandit-stealth-cli
```

Run it:

```bash
bandit                                # interactive REPL
bandit "explain @src/auth/login.ts"   # one-shot with a file mention
```
That's it. No API keys. No cloud services. The agent reads your code, searches, runs commands, and writes changes — all locally.
What it does
- Agentic tool use — reads files, searches code, runs commands, writes changes
- Unified-diff approval gate — every `write_file`/`apply_edit` shows a colored diff before touching disk
- Pre-write validation — TypeScript, Python, JSON, C# syntax-checked before the agent can write
- Post-write validation — JSON edits are re-parsed after write; failure feeds back to the agent on the next turn so it self-corrects without you flagging it
- Skills system — agent activates specialized skills based on your prompt, and can create its own
- Background subagents — long investigations spawn detached; the status bar shows `bg:N running`; you keep talking; a synopsis auto-injects when ready (`/tasks` to inspect, drill down, or cancel)
- `watch_command` — run a dev server / `--watch` test runner for a bounded window; the agent reacts to what came out
- `find_directory` — cross-repo discovery; ask "open the auth-api repo" and the agent sweeps `~/Documents/GitHub`, `~/GitHub`, `~/Projects`, `~/code`, `~/dev`, `~/repos`, `~/work`, `~/src`, plus the workspace parent — no "where is that repo?" round-trips
- MCP both directions — speaks the Model Context Protocol as a client (`/mcp add github <token>`, `/mcp add slack`, `/mcp add gitlab`, `/mcp add custom <name> <cmd…>`) and as a server (`bandit mcp serve` exposes Bandit's native tool surface over stdio so Claude Desktop / Cursor / Cline / Continue can drive your codebase through it)
- Installs CLIs on demand — ask Bandit to install `ripgrep`, `httpie`, the GitHub CLI, etc. and it picks the right package manager (`brew`, `npm install -g`, `pip install`, `cargo install`, `gem install`, `go install`) and runs it through the permission gate
- Interactive scaffolders work — `create-vite`, `create-react-app`, `ng new`, etc. detect a non-TTY stdin and self-abort. Bandit recognizes the pattern and surfaces a clear "run this with `!`" recovery hint so the model doesn't loop on a "command appeared to succeed" misread
- Live command output — `npm install`, `pip install`, `watch_command npm run dev` stream their output to your terminal as it arrives, dimmed, while the spinner keeps animating. No more wondering if a 20-second install is hung
- Interrupt + queue — press Esc mid-turn to cancel the agent and clear your queue. Type a follow-up + Enter to queue it (`queued: N · sends after this turn` in the status row). The next turn picks it up automatically
- `?` shortcuts overlay — type `?` at an empty prompt for a live cheatsheet that disappears the moment you backspace it
- `!`-prefix shell escape — `!cmd` runs straight in your shell with full TTY access. First-use confirmation gate; a per-call yellow box every time after so you can't miss the bypass. Catastrophic patterns (`rm -rf`, `mkfs`, `dd if=`) blocked even here
- Plan execution — structured multi-step plans for complex refactors
- Session persistence — every REPL session saved as JSONL under `~/.bandit/sessions/` for later resume
- `/insights` HTML report — local-only activity report: tool stats, top-touched files, languages, longest streak, peak day, error patterns, optional AI summary, mailto share
- Project memory — drop a `BANDIT.md` or `CLAUDE.md` at your workspace root and it's auto-loaded into the system prompt
- File + image mentions — `@path` auto-inlines files; images are either sent multimodally or OCR'd locally (Apple Vision / tesseract)
- Clipboard paste — `Ctrl+V` in the REPL pastes an image straight from your clipboard
- Hooks — `PreToolUse`/`PostToolUse`/`Stop` shell hooks via `.bandit/settings.json` (see the sketch after this list)
- 12 themes — Stealth Light/Dark, Midnight, Onyx, Charcoal, Dracula, Nord, Tokyo Night, Solarized Dark/Light, Catppuccin Mocha, Sepia. `/theme` to pick
- Cross-platform — macOS, Linux, Windows; Windows `.cmd`/`.bat` shims (npm/npx/pnpm/tsc) resolved correctly
- Update-aware — fire-and-forget npm-registry check at boot; `update vX.Y.Z available` shows in the status bar when a newer CLI is published
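This README doesn't spell out the hooks file schema, so here's a minimal sketch of a `.bandit/settings.json`: the event names (`PreToolUse`, `PostToolUse`, `Stop`) come from the feature list above, but the surrounding shape is an assumption modeled on similar agent CLIs, so treat it as illustrative only.

```jsonc
{
  // Hypothetical schema — only the hook event names are confirmed by this README.
  "hooks": {
    "PreToolUse":  [{ "command": "echo 'tool call incoming' >> .bandit/hooks.log" }],
    "PostToolUse": [{ "command": "npx prettier --write . 2>/dev/null || true" }],
    "Stop":        [{ "command": "git status --short" }]
  }
}
```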
Slash commands
Type `?` at an empty prompt for the at-a-glance overlay; `/help` for the full list.
| Command | Does |
|---|---|
| /help | Full slash-command list |
| /login <key> | Save a Bandit Cloud API key to ~/.bandit/config.json (also /login, /login clear) |
| /usage | Bandit Cloud session + weekly usage limits (/usage check for one-line ⚠ flag) |
| /model [name] | Switch model mid-session |
| /ollama [url] | Show or set the Ollama endpoint — /ollama default resets to http://localhost:11434 |
| /think on, /think off, /think auto | Override per-model thinking-mode default |
| /theme [name] | Pick a color palette (/theme lists; saved to global config) |
| /skills | List loaded skills |
| /session list, /session resume <id>, /session new | Manage sessions |
| /memory | Show auto-loaded BANDIT.md / CLAUDE.md |
| /config | Show effective config + path (secrets redacted) |
| /clear | Reset conversation (keeps session id) |
| /compact | Trim old tool results to fit the context window |
| /rewind [id] | Restore a file from a per-edit checkpoint |
| /tasks | List background subagent tasks (/tasks <id> drill-down, /tasks cancel <id>) |
| /plan <goal> | Heuristic plan first, y/N to execute |
| /init | Scaffold BANDIT.md from a repo scan |
| /commit | Draft a conventional-commit message from the staged diff |
| /review [focus] | Code review of staged changes or branch-vs-main, ends with 🟢/🟡/🔴 |
| /refactor <target> | Concrete refactor suggestions with before/after snippets |
| /test <target> | Generate tests in the project's existing framework |
| /explain <target> | Plain-English walkthrough of a file or function |
| /onboard | New-developer setup checklist for the repo |
| /changelog [range] | Release notes drafted from git log |
| /exit | Quit |
Skills
The agent activates specialized skills based on your prompt:
| Skill | Trigger | What it does |
|---|---|---|
| Filesystem | always | Read, write, search, list, run commands |
| Git | always | Status, diff, log, commit |
| Code Review | "review my changes" | Diff + full file context |
| Testing | "write tests" | Auto-detect runner, generate tests |
| Planning | "refactor the auth system" | Structured multi-step decomposition |
| Semantic Search | "how is auth implemented" | Local embedding search |
Custom skills (the agent can make its own)
Ask: "create a skill that runs my linter"
The agent writes .bandit/skills/linter.md. Next prompt, it's live. Ask "lint my code" and it runs.
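The skill-file format isn't documented in this README, but as a rough illustration, a generated `.bandit/skills/linter.md` might look something like the following. The frontmatter field names are assumptions, not the actual schema; check a file the agent actually generates for the real shape.

```markdown
---
name: linter                 # assumed field names — illustrative only
trigger: "lint my code"
---

When the user asks to lint, run the project's linter via run_command
(`npm run lint` if a lint script exists, otherwise `npx eslint .`)
and summarize any errors grouped by file.
```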
MCP — Model Context Protocol servers
Bandit speaks MCP as a client, so any MCP server you can spawn (filesystem, git, GitHub, Google Drive, Gmail, Slack, Postgres, custom workplace tools…) plugs straight into the same tool-use loop. Each server's tools are namespaced as <server>.<tool> and registered alongside read_file, apply_edit, etc.
Configure at ~/.bandit/mcp-servers.json (global) or .bandit/mcp-servers.json (workspace, takes precedence). Schema is the standard MCP mcpServers shape — the same JSON other MCP clients use, so configs port between them:
```json
{
  "mcpServers": {
    "fs-tmp": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..." }
    }
  }
}
```

Manage with the `/mcp` slash command:
| Command | What it does |
|---|---|
| /mcp | List configured servers + status (connected / idle / error) and tool counts |
| /mcp tools <name> | Spawn the server (lazy) and introspect its exposed tools |
| /mcp connect <name> | Explicit warmup so the first invocation isn't slow |
| /mcp disconnect <name> | Close the server's child process (re-spawns lazily on next use) |
| /mcp reload | Re-read the config files from disk after edits — no restart needed |
Servers spawn lazily on first invocation, persist for the session, and get cleaned up on REPL exit. Failures are isolated — a broken server logs an error and the rest of the loop keeps running on native tools only. Off by default — no config file = zero behavior change.
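In the other direction, registering Bandit as a server in another MCP client uses the same standard `mcpServers` shape shown above. For example, an entry in Claude Desktop's `claude_desktop_config.json` might look like this (the `bandit mcp serve` command is from this README; the entry name and exact placement are up to you):

```json
{
  "mcpServers": {
    "bandit": {
      "command": "bandit",
      "args": ["mcp", "serve"]
    }
  }
}
```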
Recommended models
Pull one with ollama pull <model>. Bandit auto-detects each model's capabilities and takes the native tool-calling path when supported.
| Model | Where | Notes |
|---|---|---|
| bandit-logic (cloud) | Bandit gateway (API key) | Default for cloud. Agent-tuned wrapper around Qwen 3.6 27B with thinking mode. Best reliability on multi-step agent tasks — what we recommend trying first. |
| qwen3.6:27b | Local / Mac 48GB+, high-VRAM GPU (~17 GB) | Best local pick. Same family as bandit-logic, runs offline. Probes the filesystem instead of asking for clarification — real agent behavior. |
| gemma4:26b | Local / Mac 32GB+ (~17 GB) | Solid alternative when Qwen 3.6 is too heavy for your hardware. Multimodal, 128K context. |
| gemma4:e4b | Local / laptop-class (~3 GB) | Lightweight pick that punches above its weight. Validated on real Bandit runs — clean tool sequencing (ls → narrow → read_file), no hallucinated paths. Right pick when you want a local agent that doesn't pin your fans. |
| gemma4:31b | Local / Mac 64GB+, GPU node | Bigger context, better reasoning for complex refactors. |
| qwen2.5-coder:7b | Local / Mac (~4.7 GB) | Fast lightweight pick. Native tool calling. Best for "given context, do X" tasks rather than autonomous discovery. |
| devstral:latest | Local / Mac 32GB+ | Mistral's agent-tuned model — strong tool use. |
| bandit-core-1 (cloud) | Bandit gateway (API key) | Lightweight cloud option. Faster first-token than bandit-logic, less reliable on multi-step agent tasks. |
Models we don't recommend (for agent work)
Bandit is an autonomous agent harness — it expects the model to discover repo structure, plan edits, and emit tool calls without being hand-held. Some otherwise-impressive models aren't trained for that workflow and produce unexpected results:
- `gpt-oss:120b` and other reasoning-tuned models — post-trained for OpenAI's harmony tool-call format, not the XML/native protocols Bandit uses. Tends to narrate intent ("I'll search for the controllers...") without ever emitting an actual tool call.
- `qwen2.5-coder:32b` and other code-completion-tuned models — post-trained for fully-specified code-generation benchmarks. On ambiguous prompts it asks for paths instead of probing. Solid for concrete tasks; underwhelming as an autonomous agent.
- `qwen3.6:35b` — the larger Qwen 3.6 variant stalls in reasoning-only output and ignores the harness's "act now" nudges. The 27B is the better production pick from this family.
If you want to test models outside the recommended list, expect the reasoning-only / narrate-but-no-action / partial-completion detectors to fire frequently. Those are signal — they mean the model isn't a great fit for autonomous agent work.
Capability dispatch:
- Native tool calling — Qwen 3.6, Qwen 2.5 Coder, Llama 3.1+, Devstral, DeepSeek-Coder-V2+. Tool schemas go in Ollama's `tools:` field. Saves ~1500–3000 tokens per turn.
- Text-parsing fallback — Gemma 3/4 and anything else. An XML-style tool block lives in the system prompt with the full mitigation stack armed.
Any Ollama model works — capabilities auto-detect via /api/show.
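You can see roughly what Bandit sees by querying that endpoint yourself. A quick check with curl (the model name is just an example; recent Ollama builds include a `capabilities` array in the response, older ones may not):

```bash
curl -s http://localhost:11434/api/show \
  -d '{"model": "qwen2.5-coder:7b"}' | head -c 400
# Look for "capabilities": ["completion", "tools", ...] in the JSON —
# "tools" means the native tool-calling path is available.
```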
Configuration
Config file (preferred)
~/.bandit/config.json or <workspace>/.bandit/config.json:
```jsonc
{
  "provider": "ollama",                            // or "bandit"
  "model": "qwen2.5-coder:7b",
  "ollama": {
    "url": "http://localhost:11434",
    "headers": { "Authorization": "Bearer ..." }   // optional
  },
  "bandit": {
    "apiKey": "bnd_...",
    "apiUrl": "https://api.burtson.ai"
  }
}
```

Workspace config overrides user config. Secrets belong in the user-level file, not in a committed workspace file.
Environment variables
| Var | Default | Description |
|---|---|---|
| BANDIT_PROVIDER | ollama | ollama or bandit |
| BANDIT_MODEL | gemma4:e4b | Model ID |
| BANDIT_API_KEY | — | Required when BANDIT_PROVIDER=bandit |
| BANDIT_API_URL | https://api.burtson.ai | Override Bandit API endpoint |
| OLLAMA_URL | http://localhost:11434 | Ollama endpoint |
| BANDIT_MAX_ITERATIONS | 20 | Tool-use loop cap |
| BANDIT_AUTO_APPROVE | 0 | 1/true to skip write-approval prompts |
| NO_COLOR | — | Disable ANSI colors |
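These compose with one-shot mode. For example, a CI-style invocation that skips the approval gate, using only variables from the table above (the prompt text is just an illustration):

```bash
BANDIT_MODEL=qwen2.5-coder:7b \
BANDIT_AUTO_APPROVE=1 \
BANDIT_MAX_ITERATIONS=30 \
bandit "fix the failing unit tests and re-run them"
```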
Remote GPU
Running a bigger model on a remote Ollama instance? Point OLLAMA_URL at the remote endpoint and set BANDIT_MODEL to the bigger model. Requests route to the remote node; everything else stays local.
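Concretely, that's two exports before launching (the hostname is a placeholder for your own GPU box):

```bash
export OLLAMA_URL="http://gpu-box.local:11434"   # placeholder hostname
export BANDIT_MODEL="qwen3.6:27b"
bandit
```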
Rented GPU (RunPod / Vast.ai / Lambda)
When you need to run a model your local hardware can't fit, Bandit talks to any remote Ollama endpoint — including rented GPU pods. Same shape on every provider: spin up a pod with Ollama on port 11434, copy the proxy URL, point OLLAMA_URL at it.
RunPod (recommended — simplest UX):
```bash
# 1. From the RunPod template gallery, pick any Ollama template.
#    H100 SXM is the right pick for 27-32B models; multi-GPU is only
#    needed for 70B+. A network volume is optional but useful if you
#    want model weights to persist across pod restarts.
# 2. Once the pod boots, copy its proxy URL from the dashboard.
#    Format: https://<pod-id>-11434.proxy.runpod.net
# 3. SSH into the pod and pull a model:
ollama pull qwen3.6:27b
# 4. Locally, point Bandit at it:
export OLLAMA_URL="https://<pod-id>-11434.proxy.runpod.net"
export BANDIT_MODEL="qwen3.6:27b"
bandit
```

Tear the pod down when you're done. ~$2/hr for an H100 SXM × a 15-20 min agent session = under $1.
Vast.ai / Lambda Labs: same pattern. Find an Ollama-preloaded image (or install Ollama yourself), expose port 11434, and set OLLAMA_URL to the host URL; an SSH tunnel works too, as sketched below.
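If your provider doesn't hand you a public proxy URL, a minimal sketch using a standard SSH port-forward (assumes SSH access to the instance; user and IP are placeholders):

```bash
# Forward the instance's Ollama port to localhost over SSH
ssh -N -L 11434:localhost:11434 user@<instance-ip> &
export OLLAMA_URL="http://localhost:11434"
export BANDIT_MODEL="qwen3.6:27b"
bandit
```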
Recommended models for rented GPU:
| Model | Size | What it's good at |
|---|---|---|
| qwen3.6:27b | ~17 GB | Same model as bandit-logic. Native tool calling, vision, 256K context. Best general-purpose pick. |
| qwen2.5-coder:32b | ~20 GB | Code-specialist post-train. Strongest on file edits and refactors. |
| qwen3.6:35b | ~24 GB | Bigger Qwen 3.6 variant — slower, marginally better reasoning. |
Avoid for agent work: gpt-oss:120b and similar reasoning-tuned models. They're post-trained for OpenAI's harmony tool-call format, not the XML protocol Bandit uses for non-native models — they tend to narrate intent without emitting tool calls. Great for math/proofs in chat, poor for filesystem agent loops.
Security & privacy
- Local-first by default — with `provider=ollama`, nothing leaves your machine.
- Approval gate — all file writes show a unified diff before touching disk (unless `BANDIT_AUTO_APPROVE=1`).
- Command allowlist — `run_command` only executes from an internal allowlist (git, gh, kubectl, helm, brew, standard *nix tools). Arbitrary shell is refused.
- Secret hygiene — API keys are redacted in `/config` output and never logged.
- Local sessions — stored as JSONL under `~/.bandit/sessions/`. Inspect at any time.
Requirements
- Node.js 20+
- Ollama running locally (or remote via `OLLAMA_URL`) — unless you use `BANDIT_PROVIDER=bandit`
- `rg` (ripgrep) on `PATH` for fast code search; falls back to `grep` if absent
Troubleshooting
Ollama not detected — Make sure it's running: ollama serve. The CLI checks on startup and surfaces a setup hint if it can't connect.
Model not installed — Pull it: ollama pull <model>. Run /model <name> in the REPL to switch without restarting.
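The usual checks on the Ollama side for both of the above (model name is just an example):

```bash
ollama serve &                  # start the daemon if it isn't already running
ollama list                     # confirm the model is installed
ollama pull qwen2.5-coder:7b    # pull it if it isn't
```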
Slow responses — Check your model size against available VRAM. Switch to a smaller model from the recommended list.
Stuck approval prompt in CI — Set BANDIT_AUTO_APPROVE=1 to skip the diff-approval gate.
Support
- Issues, feature requests, and questions: [email protected]
- More from Burtson Labs: burtson.ai
Bandit CLI is built by Burtson Labs. Source for the runtime packages is currently private — open source release planned.
