@burtson-labs/bandit-stealth-cli

v1.7.132

Published

42 minutes ago

Bandit — a local-first AI coding agent for your terminal. Same runtime as the Bandit Stealth VS Code / Cursor extension.

Downloads

14,937

markymarkburt

bburtson09

ai agent cli coding-agent llm ollama local-first bandit burtson-labs terminal repl developer-tools

Bandit — Agent CLI

Local-first AI coding agent for your terminal.

Your code never leaves your machine. Works with any Ollama model.

Prefer an IDE? The sibling Bandit Stealth extension for VS Code / Cursor ships the same runtime, skills, and tool-use loop — install from the VS Code Marketplace or Open VSX.

Install

Install Ollama and pull a model:

brew install ollama                       # or download installer
ollama pull qwen2.5-coder:7b              # fast, tool-calling, ~4.7 GB

Install the CLI globally:

npm i -g @burtson-labs/bandit-stealth-cli

Run it:

bandit                                     # interactive REPL
bandit "explain @src/auth/login.ts"        # one-shot with a file mention

That's it. No API keys. No cloud services. The agent reads your code, searches, runs commands, and writes changes — all locally.

What it does

Agentic tool use — reads files, searches code, runs commands, writes changes
Unified-diff approval gate — every write_file / apply_edit shows a colored diff before touching disk
Pre-write validation — TypeScript, Python, JSON, C# syntax-checked before the agent can write
Post-write validation — JSON edits are re-parsed after write; failure feeds back to the agent on the next turn so it self-corrects without you flagging it
Skills system — agent activates specialized skills based on your prompt, and can create its own
Background subagents — long investigations spawn detached; status bar shows bg:N running; you keep talking; synopsis auto-injects when ready (/tasks to inspect, drill down, or cancel)
watch_command — run a dev server / --watch test runner for a bounded window, agent reacts to what came out
find_directory — cross-repo discovery; ask "open the auth-api repo" and the agent sweeps ~/Documents/GitHub, ~/GitHub, ~/Projects, ~/code, ~/dev, ~/repos, ~/work, ~/src, plus the workspace parent — no "where is that repo?" round-trips
MCP both directions — speaks the Model Context Protocol as a client (/mcp add github <token>, /mcp add slack, /mcp add gitlab, /mcp add custom <name> <cmd…>) and as a server (bandit mcp serve exposes Bandit's native tool surface over stdio so Claude Desktop / Cursor / Cline / Continue can drive your codebase through it)
Installs CLIs on demand — ask Bandit to install ripgrep, httpie, the GitHub CLI, etc. and it picks the right package manager (brew, npm install -g, pip install, cargo install, gem install, go install) and runs it through the permission gate
Interactive scaffolders work — create-vite, create-react-app, ng new, etc. detect a non-TTY stdin and self-abort. Bandit recognizes the pattern and surfaces a clear "run this with !" recovery hint so the model doesn't loop on a "command appeared to succeed" misread
Live command output — npm install, pip install, watch_command npm run dev stream their output to your terminal as it arrives, dimmed, while the spinner keeps animating. No more wondering if a 20-second install is hung
Interrupt + queue — press Esc mid-turn to cancel the agent and clear your queue. Type a follow-up + Enter to queue it (queued: N · sends after this turn in the status row). The next turn picks it up automatically
? shortcuts overlay — type ? at an empty prompt for a live cheatsheet that disappears the moment you backspace it
!-prefix shell escape — !cmd runs straight in your shell with full TTY access. First-use confirmation gate; per-call yellow box every time after so you can't miss the bypass. Catastrophic patterns (rm -rf, mkfs, dd if=) blocked even here
Plan execution — structured multi-step plans for complex refactors
Session persistence — every REPL session saved as JSONL under ~/.bandit/sessions/ for later resume
/insights HTML report — local-only activity report: tool stats, top-touched files, languages, longest streak, peak day, error patterns, optional AI summary, mailto share
Project memory — drop a BANDIT.md or CLAUDE.md at your workspace root and it's auto-loaded into the system prompt
File + image mentions — @path auto-inlines files; images are either sent multimodally or OCR'd locally (Apple Vision / tesseract)
Clipboard paste — Ctrl+V in the REPL pastes an image straight from your clipboard
Hooks — PreToolUse / PostToolUse / Stop shell hooks via .bandit/settings.json
12 themes — Stealth Light/Dark, Midnight, Onyx, Charcoal, Dracula, Nord, Tokyo Night, Solarized Dark/Light, Catppuccin Mocha, Sepia. /theme to pick
Cross-platform — macOS, Linux, Windows; Windows .cmd/.bat shims (npm/npx/pnpm/tsc) resolved correctly
Update-aware — fire-and-forget npm-registry check at boot; update vX.Y.Z available shows in the status bar when a newer CLI is published

Slash commands

Type ? on an empty prompt for the at-a-glance overlay; /help for the full list.

| Command | Does |
|---|---|
| /help | Full slash-command list |
| /login <key> | Save a Bandit Cloud API key to ~/.bandit/config.json (also /login, /login clear) |
| /usage | Bandit Cloud session + weekly usage limits (/usage check for one-line ⚠ flag) |
| /model [name] | Switch model mid-session |
| /ollama [url] | Show or set the Ollama endpoint; /ollama default resets to http://localhost:11434 |
| /think on, /think off, /think auto | Override per-model thinking-mode default |
| /theme [name] | Pick a color palette (/theme lists; saved to global config) |
| /skills | List loaded skills |
| /session list, /session resume <id>, /session new | Manage sessions |
| /memory | Show auto-loaded BANDIT.md / CLAUDE.md |
| /config | Show effective config + path (secrets redacted) |
| /clear | Reset conversation (keeps session id) |
| /compact | Trim old tool results to fit the context window |
| /rewind [id] | Restore a file from a per-edit checkpoint |
| /tasks | List background subagent tasks (/tasks <id> drill-down, /tasks cancel <id>) |
| /plan <goal> | Heuristic plan first, y/N to execute |
| /init | Scaffold BANDIT.md from a repo scan |
| /commit | Draft a conventional-commit message from the staged diff |
| /review [focus] | Code review of staged changes or branch-vs-main, ends with 🟢/🟡/🔴 |
| /refactor <target> | Concrete refactor suggestions with before/after snippets |
| /test <target> | Generate tests in the project's existing framework |
| /explain <target> | Plain-English walkthrough of a file or function |
| /onboard | New-developer setup checklist for the repo |
| /changelog [range] | Release notes drafted from git log |
| /exit | Quit |

Skills

The agent activates specialized skills based on your prompt:

| Skill | Trigger | What it does |
|---|---|---|
| Filesystem | always | Read, write, search, list, run commands |
| Git | always | Status, diff, log, commit |
| Code Review | "review my changes" | Diff + full file context |
| Testing | "write tests" | Auto-detect runner, generate tests |
| Planning | "refactor the auth system" | Structured multi-step decomposition |
| Semantic Search | "how is auth implemented" | Local embedding search |

Custom skills (the agent can make its own)

Ask: "create a skill that runs my linter"

The agent writes .bandit/skills/linter.md. Next prompt, it's live. Ask "lint my code" and it runs.

MCP — Model Context Protocol servers

Bandit speaks MCP as a client, so any MCP server you can spawn (filesystem, git, GitHub, Google Drive, Gmail, Slack, Postgres, custom workplace tools…) plugs straight into the same tool-use loop. Each server's tools are namespaced as <server>.<tool> and registered alongside read_file, apply_edit, etc.

Configure at ~/.bandit/mcp-servers.json (global) or .bandit/mcp-servers.json (workspace, takes precedence). Schema is the standard MCP mcpServers shape — the same JSON other MCP clients use, so configs port between them:

{
  "mcpServers": {
    "fs-tmp": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..." }
    }
  }
}

Manage with the /mcp slash command:

| Command | What it does |
|---|---|
| /mcp | List configured servers + status (connected / idle / error) and tool counts |
| /mcp tools <name> | Spawn the server (lazy) and introspect its exposed tools |
| /mcp connect <name> | Explicit warmup so the first invocation isn't slow |
| /mcp disconnect <name> | Close the server's child process (re-spawns lazily on next use) |
| /mcp reload | Re-read the config files from disk after edits; no restart needed |

Servers spawn lazily on first invocation, persist for the session, and get cleaned up on REPL exit. Failures are isolated — a broken server logs an error and the rest of the loop keeps running on native tools only. Off by default — no config file = zero behavior change.
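
For the opposite direction, bandit mcp serve exposes Bandit's tools over stdio, so another MCP client can presumably launch it using the same mcpServers shape shown above. The sketch below only prints such an entry. It assumes the bandit binary is on the client's PATH; the entry name "bandit" and the helper function are illustrative, and where each client keeps its config file is not covered here.

```shell
# Hypothetical helper: print an mcpServers entry that launches Bandit
# as an MCP server over stdio. Paste the output into the other
# client's MCP config (location varies by client).
bandit_mcp_entry() {
  cat <<'EOF'
{
  "mcpServers": {
    "bandit": {
      "command": "bandit",
      "args": ["mcp", "serve"]
    }
  }
}
EOF
}
```

Run bandit_mcp_entry and merge the printed block into the client's existing config rather than overwriting it.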

Recommended models

Pull one with ollama pull <model>. Bandit auto-detects each model's capabilities and takes the native tool-calling path when supported.

| Model | Where | Notes |
|---|---|---|
| bandit-logic (cloud) | Bandit gateway (API key) | Default for cloud. Agent-tuned wrapper around Qwen 3.6 27B with thinking mode. Best reliability on multi-step agent tasks; what we recommend trying first. |
| qwen3.6:27b | Local / Mac 48GB+, high-VRAM GPU (~17 GB) | Best local pick. Same family as bandit-logic, runs offline. Probes the filesystem instead of asking for clarification: real agent behavior. |
| gemma4:26b | Local / Mac 32GB+ (~17 GB) | Solid alternative when Qwen 3.6 is too heavy for your hardware. Multimodal, 128K context. |
| gemma4:e4b | Local / laptop-class (~3 GB) | Lightweight pick that punches above its weight. Validated on real Bandit runs: clean tool sequencing (ls → narrow → read_file), no hallucinated paths. Right pick when you want a local agent that doesn't pin your fans. |
| gemma4:31b | Local / Mac 64GB+, GPU node | Bigger context, better reasoning for complex refactors. |
| qwen2.5-coder:7b | Local / Mac (~4.7 GB) | Fast lightweight pick. Native tool calling. Best for "given context, do X" tasks rather than autonomous discovery. |
| devstral:latest | Local / Mac 32GB+ | Mistral's agent-tuned model, with strong tool use. |
| bandit-core-1 (cloud) | Bandit gateway (API key) | Lightweight cloud option. Faster first-token than bandit-logic, less reliable on multi-step agent tasks. |

Models we don't recommend (for agent work)

Bandit is an autonomous agent harness — it expects the model to discover repo structure, plan edits, and emit tool calls without being hand-held. Some otherwise-impressive models aren't trained for that workflow and produce unexpected results:

gpt-oss:120b and other reasoning-tuned models: post-trained for OpenAI's harmony tool-call format, not the XML/native protocols Bandit uses. They tend to narrate intent ("I'll search for the controllers...") without ever emitting an actual tool call.
qwen2.5-coder:32b and other code-completion-tuned models: post-trained for fully-specified code-generation benchmarks. On ambiguous prompts they ask for paths instead of probing. Solid for concrete tasks; underwhelming as an autonomous agent.
qwen3.6:35b: the larger Qwen 3.6 variant stalls in reasoning-only output and ignores the harness's "act now" nudges. The 27B is the better production pick from this family.

If you want to test models outside the recommended list, expect the reasoning-only / narrate-but-no-action / partial-completion detectors to fire frequently. Those are signal — they mean the model isn't a great fit for autonomous agent work.

Capability dispatch:

Native tool calling — Qwen 3.6, Qwen 2.5 Coder, Llama 3.1+, Devstral, DeepSeek-Coder-V2+. Tool schemas go in Ollama's tools: field. Saves ~1500–3000 tokens per turn.
Text-parsing fallback — Gemma 3/4 and anything else. XML-style tool block lives in the system prompt with the full mitigation stack armed.

Any Ollama model works — capabilities auto-detect via /api/show.
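
To see what a model advertises before pointing Bandit at it, you can query the same Ollama endpoint yourself. This is a rough sketch, not Bandit's actual detection logic: it assumes a recent Ollama whose /api/show response includes a "capabilities" array, and both helper names are hypothetical.

```shell
# Reads an /api/show JSON response on stdin and succeeds if its
# "capabilities" array lists "tools". Crude grep-based check for
# illustration; a real JSON parser such as jq is more robust.
has_tools_capability() {
  tr -d '\n' | grep -q '"capabilities"[^]]*"tools"'
}

# Hypothetical helper: ask a running Ollama server about one model.
# Falls back to the default local endpoint when OLLAMA_URL is unset.
show_model() {
  curl -s "${OLLAMA_URL:-http://localhost:11434}/api/show" \
    -d "{\"model\": \"$1\"}"
}
```

With Ollama running, show_model qwen2.5-coder:7b | has_tools_capability && echo "native tool calling" reports whether the model is a candidate for the native path.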

Configuration

Config file (preferred)

~/.bandit/config.json or <workspace>/.bandit/config.json:

{
  "provider": "ollama",                       // or "bandit"
  "model": "qwen2.5-coder:7b",
  "ollama": {
    "url": "http://localhost:11434",
    "headers": { "Authorization": "Bearer ..." }  // optional
  },
  "bandit": {
    "apiKey": "bnd_...",
    "apiUrl": "https://api.burtson.ai"
  }
}

Workspace config overrides user config. Secrets belong in the user-level file, not in a committed workspace file.
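
One way to enforce the "secrets stay in the user-level file" rule is a small guard you run before committing. This is a sketch under assumptions: the helper name is hypothetical, and it only looks for the apiKey and Authorization keys that appear in the example config above.

```shell
# Succeeds (exit 0) if the workspace config under directory $1 contains
# an "apiKey" or "Authorization" key, i.e. something that belongs in
# ~/.bandit/config.json instead. Not a Bandit command.
workspace_config_has_secret() {
  cfg="${1:-.}/.bandit/config.json"
  [ -f "$cfg" ] && grep -Eq '"(apiKey|Authorization)"[[:space:]]*:' "$cfg"
}
```

For example, workspace_config_has_secret . && echo "move secrets to ~/.bandit/config.json" could sit in a pre-commit hook.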

Environment variables

| Var | Default | Description |
|---|---|---|
| BANDIT_PROVIDER | ollama | ollama or bandit |
| BANDIT_MODEL | gemma4:e4b | Model ID |
| BANDIT_API_KEY | (none) | Required when BANDIT_PROVIDER=bandit |
| BANDIT_API_URL | https://api.burtson.ai | Override Bandit API endpoint |
| OLLAMA_URL | http://localhost:11434 | Ollama endpoint |
| BANDIT_MAX_ITERATIONS | 20 | Tool-use loop cap |
| BANDIT_AUTO_APPROVE | 0 | 1/true to skip write-approval prompts |
| NO_COLOR | (none) | Disable ANSI colors |

Remote GPU

Running a bigger model on a remote Ollama instance? Point OLLAMA_URL at the remote endpoint and set BANDIT_MODEL to the bigger model. Requests route to the remote node; everything else stays local.
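
If you switch between local and remote often, a wrapper can scope the two variables to a single invocation so your shell keeps its local defaults. A minimal sketch; bandit_remote is a hypothetical helper name, and the URL in the usage line is a placeholder.

```shell
# Usage: bandit_remote <ollama-url> <model> [bandit args...]
# The function body runs in a subshell, so the exports do not leak
# into the calling shell.
bandit_remote() (
  if [ "$#" -lt 2 ]; then
    echo "usage: bandit_remote <ollama-url> <model> [args...]" >&2
    exit 2
  fi
  url=$1
  model=$2
  shift 2
  export OLLAMA_URL="$url" BANDIT_MODEL="$model"
  bandit "$@"
)
```

For example, bandit_remote https://gpu.example:11434 qwen3.6:27b starts the REPL against the remote node, while a plain bandit afterwards still uses your local settings.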

Rented GPU (RunPod / Vast.ai / Lambda)

When you need to run a model your local hardware can't fit, Bandit talks to any remote Ollama endpoint — including rented GPU pods. Same shape on every provider: spin up a pod with Ollama on port 11434, copy the proxy URL, point OLLAMA_URL at it.

RunPod (recommended — simplest UX):

# 1. From the RunPod template gallery, pick any Ollama template.
#    H100 SXM is the right pick for 27-32B models; multi-GPU only
#    needed for 70B+. Network volume optional but useful if you want
#    model weights to persist across pod restarts.

# 2. Once the pod boots, copy its proxy URL from the dashboard.
#    Format: https://<pod-id>-11434.proxy.runpod.net

# 3. SSH into the pod and pull a model:
ollama pull qwen3.6:27b

# 4. Locally, point Bandit at it:
export OLLAMA_URL="https://<pod-id>-11434.proxy.runpod.net"
export BANDIT_MODEL="qwen3.6:27b"
bandit

Tear the pod down when you're done. At ~$2/hr for an H100 SXM, a 15-20 minute agent session costs roughly $0.50-$0.67, so under $1.
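
The cost estimate is hourly rate times minutes divided by 60. A small helper makes it easy to redo for other rates or session lengths; session_cost is a hypothetical name, and real pricing varies by provider and over time.

```shell
# Usage: session_cost <dollars-per-hour> <minutes>
# Prints the estimated session cost in dollars, to two decimals.
session_cost() {
  awk -v rate="$1" -v minutes="$2" \
    'BEGIN { printf "%.2f\n", rate * minutes / 60 }'
}
```

session_cost 2 15 prints 0.50 and session_cost 2 20 prints 0.67, matching the range quoted for an H100 SXM.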

Vast.ai / Lambda Labs: same pattern. Find an Ollama-preloaded image (or install Ollama yourself with its official Linux install script), expose port 11434, set OLLAMA_URL to the host URL.

Recommended models for rented GPU:

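| Model | Size | What it's good at |
|---|---|---|
| qwen3.6:27b | ~17 GB | Same model as bandit-logic. Native tool calling, vision, 256K context. Best general-purpose pick. |
| qwen2.5-coder:32b | ~20 GB | Code-specialist post-train. Strongest on concrete, fully specified file edits and refactors; on ambiguous prompts it asks for paths instead of probing (see "Models we don't recommend"). |
| qwen3.6:35b | ~24 GB | Bigger Qwen 3.6 variant: slower, marginally better reasoning, but it can stall in reasoning-only output (see "Models we don't recommend"). The 27B is the safer pick. |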

Avoid for agent work: gpt-oss:120b and similar reasoning-tuned models. They're post-trained for OpenAI's harmony tool-call format, not the XML protocol Bandit uses for non-native models — they tend to narrate intent without emitting tool calls. Great for math/proofs in chat, poor for filesystem agent loops.

Security & privacy

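Local-first by default: with provider=ollama and a local Ollama endpoint, your code and prompts stay on your machine. The boot-time update check still contacts the npm registry, and a remote OLLAMA_URL routes model requests to that host.
Approval gate: all file writes show a unified diff before touching disk (unless BANDIT_AUTO_APPROVE=1).
Command allowlist: the agent's run_command only executes from an internal allowlist (git, gh, kubectl, helm, brew, standard *nix tools). Arbitrary shell is refused; the user-typed ! escape is the deliberate exception and has its own confirmation gate.
Secret hygiene: API keys are redacted in /config output and never logged.
Local sessions: stored as JSONL under ~/.bandit/sessions/. Inspect at any time.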
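
Because sessions are plain JSONL files, ordinary shell tools are enough to inspect them. The sketch below lists each file with its line count, assuming one JSON record per line; it makes no assumption about file names or record fields, which are not documented here, and list_sessions is a hypothetical helper.

```shell
# Usage: list_sessions [dir]
# Prints "<line-count> <path>" for every regular file in the sessions
# directory (default: ~/.bandit/sessions).
list_sessions() {
  dir="${1:-$HOME/.bandit/sessions}"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done
}
```

Pipe the output through sort -n to find your longest sessions, or open any listed file in a pager to read the raw records.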

Requirements

Node.js 20+
Ollama running locally (or remote via OLLAMA_URL) — unless you use BANDIT_PROVIDER=bandit
rg (ripgrep) on PATH for fast code search; falls back to grep if absent
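
The requirements above can be checked in one pass before the first run. A minimal sketch; both function names are hypothetical helpers, not part of Bandit.

```shell
# Succeeds if a `node --version` style string (e.g. "v20.11.1")
# has a major version of 20 or higher.
node_version_ok() {
  major=${1#v}        # strip the leading "v"
  major=${major%%.*}  # keep everything before the first dot
  [ "$major" -ge 20 ] 2>/dev/null
}

# Prints one status line per requirement.
bandit_preflight() {
  if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
    echo "node: ok"
  else
    echo "node: need Node.js 20+"
  fi
  if command -v ollama >/dev/null 2>&1; then
    echo "ollama: found"
  else
    echo "ollama: not on PATH (fine with a remote OLLAMA_URL or BANDIT_PROVIDER=bandit)"
  fi
  if command -v rg >/dev/null 2>&1; then
    echo "rg: found"
  else
    echo "rg: missing, code search falls back to grep"
  fi
}
```

Run bandit_preflight once after installing; a missing ollama or rg line is informational, since both have documented fallbacks.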

Troubleshooting

Ollama not detected — Make sure it's running: ollama serve. The CLI checks on startup and surfaces a setup hint if it can't connect.

Model not installed — Pull it: ollama pull <model>. Run /model <name> in the REPL to switch without restarting.

Slow responses — Check your model size against available VRAM. Switch to a smaller model from the recommended list.

Stuck approval prompt in CI — Set BANDIT_AUTO_APPROVE=1 to skip the diff-approval gate.
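
Since BANDIT_AUTO_APPROVE=1 removes the diff-approval gate, it is safer to enable it only where no human is present. The wrapper below is a sketch of that idea: it assumes your CI system sets the CI environment variable (GitHub Actions and GitLab CI do), and bandit_ci is a hypothetical helper name.

```shell
# Usage: bandit_ci [bandit args...]
# Enables auto-approve and plain output only when CI is set; on a
# developer machine the approval gate stays on. Runs in a subshell so
# the exports do not leak into the calling shell.
bandit_ci() (
  if [ -n "${CI:-}" ]; then
    export BANDIT_AUTO_APPROVE=1 NO_COLOR=1
  fi
  bandit "$@"
)
```

In a pipeline step this would be called in one-shot form, for example bandit_ci "explain @src/auth/login.ts".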

Support

Issues, feature requests, and questions: [email protected]
More from Burtson Labs: burtson.ai

Bandit CLI is built by Burtson Labs. Source for the runtime packages is currently private — open source release planned.