agent.libx.js
v0.94.37
Edge-native AI agent runtime — drives a virtual filesystem via any LLM (ai.libx.js). Same bytes run in node, browser, or edge.
A coding agent that matches Claude Code on correctness — then beats it on cost, tokens, and tool-efficiency, and runs where Claude Code can't (sandbox, browser/edge, database).
By default it's a full-strength terminal coding agent: real disk, real shell, and the same Read/Edit/Grep/permissions/streaming-DX surface you'd expect from Claude Code. The difference is its two host couplings are swappable seams:
- LLM → any model via `ai.libx.js` (`AIClient.chat`, OpenAI-style tools/streaming).
- Filesystem → a pluggable `IFilesystem` (real disk, in-memory, IndexedDB, a database, hybrid mounts) from `wcli`'s headless core.
So the same agent loop also runs sandboxed (in-memory VFS, real disk untouched), on the edge / browser (no Node, no /bin/sh), or hybrid (mount real dirs + a database + remote storage side by side, with transactional overlays for checkpoint/rollback).
Claude Code is the floor; running isolated, on the edge, or hybrid is the ceiling.
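To make the filesystem seam concrete, here is a minimal sketch of an agent step written against an interface rather than a disk, plus a copy-on-write overlay of the kind that could back checkpoint/rollback. `MiniFs`, `InMemoryFs`, `OverlayFs`, and `fixAddBug` are hypothetical names for illustration only. They are not the `IFilesystem` API from `wcli`, which is asynchronous and presumably richer; the sketch is synchronous to stay short.

```ts
// Hypothetical, simplified stand-in for a pluggable filesystem interface.
interface MiniFs {
  readFile(path: string): string;
  writeFile(path: string, content: string): void;
}

// Backend 1: everything lives in a Map, so the real disk is never touched.
class InMemoryFs implements MiniFs {
  private files = new Map<string, string>();

  readFile(path: string): string {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`ENOENT: ${path}`);
    return content;
  }

  writeFile(path: string, content: string): void {
    this.files.set(path, content);
  }
}

// Backend 2: a copy-on-write overlay. Writes are buffered until commit();
// rollback() discards them and leaves the base untouched.
class OverlayFs implements MiniFs {
  private base: MiniFs;
  private pending = new Map<string, string>();

  constructor(base: MiniFs) {
    this.base = base;
  }

  readFile(path: string): string {
    const buffered = this.pending.get(path);
    return buffered !== undefined ? buffered : this.base.readFile(path);
  }

  writeFile(path: string, content: string): void {
    this.pending.set(path, content);
  }

  commit(): void {
    this.pending.forEach((content, path) => this.base.writeFile(path, content));
    this.pending.clear();
  }

  rollback(): void {
    this.pending.clear();
  }
}

// Stand-in for one agent edit: it depends only on MiniFs, so any backend works.
function fixAddBug(fs: MiniFs, path: string): void {
  const source = fs.readFile(path);
  fs.writeFile(path, source.replace("a - b", "a + b"));
}

const base = new InMemoryFs();
base.writeFile("/src/x.ts", "export const add = (a, b) => a - b;\n");

const checkpoint = new OverlayFs(base);
fixAddBug(checkpoint, "/src/x.ts");
console.log(checkpoint.readFile("/src/x.ts")); // the overlay sees the fix
console.log(base.readFile("/src/x.ts"));       // the base is still unchanged
checkpoint.commit();                           // or checkpoint.rollback()
```

The point of the design is that `fixAddBug` never learns which backend it is talking to, which is what lets one loop run over disk, memory, or a database.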
How it stacks up vs Claude Code
Correctness parity — efficiency, cost, and reach are the lead. Hard 7-task coding suite, Sonnet, denoised (each task ×3, no lucky run promotes; SUITE=hard bun compare/run.ts):
| | agent.libx.js | Claude Code | Difference |
|---|---|---|---|
| Correctness | 7/7 | 7/7 | parity |
| Tool-calls | 16 | 28 | −43% |
| Tokens | 69k | 171k | 2.5× fewer |
| Wall-time | ~100s | 133s | ~25% faster |
Cost (9-task hard suite, USD-metered, vs CC-on-Opus): $0.49 single-tier Sonnet (5.4× cheaper) · $0.82 three-tier voice/duplex (3.3× cheaper) vs CC-Opus $2.67 — at quality parity (16/18 vs 17/18 passes).
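As a quick sanity check, the headline ratios can be recomputed from the raw figures quoted above. This is only arithmetic on the published numbers; the helper names are made up for the example, and "faster" is read here as the reduction in wall time.

```ts
// Percentage reduction of `ours` relative to `theirs`, rounded to a whole percent.
function percentFewer(ours: number, theirs: number): number {
  return Math.round((1 - ours / theirs) * 100);
}

// How many times smaller `ours` is than `theirs`, rounded to one decimal.
function timesFewer(ours: number, theirs: number): number {
  return Math.round((theirs / ours) * 10) / 10;
}

console.log(percentFewer(16, 28));    // tool-calls: 43 (% fewer)
console.log(timesFewer(69, 171));     // tokens: 2.5 (× fewer)
console.log(percentFewer(100, 133));  // wall-time: 25 (% less time)
console.log(timesFewer(0.49, 2.67));  // single-tier cost: 5.4 (× cheaper)
console.log(timesFewer(0.82, 2.67));  // three-tier cost: 3.3 (× cheaper)
```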
Plus things Claude Code simply doesn't do:
- Runs where CC can't: the same agent loop runs on real disk, an in-memory sandbox, the browser/edge (no Node, no `/bin/sh`), or a database-backed workspace. Swap the filesystem, not the agent.
- Keyless web search, built in: `WebSearch` works in any deployment with no API key (DuckDuckGo; auto-upgrades to Tavily if you set a key). CC's search is Anthropic-server-bound.
- Context-safe by default: a 1 MB `Grep`/`Read`/MCP result is auto-paginated and can't blow the window; buried detail is recovered via a cheap, context-isolated `Ask` peek, which was ~5.3× cheaper and more accurate than re-fetching in a head-to-head.
- It improves its own efficiency: an autonomous evolution loop cut its own tool use ~50% (32 → 15 on the core suite, denoised). The change was self-discovered, not hand-tuned, and is the same lever behind the efficiency lead above.
Honest scope: the win is efficiency / cost / reach, not a claim of smarter reasoning — correctness is parity. All figures are denoised and reproducible (see Eval & compare); full boards in mind/09-outperform.md.
Quickstart
Point it at your project — no clone needed (requires Bun):
```sh
export ANTHROPIC_API_KEY=…                           # or OPENAI_API_KEY / GOOGLE_API_KEY / GROQ_API_KEY
bunx agent.libx.js "find and fix the failing test"   # run once in the current directory
bunx agent.libx.js                                   # …or open the interactive REPL
```

Want a permanent command? Run `bun add -g agent.libx.js`, then just `agentx` (and `agentx --duplex` for voice). The agent has full real-disk + shell access by default (like Claude Code); add `--sandbox` to work on an in-memory copy instead. See The agentx CLI for flags, sessions, and slash commands.
Use it as a library
```ts
import { AIClient } from 'ai.libx.js';
import { Agent, MemFilesystem } from 'agent.libx.js';

const fs = new MemFilesystem(); // or NodeDiskFilesystem(dir); the two are interchangeable
await fs.createDir('/src');
// Seed a deliberately buggy file (it subtracts instead of adding) for the agent to fix.
await fs.writeFile('/src/x.ts', 'export const add = (a,b) => a - b;\n');

const ai = new AIClient({ apiKeys: { anthropic: process.env.ANTHROPIC_API_KEY } });
const res = await new Agent({ ai, fs, model: 'anthropic/claude-sonnet-4-6' })
  .run('Fix the add bug in /src/x.ts');
console.log(res.finishReason, await fs.readFile('/src/x.ts'));
```

Tools the agent gets
- `Shell` (CLI disk mode): a real `/bin/sh`; run any installed binary (git, bun, node, curl, scripts, …).
- `bash` (library / sandbox mode): `ls`/`cat`/`grep`/`find`/`head`/`tail`/`echo`/`mkdir`/`rm`/`mv`/`wc`, pipes, redirects, chaining, all over the VFS (`wcli`'s sandboxed JS interpreter).
- `Read`: 1-indexed numbered lines, `offset`/`limit`.
- `Edit`: exact unique-substring replace, with a read-before-edit staleness guard.
- `Grep`/`Glob`/`Write`/`MultiEdit`: structured, typed results straight from the VFS (no `bash` parsing). This is the selectable tool set the self-evolution loop mutates over.
- `TodoWrite`: a planning scratchpad. `Task`: spawn a depth-limited child agent over the VFS (`subagents: true`). `SlashCommand`: reusable prompt templates from `<dir>/*.md` (`commandsDir`). Plus a real MCP client (`src/mcp.client.ts`, node-only; stdio/HTTP JSON-RPC handshake + discovery) that feeds the edge-safe MCP adapter (`mcpToolsToAgentTools`), so any MCP server's tools become agent tools.
- `WebFetch`/`WebSearch`: fetch a URL as readable text, or search the web. Keyless by default (`WebSearch` uses DuckDuckGo; auto-upgrades to Tavily when `TAVILY_API_KEY` is set) and auto-enabled in the CLI. Factory-built with an injectable `fetch`, so they stay edge-portable and testable. (In the library they're opt-in by name: `tools: [...,'WebSearch']`.)
- Oversized-output pagination: any tool result over a byte ceiling (`maxToolResultBytes`, default 60k) is cropped to page 1 with a marker (refine the query / read further), so one big `Grep`/`Read`/MCP/web result can't blow the context window. In the CLI (on by default; `--no-scratch` to disable) the full output instead spills losslessly to a scratch file and the model recovers specifics via `Grep`/`Read` or `Ask`, a cheap, context-isolated peek that returns just the answer (the raw blob never re-enters context).
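The oversized-output rule above can be illustrated with a small sketch: crop a result to a byte ceiling and append a marker telling the model how to get the rest. `cropToolResult` and the marker wording are hypothetical, and reading the "60k" default as 60,000 bytes is an assumption; the runtime's actual cropping logic may differ.

```ts
interface PagedResult {
  text: string;        // what the model would see
  truncated: boolean;  // true when the output exceeded the ceiling
  totalBytes: number;  // UTF-8 size of the full output
}

const utf8 = new TextEncoder();

// Crop `output` to at most `maxBytes` of UTF-8 without splitting a character.
// The 60_000 default is an assumed reading of the documented "60k".
function cropToolResult(output: string, maxBytes: number = 60_000): PagedResult {
  const totalBytes = utf8.encode(output).length;
  if (totalBytes <= maxBytes) {
    return { text: output, truncated: false, totalBytes };
  }

  let usedBytes = 0;
  let endIndex = 0; // index into the string, in UTF-16 code units
  for (const char of output) {
    const size = utf8.encode(char).length;
    if (usedBytes + size > maxBytes) break;
    usedBytes += size;
    endIndex += char.length;
  }

  const marker =
    `\n[output truncated: page 1 shows ${usedBytes} of ${totalBytes} bytes; ` +
    `refine the query or read further]`;
  return { text: output.slice(0, endIndex) + marker, truncated: true, totalBytes };
}

const page = cropToolResult("match\n".repeat(50_000));
console.log(page.truncated, page.totalBytes); // true 300000
```

Counting bytes rather than characters keeps the ceiling meaningful for non-ASCII output, at the cost of a per-character scan on the (rare) oversized path.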
Agentic subsystems
Beyond file tools, the runtime ships the higher-altitude pieces too — each an AgentOptions/loop extension over the two seams (see mind/06):
- Skills + memory: VFS-backed (`skillsDir`/`memoryDir`); persistence is just the backend choice.
- Subagents (`subagents`; typed agents via `agentsDir`, where `<dir>/<name>.md` defines a persona + model + scoped tools, selected with the `Task` `agentType`), hooks (`hooks: preToolUse/postToolUse/onStop`, to block or audit any tool call), slash-commands (`commandsDir`), `TodoWrite`, MCP (`mcpToolsToAgentTools`).
- Streaming (`stream: true` → `text_delta` via `HostBridge.notify`) and context compaction (`compaction: { maxMessages }` → edge-safe summarize-and-boundary). Defaults preserve the original non-stream, drop-oldest behavior.
- Multi-turn + project context: `Agent.send()` continues a conversation across turns (vs `run()`, which starts fresh); project instructions (`instructionFiles`: `AGENTS.md`/`CLAUDE.md` at the FS root) inject into the system prompt.
- DuplexAgent (`src/duplex.ts`): voice-optimized three-tier engine (reflex/act/think). A fast reflex agent streams instant replies and self-selects escalation: `Act` for standard tool work (Sonnet-class), `Think` for deep reasoning (Opus-class, configurable, default on). Results are pushed back and re-voiced by the reflex (turn mutex, coalesced completions, `TaskStatus`/`CancelTask`). See `mind/10`.
- Scheduler (`src/scheduler.ts` + `cli/osScheduler.ts`): one-off (`{at}`), interval (`{everyMs}`), cron (`{cron}`) via `ScheduleTask`/`ScheduleList`/`ScheduleCancel`/`Wakeup`. In-session jobs fire while the session is alive (persisted, re-armed on `--resume`); far one-offs (or `backend:'os'`) register with the OS scheduler (launchd / crontab / at) and survive quitting, and the fired job headless-resumes the session (`agentx -p … --resume <id> --yes`). The `PushNotification` tool (osascript / notify-send) alerts the user out-of-band; `Read` on a `.pdf` returns extracted text (poppler's pdftotext, disk mode). `RemoteTrigger` invokes another agentx session on this machine: a session open in a live terminal receives the prompt as an injected turn (per-session unix socket, same-user only); otherwise it's resumed headless and the final answer comes back. See `mind/12`.
- Budget kill-switches: always-on per-run guards (`maxTokens`/`timeoutMs`/`maxRepeats`/`maxToolCalls`/`signal` → `finishReason` `budget`/`timeout`/`loop`/`max_tool_calls`/`aborted`) protect the API spend against runaway loops. The enforceable billing cap is server-side in the web key-proxy: a VFS-backed budget config (`/.agent/budget.json`, USD-metered, hot-reloaded, $100/wk default) a browser client can't bypass. See `web/` and `mind/06`.
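The budget kill-switches listed above can be pictured as one check that runs after every tool step. The option names and stop reasons come from the text; pairing each option with a reason by list order is inferred, and everything else (`RunState`, `checkBudget`, `nextRepeatCount`, the `>=` comparisons, the check order) is an assumption made for this sketch, not the runtime's implementation.

```ts
type StopReason = "budget" | "timeout" | "loop" | "max_tool_calls" | "aborted";

interface Budget {
  maxTokens?: number;
  timeoutMs?: number;
  maxRepeats?: number;
  maxToolCalls?: number;
}

// Hypothetical per-run counters the loop would maintain.
interface RunState {
  tokensUsed: number;
  elapsedMs: number;
  toolCalls: number;
  repeats: number;   // consecutive identical tool calls
  aborted: boolean;  // e.g. mirrored from an AbortSignal
}

// Returns the reason to stop, or null while the run is within budget.
function checkBudget(budget: Budget, state: RunState): StopReason | null {
  if (state.aborted) return "aborted";
  if (budget.maxTokens !== undefined && state.tokensUsed >= budget.maxTokens) return "budget";
  if (budget.timeoutMs !== undefined && state.elapsedMs >= budget.timeoutMs) return "timeout";
  if (budget.maxRepeats !== undefined && state.repeats >= budget.maxRepeats) return "loop";
  if (budget.maxToolCalls !== undefined && state.toolCalls >= budget.maxToolCalls) return "max_tool_calls";
  return null;
}

// Assumed loop detector: count how many times in a row the same call was issued.
function nextRepeatCount(previousCall: string | null, call: string, previousCount: number): number {
  return previousCall === call ? previousCount + 1 : 1;
}

// Simulate a model stuck issuing the same tool call over and over.
const budget: Budget = { maxToolCalls: 50, maxRepeats: 3 };
const state: RunState = { tokensUsed: 0, elapsedMs: 0, toolCalls: 0, repeats: 0, aborted: false };
let lastCall: string | null = null;
let reason: StopReason | null = null;
while (reason === null) {
  const call = 'Grep {"pattern":"TODO"}';
  state.repeats = nextRepeatCount(lastCall, call, state.repeats);
  state.toolCalls += 1;
  lastCall = call;
  reason = checkBudget(budget, state);
}
console.log(reason, state.toolCalls); // loop 3
```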
The agentx CLI
A dependency-light readline REPL (plus headless -p mode) over the runtime:
```sh
agentx                      # interactive REPL in the current dir
agentx "fix the bug in x"   # run once and exit
agentx -c "keep going"      # continue the most recent session
agentx --resume <id> "…"    # resume a specific session
```

- Filesystem + Shell: by default the CLI has full real-filesystem access like Claude Code (root `/` is the machine root, the launch dir is the working dir, absolute host paths and above-cwd reach both work) with a real `/bin/sh` (`Shell` tool) so the agent can run git, bun, node, curl, and any installed binary. Secrets (`.env`, `.ssh`, keys, `.git`) stay hidden by the jail; env secrets are scrubbed from the child shell. `--sandbox` instead operates over an in-memory copy of the working dir with a VFS-only `bash`; the real disk is never touched. `--boddb <dir>` runs over a persistent database workspace (a bod-db store at `<dir>`: `meta.db` tree + `files/` bytes) that survives across runs while the real disk stays untouched; it is DB-native by default, or add `--seed` to hydrate it from cwd on the first run. `--no-shell` forces the VFS bash in disk mode. `--harden` OS-sandboxes the real shell (macOS `sandbox-exec` / Linux `bwrap`): writes confined to cwd+tmp, outbound network blocked (`--harden-net` keeps network); commands fail closed when no wrapper exists. (`/sandbox` shows the active mode.)
- Sessions: every conversation persists to
`./.agent/sessions/<id>.json`, flushed at every tool step (a crash, hang, or Ctrl-C mid-turn loses at most the in-flight step, never the transcript); `--continue`/`--resume` (and `/sessions`, `/resume`) pick it back up, with memory across turns, so a REPL turn sees the previous one. A global symlink index at `~/.agent/sessions/` enables cross-project lookup: `--resume 090715-myproject` resolves from any directory, and `/sessions all` lists every project's sessions in one picker.
- Diffs: every `Edit`/`Write`/`MultiEdit` renders a colorized `+`/`-` diff (TTY-gated; plain when piped).
- Slash commands: `/help /tools /model /compact /copy /diff /memory /clear /sessions /resume /commands /init`; `/compact <focus>` preserves matching lines from the folded span; `/copy [code]` puts the last reply (or its last code block) on the OS clipboard; `/diff` shows everything the session changed (oldest checkpoint → now); `/memory` opens the memory index in `$EDITOR`; user-defined `./.agent/commands/<name>.md` are invokable directly as `/<name>` (the same registry the model's `SlashCommand` tool uses). Skills/commands created mid-session are picked up automatically each turn (delivered as a cache-friendly `<system-reminder>` delta, like Claude Code) and the `Skill`/`SlashCommand` tools rescan on a name miss; `/reload` forces a full catalog + system-prompt rebuild.
- Live chrome: the thinking spinner shows elapsed seconds +
`esc to interrupt`; the terminal tab title tracks the session topic; a bell rings when a long (>10s) turn finishes in a backgrounded tab; the footer warns at 80%/90% context pressure and auto-trims announce themselves.
- `/transcript [n]`: the full session transcript including complete tool-result bodies (the past-turn equivalent of Ctrl-O live verbose), paged through `less`. `/doctor`: one-shot environment sanity check (keys, model pricing, config, session-store writability, memory, MCP mounts).
- Syntax-highlighted code fences: `` ```ts `` (and js/py/sh/go/rust/…) blocks render with keywords bold, strings green, numbers cyan, comments dim; unknown languages keep the plain cyan body.
- `TodoWrite` plans pin a compact `☑ 2/5 · current step` line into the idle footer.
- `/agents`: list subagent types from `./.agent/agents` (description, model, tool scope); `/agents new <name>` scaffolds a frontmatter'd definition for the `Task` tool's `agentType`.
- `!<partial>` + menu completes from past `!` shell commands.
- `@server:uri` mentions inline an MCP resource body into the prompt.
- Transient network drops mid-step retry automatically (2 attempts, backoff) instead of failing the turn.
- Project instructions:
`./AGENTS.md` (or `CLAUDE.md`) auto-loads into every run; `/init` scaffolds one.
- Any provider: set `ANTHROPIC_API_KEY`/`OPENAI_API_KEY`/`GOOGLE_API_KEY`/`GROQ_API_KEY`; choose with `-m provider/model`.
- @-file mentions & headless JSON: reference files inline in a prompt with `@path` (e.g. `explain @src/Agent.ts`; `~/` expands to the home directory; quote paths with spaces as `@"…"`; drag-dropped files, e.g. macOS screenshots, quote themselves automatically); script with `-p --output-format json` to get one machine-readable result object on stdout (activity stays on stderr).
- Tab-completion: `Tab` completes `/<command>` names and `@<path>` file/dir references (descends subdirs, dotfiles hidden unless typed) straight from the working tree.
- Duplex mode:
`agentx --duplex` runs the full standard REPL (slash commands, sessions, postures, rewind, MCP) with the three-tier engine driving turns: a fast voice model (`--voice-model`, default `groq/openai/gpt-oss-120b`) answers every line instantly and delegates real work to background workers built with the same wiring as a normal run (fs mode, permissions, MCP); worker activity shows as dim chrome and results are re-voiced when ready. Switch any tier live with `/model` (opens a reflex/act/think picker), or the `/voice-model` · `/think-model` shortcuts. `/tasks` lists background tasks, inspects a task's live output tail, and cancels a running one from a picker (Esc mid-turn cancels the foreground turn; Esc again at the idle prompt cancels running workers).
- MCP servers: declare `mcpServers: { name: { command, args } | { url } }` in config and they're auto-mounted at startup (in parallel, with an optional `mountTimeoutMs` deadline so one slow/dead server never blocks the rest): the client does the JSON-RPC handshake (stdio or HTTP) + `tools/list`, and the discovered tools appear as `mcp__<name>__<tool>` in `/tools` (inspect with `/mcp`). A bad server is logged and skipped, never blocking the agent. For large tool sets, deferred mode (`makeMcpToolSearch`/`mountMcpDeferred`) exposes just two bounded tools (`ToolSearch` + `McpCall`) instead of N defs, dodging the provider tool-cap and improving selection accuracy; the CLI applies this automatically past 12 mounted tools (a 42-tool server was costing ~80k tok/turn in schema alone), and permission rules written against the real `mcp__<name>__<tool>` names still match through `McpCall`. `mountMcpCatalog` goes further: a cached, hash-keyed catalog + lazy connect means a turn that uses no MCP tool opens zero connections, and one that uses a tool connects exactly that server, so latency scales with tools-used, not servers-configured. A down server is negative-cached (`failureCooldownMs`) so it never re-floors a later turn at the deadline. For zero turn-path latency even on a cold process, call `warmMcpCatalog` at boot + on a timer (off-turn discovery) and mount with `{ discover: 'cache-only' }`; the turn then never synchronously connects: it serves the warmed catalog and discovers any miss in the background.
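The MCP naming and deferred-mode rules described above can be sketched in a few lines. The `mcp__<name>__<tool>` pattern, the two deferred tool names, and the 12-tool threshold come from the text; the types, function names, sample servers, and the shape of an `McpCall` invocation are hypothetical, and the real `mountMcpDeferred` logic is not shown here.

```ts
interface McpTool {
  server: string; // the key under mcpServers in config
  tool: string;   // the name reported by the server's tools/list
}

// Assumed shape of a tool call as the permission layer might see it.
interface ToolCall {
  name: string;
  server?: string; // set when name is "McpCall"
  tool?: string;   // set when name is "McpCall"
}

const DEFERRED_THRESHOLD = 12; // "past 12 mounted tools" in the text

function qualifiedName(t: McpTool): string {
  return `mcp__${t.server}__${t.tool}`;
}

// Small sets are exposed one definition per tool; large sets collapse to two.
function exposedTools(mounted: McpTool[], threshold: number = DEFERRED_THRESHOLD): string[] {
  return mounted.length > threshold ? ["ToolSearch", "McpCall"] : mounted.map(qualifiedName);
}

// Resolve an McpCall back to the real tool name so existing rules still match.
function permissionName(call: ToolCall): string {
  if (call.name === "McpCall" && call.server !== undefined && call.tool !== undefined) {
    return qualifiedName({ server: call.server, tool: call.tool });
  }
  return call.name;
}

function isDenied(call: ToolCall, denyRules: string[]): boolean {
  const target = permissionName(call);
  return denyRules.some((rule) => rule === target);
}

const fewTools: McpTool[] = [{ server: "github", tool: "create_issue" }];
const manyTools: McpTool[] = Array.from({ length: 42 }, (_, i) => ({ server: "big", tool: `tool_${i}` }));

console.log(exposedTools(fewTools));  // one qualified name per tool
console.log(exposedTools(manyTools)); // just ToolSearch and McpCall
console.log(isDenied({ name: "McpCall", server: "big", tool: "tool_7" }, ["mcp__big__tool_7"])); // true
```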
🧬 It improves itself
The agent is a coding agent that operates over a swappable filesystem — so it can be pointed at its own repo and evolve its own configuration. evolve/ is an autonomous loop:
```
champion → propose patch → jailed + sandboxed eval → per-task no-regression gate → ledger → repeat
```

An LLM is the mutation operator; a behavioral fitness function (run the produced code) is natural selection. Correctness is a hard gate, the rule files are hash-pinned (the agent can't edit what judges it), and every candidate runs under two containment boundaries: a `JailedFilesystem` (secret denylist, symlink-escape defense) and a sandboxed grader (scrubbed env, nonce-authenticated result, default-on `sandbox-exec`). Those guardrails were hardened against a 22-agent adversarial red-team (14 findings fixed) before the loop was allowed to run.
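One way to picture the promotion step in that loop is a gate that compares a candidate's scorecard with the champion's. The sketch below is an assumption-laden illustration, not the code in `evolve/`: it treats "no regression" as "averaged tool calls do not rise on any task", requires every repeated run to pass, and only promotes on a strict overall improvement. `Scorecard`, `TaskRuns`, and `shouldPromote` are hypothetical names.

```ts
interface TaskRuns {
  passed: boolean[];   // one entry per repeated run of the task
  toolCalls: number[]; // tool calls used in each run
}

type Scorecard = { [task: string]: TaskRuns };

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function shouldPromote(champion: Scorecard, candidate: Scorecard): boolean {
  let championTotal = 0;
  let candidateTotal = 0;

  for (const task of Object.keys(champion)) {
    const champ = champion[task];
    const cand = candidate[task];
    if (champ === undefined || cand === undefined) return false; // must cover every task

    // Correctness is a hard gate: every repeated run has to pass.
    if (cand.passed.length === 0 || !cand.passed.every((ok) => ok)) return false;

    // Denoise by averaging over the repeats, so a single lucky run cannot promote.
    const champMean = mean(champ.toolCalls);
    const candMean = mean(cand.toolCalls);
    if (candMean > champMean) return false; // per-task no-regression

    championTotal += champMean;
    candidateTotal += candMean;
  }

  return candidateTotal < championTotal; // promote only on a strict improvement
}

const champion: Scorecard = {
  a: { passed: [true, true, true], toolCalls: [10, 12, 11] },
  b: { passed: [true, true, true], toolCalls: [5, 5, 5] },
};
const luckyOnce: Scorecard = {
  a: { passed: [true, true, true], toolCalls: [4, 14, 15] }, // one great run, same average
  b: { passed: [true, true, true], toolCalls: [5, 5, 5] },
};
console.log(shouldPromote(champion, luckyOnce)); // false
```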
Result (Sonnet 4.6): the loop autonomously drove the baseline from 32 → 15 tool-calls (53% fewer) with 5/5 pass held, reaching parity with Claude Code (head-to-head 15 vs 15 tools, 1.8× faster, 2.8× fewer tokens) and closing the efficiency gap we'd only described before. This is the denoised figure (each candidate averaged over 3 runs so no lucky run promotes); a single un-averaged run reached 14. It generalizes to held-out tasks (24 → 12, no overfit) and discovered the human-authored parity plan on its own: use structured `Grep`/`MultiEdit`, stop over-exploring.
```sh
GENERATIONS=8 bun evolve/loop.ts   # evolve → evolve/champion.json + ledger.jsonl
bun evolve/report.ts               # instant replay of the arc (no tokens)
EVOLVED=1 bun compare/run.ts       # evolved champion vs Claude Code
bun evolve/generalize.ts           # baseline vs champion on UNSEEN tasks
```

Full design + threat model + results: `mind/08-self-evolve.md`.
Status
v1 (done): loop + hybrid tools + Mem/Disk backends + deterministic FakeAIClient tests + real-model run. 5/5 pass@1 on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the self-evolution loop has now closed autonomously: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). 820+ tests green.
See mind/ for the full vision, architecture, decision journal, roadmap, eval + head-to-head results, the parity plan, and the self-evolution design.
Develop & evaluate
Hacking on the runtime itself (from a clone):
```sh
bun install                                      # links wcli (file:), ai.libx.js + libx.js (bun link)
bun test                                         # 820+ unit/integration tests (offline via FakeAIClient, no key)
ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts   # drive a real model end-to-end
```

Eval & head-to-head (real model):

```sh
bun eval/run.ts             # behavioral scorecard (our agent over MemFilesystem)
bun compare/seed-tasks.ts   # materialize task specs into .tmp/tasks/
bun compare/run.ts          # head-to-head vs Claude Code (needs the `claude` CLI)
```