aiyu-multi-agent
v2.7.3
Published
Production-grade AI Agent Platform — 84 Agents, 46 Skills, 78 Workflows, 14 Tools, Smart Init, Plugin System, Agent Testing, Publishing, Security Headers, Metrics Auth
Readme
Aiyu MultiAgent is an open-source AI agent platform that helps developers automate software engineering tasks using large language models (LLMs). It features a ReAct execution engine, MCP server integration for Claude Code / Cursor / Windsurf, WebSocket real-time streaming, agent handoff orchestration, and a plugin system for extensible AI capabilities. Supports OpenAI GPT-4, Anthropic Claude, Ollama local models, and mock mode for testing.
Latest Release: v2.7.3 — React Strict Mode WebSocket fix (deferred close pattern), dashboard UI upgrade (unified ChatPanel, sidebar tabs, compact controls, responsive mobile layout), markdown rendering across all panels, keyboard shortcut conflicts resolved, duplicate key fix, dropdown overflow fixes, and provider filtering. V2.7.2 had mock provider default + 56 bug fixes. V2.7.1 had dashboard security hardening. V2.7.0 added real-time monitoring dashboard. All changes backward compatible.
📑 Table of Contents
- What's New in V2.7
- What's New in V2.6
- What's New in V2.5
- Quick Start
- Why Aiyu MultiAgent?
- CLI Reference
- LLM Providers
- Built-in Tools
- Project Structure
- How to Use
- Security & Guardrails
- Agent Testing
- Plugin System
- Contributing
- License
✨ What's New in V2.7
V2.7 brings a real-time monitoring dashboard and bug fix hardening — adding a Next.js 14 dashboard for live agent monitoring, formal WS event schema, and fixing 21 bugs across two releases.
| Area | Change | Impact |
|------|--------|--------|
| 📊 Dashboard | Next.js 14 real-time monitoring (aiyu-multi-agent-dashboard/) | Observability ⬆️ |
| 🌙 P3 Polish | Dark mode, export trace, keyboard shortcuts, WS auto-reconnect, mobile | UX ⬆️ |
| 📡 WS Schema | docs/WS-SCHEMA.md — formal contract (6 client→server, 10 server→client) | Reliability ⬆️ |
| 🔄 Broadcasts | agent.status, handoff.started/complete, delegate.started/complete | Dashboard ⬆️ |
| 🐛 V2.7.0 Fixes | 11 bugs — path traversal, timer leak, Ollama deprioritize, etc. | Security ⬆️ |
| 🐛 V2.7.1 Fixes | 75 bugs across 5 rounds — Dashboard integration, WS disconnect abort, mutable Map export, server crash guard, input validation, path traversal in packager, npm --ignore-scripts, agent file size limit, env secret leak, true LRU, retry jitter, recursive secret scan, symlink warning, heading truncation, watch timer cleanup, .tmp file cleanup, health-check agent reuse, dev --provider flag, dynamic compliance agent, SKIP_DIRS | Stability ⬆️ |
| 🐛 V2.7.2 Fixes | Mock provider default + Full System + Core Logic + Dashboard/API Bug Audit (56 bugs) — var→let leak, circuit-breaker stale cleanup, Ollama probe guard, chat timeout leak, failover loop-index, Claude model passthrough, queue settled flag, handoff regex, cache mutable fallback, halfOpenMaxAttempts 1→3, context trim pair fix + original task preservation, sliding window pair-preservation, isRetryableError word-boundary, Ollama model passthrough, WS reconnect re-subscribe, pendingInterventions cleanup, /health auth bypass, agent duration live refresh, chat-session context trim pair-preservation, chat error turnId, chat-panel auto-select session, completedRuns eviction, handoff artifact count, status complete→completed standardization, endTrace status check, plan pathTraversal sanitization, MCP timeout timer leak, metrics-panel AbortController, delegate context trim pair-preservation | Stability ⬆️ |
| � V2.7.3 Fixes | React Strict Mode WS disconnect, markdown rendering, keyboard shortcut conflicts, duplicate React keys, dropdown overflow, provider filtering, unified ChatPanel, responsive mobile layout | UX ⬆️ |
| � Dashboard Security | CSP headers, API proxy path whitelist, WS auth Sec-WebSocket-Protocol, input validation | Security ⬆️ |
| 🧪 Test Suite | Jest + React Testing Library + Playwright E2E (29 unit + 9 E2E tests) | Quality ⬆️ |
| 🏗️ Refactoring | page.tsx 479→160 lines, DashboardHeader/RunPanel/ResetDialog components | Maintainability ⬆️ |
V2.7.3 Dashboard UI Upgrade + Responsive Design
v2.7.3 polishes the dashboard ChatPanel with a unified layout, responsive mobile support, and fixes for React Strict Mode WebSocket disconnects.
Dashboard UI Polish:
- Unified ChatPanel — Run mode and Chat mode merged into a single full-width ChatPanel with internal sidebar for sessions and monitor
- Sidebar tabs — Underline-style tabs (Chat=blue, Monitor=cyan) replacing pill toggle
- Compact header — Pill-style AgentSelect/ProviderSelect with token usage badge and streaming indicator in one row
- Session cards — Rounded cards with avatar icon, active ring, provider badge, and color-coded action buttons
- New Chat dialog — Gradient icon header, stable open/close behavior (fixed conflicting outside-click listener)
Responsive Design:
- Mobile sidebar — Hidden on mobile (<1024px) with floating toggle button; slides in as overlay with dark backdrop
- Responsive header — Icon-only buttons on small screens, flex-wrap selectors, smaller title
- Compact monitor panels — Tighter spacing and smaller fonts in MetricsPanel and AgentStatusPanel for sidebar fit
Bug Fixes:
- React Strict Mode WS disconnect — Deferred close pattern: cleanup sets 100ms timer instead of immediate close; remount cancels timer
- Markdown rendering — All panels (ExecutionTimeline, AgentStatus, Logs) now use
MarkdownRendererfor bold, lists, code blocks - Keyboard shortcut conflicts — Global
Enterhandler restricted toCtrl/Cmd+Enter; plainEnteronly sends chat - Duplicate React keys —
turnKeynow includestimestampto ensure uniqueness across Fast Refresh remounts - Dropdown overflow — Removed
overflow-hiddenfrom containers, raised dropdown z-index to prevent clipping - Provider filtering — Filter now includes
"configured"and"ok"statuses; defaults to["mock"]before health responds
V2.7.2 Bug Fixes (Mock Provider Default + Full System + Core Logic Bug Audit)
v2.7.2 makes mock the default provider when no API keys are configured, eliminating the "No LLM provider detected" error. A warning is shown so users know they're in mock mode. Also fixes failover chain and health-check to accept mock without requiring AIYU_ENABLE_MOCK.
Full System Bug Audit (10 bugs fixed):
- Core Engine —
chat-session.js: aborted message no longer overwritten by timeout message; empty-stringfinalContentno longer treated as error in trace results - API / WS —
ws.js:provider ??andmaxSteps ??nullish coalescing prevents0/""from incorrectly falling back to defaults;handleChatSendcatch block now clears timeout to prevent timer leak - CLI / Jobs —
jobs.js:provider ??andmax_steps ??nullish coalescing fixes - Handoff —
handoff.js:provider ??nullish coalescing in source/target agent calls - Metrics —
server.js: percentile index calculation fixed fromfloor(n * p)tofloor((n-1) * p)for correct 0-indexed array percentiles - Output —
react-loop.js: JSON output enforcement now triggers on empty-string outputs (!= nullinstead of truthiness check) - UX Audit —
ux_audit.py:has_formregex no longer falsely matchesglass-cardCSS class, eliminating 5 false-positive "form inputs without labels" issues
Core Logic Bug Audit (15 bugs fixed — 4C + 5H + 6M):
- Critical —
var→letinreact-loop.jsandchat-session.js: function-scopedvar resultleaked tool results across iterations - Critical —
circuit-breaker.js:cleanupStaleBreakersdeleted recovered breakers;lastFailureTimenow cleared on HALF_OPEN→CLOSED - Critical —
health-check.js: Ollama probe ran withoutOLLAMA_HOST, causing 2s timeout delay on every health check - Critical —
chat-session.js:chatTimeoutIdleaked on normal ReAct loop exit; addedclearTimeout - High —
failover.js:indexOfinside loop → loop-index based last-provider check - High —
llm-providers.js:callClaudenow receives resolved model likecallOpenAI - High —
request-queue.js:settledflag guard prevents job timeout/completion race - High —
health-check.js: queue error message now includes actual error - High —
handoff.js: sentence-boundary regex replaces naiveincludes("decided") - Medium —
cache.js: mutable shallow copy replacesObject.freezefallback (was breaking_fromCachemutation) - Medium —
circuit-breaker.js:halfOpenMaxAttemptsdefault 1→3 for resilient recovery - Medium —
failover.js:_ollamaLastOkshared state documented for single-process assumption - Medium —
react-loop.js: context trimming now drops assistant + ALL consecutive user messages (was only 2, leaving orphaned tool results) - Medium —
chat-session.js: sliding window pair-preservation — if window starts withuser, include precedingassistant - Medium —
llm-providers.js:isRetryableErroruses word-boundary regex (\b429\b) instead ofincludes("429")
Dashboard + API Bug Audit Round 4 (9 bugs fixed — 1C + 2H + 6M):
- Critical —
react-loop.js: context trimming droppedmessages[1](original user task) instead of starting frommessages[2], causing LLM to receive invalidsystem→assistantsequence and lose task context. Now preserves system + original user message - High —
ws.js:pendingInterventionsMap entry never cleaned when run completes/errors, causing unbounded memory growth. AddedPENDING_INTERVENTIONS.delete(runId)on run completion and error - High —
use-websocket.ts: no re-subscription to active runs after WS reconnect, causing missed steps/completions. Added subscribe messages for active runIds onws.onopen - Medium —
llm-providers.js:callOllama(messages, options)didn't receive resolved model from failover chain (unlikecallOpenAI/callClaude). Changed tocallOllama(messages, { ...options, model }) - Medium —
handoff.js: decision regex required sentence boundary (^|\n|.\s) before keywords, missing mid-sentence patterns like "Based on this, I decided". Added.,;!?as boundary characters - Medium —
server.js:apiKeyAuthmiddleware ran on/healthendpoint, breaking K8s/load balancer probes when API key is configured. Added path check to skip auth for/health - Medium —
agent-status-panel.tsx:formatSinceduration computed once and never refreshed for running agents. Added 1-second interval tick when any agent is in running state - Medium —
dashboard-header.tsx: API Key row showed••••••••implying the key was hidden, butNEXT_PUBLIC_*vars are in page source. Changed to show "Configured"/"Not set" for honest labeling - Medium —
tracing.js: p95 percentile index usedMath.floorcausing low bias for small samples (e.g., length=2 → index=0 instead of 1). Changed toMath.round
Deep System Bug Audit Round 5 (7 bugs fixed — 2H + 5M):
- High —
chat-session.js: context trim after tool execution had no pair-preservation — could start with orphanedusertool-result, producing invalidsystem→user(tool result)LLM sequence. Added pair-preservation logic - High —
ws.js:handleChatSenderror event missingturnId— dashboard couldn't map error to streaming turn, leaving message stuck as "streaming" forever. AddedturnId: resolvedTurnId - Medium —
chat-panel.tsx:useEffectauto-selected session only when!activeSessionId— creating new session while another was active wouldn't switch to it. Fixed condition - Medium —
use-websocket.ts:completedRunsadded entries without immediate eviction — memory bloat between cleanup timer intervals. Added eviction when overMAX_RUNS - Medium —
handoff.js:broadcastHandoffCompletepassedbundle.artifacts(array) instead of count — dashboard displayed[object Object]. Changed to.length - Medium —
react-loop.js+ 4 files:state.status = "complete"didn't match"completed"in WS/API/types.ts, breaking downstream checks. Standardized to"completed" - Medium —
react-loop.js:endTracecheckedstate.status === "complete"after rename, recording successful runs as"error". Updated to"completed"
Deep System Bug Audit Round 6 (4 bugs fixed — 1H + 3M):
- High —
tool-definitions.js:plan.updateandplan.listusedplanIddirectly in path without sanitization —../in planId allowed directory traversal. Added sanitization +pathTraversalguard - Medium —
mcp/server.js: MCPrun_agenttimeout timer never cleared afterPromise.race— 2-minute timer leaked after agent completed. AddedclearTimeout - Medium —
metrics-panel.tsx:fetch("/api/metrics")had noAbortController— in-flight requests continued after unmount, callingsetStateon unmounted component. Added abort on cleanup - Medium —
tool-definitions.js: delegate context trimspliceassumedmessages[1]was alwaysassistant— could throwTypeErrorafter prior splices. Added optional chaining + pair-preservation
V2.7.1 Bug Fixes (10 Critical + 17 High + 19 Medium + 7 Low + 3 CI)
Dashboard Security Hardening (Post-release):
- CSP Headers:
Content-Security-Policy,X-Frame-Options: DENY,X-Content-Type-Options: nosniff,Referrer-Policyvianext.config.js - API Proxy Whitelist:
isPathAllowed()blocks non-whitelisted paths (admin/*,secrets/*, etc.) with 403 - WS Auth: Client sends token via
Sec-WebSocket-Protocol: aiyu-token.<key>instead of query param; serverhandleProtocolsselects subprotocol - Input Validation:
validateInput()/validateIdentifier()inuse-websocket.tsguardssendRun,sendIntervene,sendChatCreate,sendChatSend - Component Refactoring:
page.tsx479→160 lines — extractedDashboardHeader,RunPanel,ResetDialogcomponents - Test Suite: Jest + React Testing Library (29 tests, 4 suites) + Playwright E2E (9 specs) — API proxy, input validation, component rendering
Dashboard Integration (2 Critical + 3 High + 2 Medium):
- Critical: WS client no API key token (auto
?token=injection),sensitiveRouteAuthblocks Docker network (server-side API proxy with auth) - High: Docker port mapping
3001:3000→3001:3001,NEXT_PUBLIC_WS_URLbuild-time embedding, Next.js rewrite no auth forwarding (API route proxy) - Medium: Dashboard missing
sendChatCreate/sendChatSend,/agents/statusesmissing ISOtimestampfield
Server-side (8 Critical + 14 High + 17 Medium + 5 Low + 3 CI):
- Critical: WS run/chat timeout doesn't cancel agent execution (AbortController), agent-loader
isValidAgentNamepath traversal, WS timeout timer leak, agent.delegate missing_runId, failover.filter()mutation, handoff catchReferenceError, WS disconnect doesn't cancel running agent (activeRuns Map), PENDING_INTERVENTIONS mutable Map export (read-only snapshot) - High:
wsApiKeyAuthcrashes on malformed URL, prompt-builder hardcodes tool list, usage.js atomic write, search-tools async, chat-session intervention,AIYU_ENABLE_MOCKnot set in tests, context trim pair mismatch, chat tool timeout/abort, Claude Content-Length, intervene WS fallback, chat lastActivity timing,/agents/statusescrash on ws require fail, jobs.js no input length validation, packager bin/run.js path traversal, plugin.js--ignore-scripts, agent-loader no file size limit (200KB) - Medium: Handoff WS broadcast fallback, Claude system message merge, request-queue
timeout=0, chat context limit, circuit breaker leak, tracing idle leak, cache key collision, tracing recursion → setImmediate, queue job deletion, safeWrite temp cleanup, grep early match limit, _broadcast error handling, SKILL.md size limit, Ollama health check, sandboxExec env secret leak, cache not true LRU (lastAccess), retry no jitter, validator secret scan too narrow (recursive), config symlink fallback no warning, prompt-builder heading overflow, test.js watch timer no unref - Low: ws.js Map accessor functions, health-check GET for Ollama, config.json try/catch, chatSessions read-only, memory pathTraversal, callMock UTF-8 slice, usage.js stale .tmp file, health-check agent GC pressure, dev hardcodes mock (
--providerflag), compliance hardcodes agent name, search-tools SKIP_DIRS
✨ What's New in V2.6
V2.6 brings module decomposition and reliability hardening — breaking the two largest god modules into focused, maintainable files while preserving full backward compatibility.
| Area | Change | Impact |
|------|--------|--------|
| 🏗️ Decomposition | agent-runtime.js (843 lines) → 8 modules | Maintainability ⬆️ |
| 🏗️ Decomposition | tool-registry.js (543 lines) → 3 modules | Maintainability ⬆️ |
| 🔧 Production | Tracing appendFileSync → async batched queue | No event loop blocking |
| 🔧 Production | MCP run_agent 2min timeout + maxSteps cap 20 | Prevents runaway agents |
| 🔧 Production | Usage flush beforeExit + sync fallback | No data loss on exit |
| 🐳 Docker | Non-root user + expanded .dockerignore | Security ⬆️ |
| 🛠️ CLI | aiyu-multi-agent dev REPL with verbose logging | Dev experience ⬆️ |
| 📦 Types | types.d.ts for 12 core modules | TS migration foundation |
| 🧠 Karpathy | Behavioral principles in system prompt + runtime guardrails | LLM coding quality ⬆️ |
| 🤖 Agent Audit | 84/84 clean-code, 84/84 Interaction Maps, frontend decomposed | Agent consistency ⬆️ |
| 🛠️ 7 New Tools | agent.delegate, memory.save/load, web.search, plan.create/update/list | Agent capability ⬆️ |
| 📋 Frontmatter Audit | 84/84 When to Activate, 84/84 Philosophy, 84/84 memory field | Agent completeness ⬆️ |
Decomposed Modules
agent-runtime.js (re-export) ──► react-loop.js — ReAct loop
► chat-session.js — Interactive chat
► failover.js — Per-provider CB
► cache.js — LRU cache
► agent-loader.js — Agent spec loading
► prompt-builder.js — System prompt
► input-sanitizer.js — Input validation
► tool-parser.js — Tool call parsing
tool-registry.js (re-export) ──► tool-definitions.js — Tools + schemas
► search-tools.js — Grep + Glob
► command-parser.js — Shell arg parse✨ What's New in V2.5
V2.5 brings Claude Design-inspired capabilities to the Aiyu MultiAgent platform, enabling real-time agent collaboration, external API access, and smarter project-aware AI automation. This release adds 9 major features and fixes 31 bugs for improved reliability.
V2.5.1 adds 25 system-audit bug fixes (6C+7H+12M):
- Per-provider circuit breaker keys (
llm:openai,llm:claude) withcallLLMWithFailover() - Rate limit hard cap (200 entries) + X-Forwarded-For spoofing fix (
AIYU_TRUST_PROXY) search.greplastIndex reset, chat session failover + 30-min TTL- Handoff bundle persistence + project-scoped path, cache freeze-on-fallback
- LLM retry off-by-one fix, Ollama https transport, usage flush on exit
- CORS origin config (
AIYU_CORS_ORIGIN), fs.glob brace alternation escape
| 🎛️ Real-Time Streaming | 🔗 Agent Handoff | 💬 Inline Intervention |
|:---:|:---:|:---:|
| WebSocket API at /ws streams agent step events live to your IDE | POST /handoff chains multiple AI agents with enriched context bundles | POST /agents/intervene or WebSocket lets you inject feedback mid-run |
| 🌐 fetch.url Tool | 🤖 Auto-Apply Context | 🔐 API Key Auth |
|:---:|:---:|:---:|
| Agents fetch HTTP(S) URLs with 15s timeout and 100KB body limit | Auto-detects language/framework from package.json + .windsurf/rules | AIYU_API_KEY env var with Bearer token + crypto.timingSafeEqual |
| ⚡ LLM Failover | ⏱️ Per-Tool Timeout | ✂️ Smart Truncation |
|:---:|:---:|:---:|
| openai → claude → local → mock failover when circuit breaker opens | 30s Promise.race per tool call with tool_timeout tracing | Section-aware 8KB skill limit preserving headings and code blocks |
🚀 Quick Start
Get started in seconds with npx — no installation required:
# Initialize in your project (one command, smart defaults, no prompts!)
npx aiyu-multi-agent init
# Or use interactive mode for full guided setup
npx aiyu-multi-agent init --interactiveOnce initialized, type any slash command in the Windsurf chat panel or terminal to activate specialized AI agents:
/create Build a task management app with Next.js
/backend Design a REST API with PostgreSQL and authentication
/security Audit my codebase for OWASP vulnerabilities
/debug Find the memory leak in my Express middlewareThe platform automatically detects your project type, selects the right agent, and starts working.
git clone https://github.com/teeprakorn1/aiyu-multi-agent.git
cd aiyu-multi-agent
npm install
aiyu-multi-agent .� Why Aiyu MultiAgent?
The Problem
Developers waste hours on repetitive tasks — setting up projects, writing boilerplate, auditing code, debugging, and orchestrating complex multi-step workflows across different tools and LLM providers.
The Solution
Aiyu MultiAgent is a unified AI agent platform that brings 83+ specialized agents to your fingertips. Instead of context-switching between ChatGPT, Claude, and custom scripts, you get:
- ⚡ Instant Agent Activation — Type
/backend,/security, or/deployand a domain expert agent takes over - 🧠 Multi-LLM Support — Works with OpenAI GPT-4, Anthropic Claude, local Ollama models, and mock mode for testing
- 🔒 Safety & Security — Path traversal protection, sandboxed execution, secret scanning, and permission-based skill installation
- 🔌 MCP Integration — Native support for Claude Code, Cursor, and Windsurf via the Model Context Protocol
- 📡 Real-Time Streaming — WebSocket API streams agent thoughts and actions live to your IDE
- 🤝 Agent Handoff — Chain multiple agents together for complex workflows (e.g., architect → backend → security auditor)
- 🧪 Built-In Testing — Write declarative agent tests in Markdown, run compliance checks, and validate with CI/CD
- 📦 Publish & Share — Package your custom agents as npm modules for your team or the community
CLI Reference
Aiyu MultiAgent provides a comprehensive command-line interface for managing AI agents, running tasks, testing, and publishing. All commands support --help for detailed usage.
🏠 Core Commands
aiyu-multi-agent init # Quick setup (smart defaults)
aiyu-multi-agent init --interactive # Full interactive setup
aiyu-multi-agent init --dry-run # Preview without writing
aiyu-multi-agent version # Show version + check updates
aiyu-multi-agent status # Project statistics
aiyu-multi-agent list # List all slash commands
aiyu-multi-agent inspect # Observability dashboard
aiyu-multi-agent checklist # Run master checklist⚙️ Execution Engine
Run agents with natural language input or start an interactive chat session:
aiyu-multi-agent run "Create REST API" # Run agent with input
aiyu-multi-agent run "..." --agent backend # Specify agent
aiyu-multi-agent run "..." --provider openai # OpenAI (needs OPENAI_API_KEY)
aiyu-multi-agent run "..." --provider claude # Claude (needs ANTHROPIC_API_KEY)
aiyu-multi-agent run "..." --provider local # Ollama (local LLM)
aiyu-multi-agent run "..." --provider mock # Mock (testing)
aiyu-multi-agent run "..." --json # JSON output (CI/CD)
aiyu-multi-agent run "..." --max-steps 20 # Override max ReAct steps
aiyu-multi-agent run "..." --verbose # Streaming step-by-step
aiyu-multi-agent run "..." --no-cache # Skip cache
aiyu-multi-agent dev # Dev REPL (mock provider)
aiyu-multi-agent dev --provider openai # Dev REPL with real LLM
aiyu-multi-agent dev --verbose # Dev REPL with step logging
aiyu-multi-agent chat # Interactive session
aiyu-multi-agent chat --agent backend # Chat with specific agent🌐 HTTP API & WebSocket
Start a production-ready HTTP server with REST API and WebSocket streaming:
aiyu-multi-agent serve # Start HTTP API server| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | System status (k8s probe exempt from auth) |
| /jobs | POST | Enqueue agent run |
| /jobs/:id | GET | Poll job status |
| /metrics | GET | Prometheus gauge format |
| /traces | GET | Distributed trace data |
| /handoff | POST | Chain agents with context bundles |
| /agents/intervene | POST | Inject mid-run feedback |
| /ws | WebSocket | Real-time agent step streaming |
🔌 MCP Server
Integrate with Claude Code, Cursor, Windsurf, and any MCP-compatible host:
aiyu-multi-agent mcp # Start MCP server (stdio)Claude Code — claude_desktop_config.json:
{
"mcpServers": {
"aiyu": {
"command": "npx",
"args": ["-y", "aiyu-multi-agent", "mcp"],
"cwd": "/path/to/your/project"
}
}
}Cursor — .cursor/mcp.json:
{
"mcpServers": {
"aiyu": {
"command": "npx",
"args": ["-y", "aiyu-multi-agent", "mcp"]
}
}
}| MCP Tool | Description |
|----------|-------------|
| list_agents | Discover available agents |
| run_agent | Execute agent (pass agent_name + input) |
| inspect_agent | Get agent details — skills, tools, instructions |
🧪 Testing & Publishing
aiyu-multi-agent test # Run agent test suite
aiyu-multi-agent test --compliance # Spec compliance (15 checks)
aiyu-multi-agent test --unit # Unit tests (29 tests)
aiyu-multi-agent test --watch # Watch mode
aiyu-multi-agent publish # Publish agent to npm
aiyu-multi-agent publish --dry-run # Validate without publishing📦 Plugin System
aiyu-multi-agent add skill <name> # Install skill from npm
aiyu-multi-agent remove skill <name> # Uninstall skill🔧 LLM Providers
Aiyu MultiAgent supports multiple large language model providers with automatic failover. If one provider's circuit breaker opens, the system automatically tries the next provider in the chain.
| 🧠 Provider | 🔑 Environment Variable | 📝 Supported Models | 💡 Best For |
|:---:|:---:|:---|:---|
| OpenAI | OPENAI_API_KEY | gpt-4, gpt-4o, gpt-3.5-turbo | General-purpose coding, reasoning, creative tasks |
| Claude | ANTHROPIC_API_KEY | claude-3-5-sonnet, claude-3-5-haiku | Long context, detailed analysis, safety-critical code |
| Ollama | OLLAMA_HOST | llama3, mistral, codellama | Local/offline execution, privacy-sensitive projects |
| Mock | AIYU_ENABLE_MOCK=1 | Canned responses | Testing, CI/CD pipelines, development without API keys |
🔄 Failover Chain: openai → claude → ollama → mock
When the circuit breaker detects failures (timeouts, 5xx errors, rate limits), it automatically promotes the next provider. No manual intervention required.
🛠️ Built-in Tools
Every agent gets access to a set of sandboxed, namespaced tools for safe file system and shell operations. All tools run with path traversal protection and argument validation.
| 🔧 Tool | 📥 Required Args | 📝 What It Does |
|:---:|:---:|:---|
| fs.read | path | Read file contents with project-root restriction |
| fs.write | path, content | Atomic file write (temp → rename) with EXDEV fallback |
| fs.edit | path, old_string, new_string | Find & replace with unique match enforcement |
| fs.glob | pattern | Find files by glob pattern (brace {a,b} expansion supported) |
| search.grep | pattern | Search file contents (async walk, Node.js native — works on Windows) |
| shell.exec | command | Execute whitelisted shell commands via execFileSync (no shell) |
| fetch.url | url | Fetch HTTP(S) URLs with 15s timeout, 3-redirect follow, 100KB limit |
Legacy aliases:
Read,Write,Edit,Grep,Glob,Bashauto-map to namespaced versions for backward compatibility.
- Parser Fallback Chain — 4 strategies: structured JSON →
TOOL_CALLregex → JSON code blocks → final answer - Arg Validation — Required args checked before execution; missing args return descriptive errors
- Step Logging — Every step recorded as
{ step, thought, action, result, error, duration_ms } - Output Contract —
outputFormat: jsonenforces valid JSON output (useful for CI/CD) - Deterministic Mode —
temperature: 0for reproducible test results across all providers - Tool Timeout — 30s per tool call via
Promise.race; tracing tagstool_timeoutvstool_failure - LLM Retry/Backoff — Exponential backoff (max 3 retries) for HTTP 429, 503, and network timeouts
- Cross-Platform —
fs.globandsearch.grepuse pure Node.js (no externalgrep/finddependency) - Safe Write EXDEV — Atomic write handles cross-partition rename with copy+unlink fallback
- Agent Name Validation — Rejects path traversal characters:
/ \ : * ? " < > |
📁 Project Structure
.agent/ # Universal config (primary)
├── agents/ # AI Agents
├── skills/
│ ├── core/ # Built-in skills
│ └── installed/ # npm-installed skills
├── workflows/ # Slash command workflows
├── rules/ # Auto-triggered rules
├── tests/ # Agent test files (*.test.md)
├── scripts/ # Verification scripts
└── config.yaml # Agent configuration
.windsurf/ # Symlink → .agent/ (Windsurf IDE)aiyu-multi-agent/
├── bin/
│ ├── cli.js # CLI entry (Commander.js)
│ ├── server.js # HTTP API server entry
│ └── postinstall.js # Post-install script
├── lib/
│ ├── api/
│ │ ├── server.js # Express HTTP server
│ │ ├── ws.js # WebSocket real-time streaming
│ │ ├── handoff.js # Agent handoff + intervention API
│ │ ├── jobs.js # Async job queue
│ │ ├── middleware.js # Auth, rate-limit, logging, shutdown guard
│ │ └── config.js # API configuration
│ ├── core/
│ │ ├── agent-runtime.js # Re-export (V2.6 decomposed)
│ │ ├── react-loop.js # ReAct loop + tool calling + timeout
│ │ ├── chat-session.js # Interactive chat + timeout
│ │ ├── failover.js # Per-provider circuit breaker + failover
│ │ ├── cache.js # LRU cache
│ │ ├── agent-loader.js # Agent spec + skill loading
│ │ ├── prompt-builder.js # System prompt construction
│ │ ├── input-sanitizer.js # Input validation + injection detection
│ │ ├── tool-parser.js # Tool call parsing
│ │ ├── tool-registry.js # Re-export (V2.6 decomposed)
│ │ ├── tool-definitions.js # Tools + schemas + registry
│ │ ├── search-tools.js # Grep + Glob
│ │ ├── command-parser.js # Shell arg parse + ReDoS-safe
│ │ ├── llm-providers.js # OpenAI, Claude, Ollama, Mock + retry
│ │ ├── circuit-breaker.js # Prevents cascade LLM failures
│ │ ├── request-queue.js # Concurrency control + backpressure
│ │ ├── tracing.js # Distributed tracing (OTel export)
│ │ ├── health-check.js # System + Ollama health status
│ │ ├── config.js # Config loader (.agent/ + .windsurf/)
│ │ ├── plugin.js # Plugin lifecycle + permission system
│ │ ├── guardrails.js # Security & safety layer
│ │ ├── usage.js # Usage stats + Prometheus metrics
│ │ ├── logger.js # Structured JSON logging
│ │ └── types.d.ts # TypeScript declarations
│ ├── commands/ # CLI command handlers
│ ├── test/ # Test runner + compliance + unit tests
│ ├── mcp/ # MCP server + tools
│ └── publish/ # Packager + validator + registry
├── templates/ # Agent + skill scaffolds
├── docs/ # Architecture, runtime spec, roadmap, usage
├── .windsurf/ # 84 Agents, 46 Skills, 78 Workflows, 10 Rules
└── aiyu-multi-agent-dashboard/ # Real-time monitoring dashboard (Next.js 14)🎯 How to Use Aiyu MultiAgent
Method 1: Slash Commands — Instant Agent Activation
Type / followed by a command name to instantly activate a specialized AI agent. Each agent has domain-specific skills, tools, and guardrails tailored to its purpose.
| 🏠 Core | 💻 Development | 🏗️ Frameworks |
|:---:|:---:|:---:|
| /create /plan /enhance /brainstorm | /backend /frontend /fullstack | /nextjs /react /angular /sveltekit |
| /status /debug /deploy /test | /database /data-layer /business-logic | /nestjs /express /python-api /go |
| 🔒 Security | ☁️ Infrastructure | 🏭 Industrial |
|:---:|:---:|:---:|
| /security /secure-coding /threat-modeling | /cloud /docker /linux /windows | /mechatronic /pneumatic /electric |
| /pentest-plan /kali /hack /bypass | /network /load-balancer /migrate | /chief-machine /plc /iot |
| 🤝 Orchestration | 🎓 Specialist |
|:---:|:---:|
| /orchestrate /junior-orchestrate (2-3 agents) | /math /elite-tech-leader /package-finder |
| /senior-orchestrate (4-6 agents) | /staff /platform /ux-research /accessibility |
| /elite-orchestrate (7+ agents) | |
Method 2: Natural Language — Smart Auto-Routing
Just describe your task in plain English — the built-in intelligent routing system automatically selects the best AI agent for your request:
"Build me a REST API with JWT authentication and PostgreSQL"
→ 🤖 Active Agent: backend-specialist
"Check my React app for XSS and CSRF vulnerabilities"
→ 🤖 Active Agent: security-auditor
"Design a cloud architecture on AWS for 10k concurrent users"
→ 🤖 Active Agent: cloud-architectMethod 3: Multi-Agent Orchestration — Complex Projects
For complex, multi-domain projects, orchestrate multiple agents to work together:
| Orchestration Level | Agents | Best For |
|:---:|:---:|:---|
| 🟢 Junior /junior-orchestrate | 2–3 | Simple feature, quick bug fix, single-file refactor |
| 🟡 Senior /senior-orchestrate | 4–6 | Multi-service feature, cross-team integration, architecture review |
| 🔴 Elite /elite-orchestrate | 7+ | Mission-critical migration, enterprise platform, zero-downtime deployment |
🛡️ Security & Guardrails
Aiyu MultiAgent is built with security-first design for safe AI agent execution in production environments. Every tool call passes through multiple safety layers:
| 🔐 Guardrail | 🛡️ Protection Layer | 📝 Details |
|:---:|:---|:---|
| Path Traversal | File system isolation | Blocks ../, absolute paths, symlink escapes. Uses projectRoot + path.normalize() + fs.realpathSync() |
| Safe Write | Data integrity | Atomic writes (temp → rename) with EXDEV cross-partition fallback |
| Rate Limit | DoS prevention | In-memory limiting with X-Forwarded-For support, auto-cleanup |
| Sandbox Exec | Command isolation | execFileSync only (no shell). Whitelist-only commands. path.basename() pre-check |
| Command Injection | Input sanitization | Blocks $(), `, rm -rf, mkfs, dd, destructive patterns |
| API Key Auth | Access control | AIYU_API_KEY env var. Bearer token with crypto.timingSafeEqual (timing-attack safe) |
| Env Leak Prevention | Secret protection | Strips API_KEY / TOKEN / SECRET / PASSWORD from child process env regardless of env source |
| Secret Scanning | Pre-publish safety | Detects leaked keys on publish. Blocks with --strict. Recursive scan of all .md, .yaml, .json files for ghp_, sk-, AKIA |
| Permission System | Explicit consent | Skills declare permissions: { fs, network, exec }. User must approve on install |
🧪 Testing Your Agents
Write declarative tests in Markdown — no code required. Create .agent/tests/your-agent.test.md:
---
name: your-agent-test
description: "Test suite for your-agent"
---
## Test 1: Agent loads correctly
- assert: config exists
- assert: agent name is "your-agent"
- assert: provider is "openai"
## Test 2: Guardrails active
- assert: path traversal protection enabled
- assert: safe write enabled
- assert: rate limit enabled
## Test 3: Skills loaded
- assert: skill clean-code loadedRun tests with a single command:
aiyu-multi-agent test # Run all test suites
aiyu-multi-agent test --compliance # 15 spec compliance checks
aiyu-multi-agent test --unit # 29 core module unit tests
aiyu-multi-agent test --watch # Auto-re-run on file changes| Assertion | What It Checks |
|-----------|----------------|
| config exists | .agent/ directory exists and is valid |
| agent name is "X" | Agent manifest name matches expected |
| provider is "X" | LLM provider configured correctly |
| guardrails active/enabled | All security guardrails initialized |
| tool X available | Required tool is in agent's tool list |
| skill X loaded | Skill directory exists and parses correctly |
🔌 Plugin System — Extend With Skills
Install community skills from npm to extend your agents:
aiyu-multi-agent add skill postgres # Install aiyu-multi-agent-skill-postgres
aiyu-multi-agent add skill @org/custom # Scoped packages supportedSkills add new capabilities — database helpers, cloud APIs, testing frameworks, and more. Each skill declares its required permissions, and you approve before installation. npm install uses --ignore-scripts for safety.
- Create npm package
aiyu-multi-agent-skill-<name>:
aiyu-multi-agent-skill-my-skill/
├── SKILL.md # Required: metadata + guidelines
├── config.json # Optional: plugin manifest with permissions
├── scripts/ # Optional: tool functions
└── references/ # Optional: templates, docs- Publish:
npm publish - Users install:
aiyu-multi-agent add skill my-skill
🔧 Customize and Extend
Create .agent/agents/your-agent.md:
---
name: your-agent
description: What this AI agent specializes in
tools: fs.read, search.grep, fs.glob, shell.exec, fs.edit, fs.write
skills: clean-code, architecture
provider: openai
guardrails: true
---
# Your Agent Instructions
Write detailed instructions here for the LLM...aiyu-multi-agent add skill your-skillOr create manually in .agent/skills/your-skill/SKILL.md.
Create .agent/rules/your-rules.md:
---
trigger: on_request
keywords: [keyword1, keyword2]
---
# Your Rule Title
Guidelines that auto-trigger when keywords match...🤝 Contributing
We welcome contributions from the community! Whether it's bug fixes, new agents, skills, or documentation improvements.
| 📄 Document | 🔍 Description | |:---|:---| | CONTRIBUTING.md | Development setup, code style, testing guide | | SECURITY.md | Vulnerability reporting and security policy | | CODE_OF_CONDUCT.md | Community standards (Contributor Covenant 2.1) | | CHANGELOG.md | Full version history and release notes | | CODEBASE.md | Architecture overview and module documentation |
Quick contribution workflow:
# 1. Fork and clone
git clone https://github.com/YOUR_NAME/aiyu-multi-agent.git
# 2. Create a feature branch
git checkout -b feature/my-awesome-feature
# 3. Make changes and test
npm test
# 4. Commit with conventional commits
git commit -m "feat(agents): add kubernetes-orchestrator agent"
# 5. Push and open a Pull Request
git push origin feature/my-awesome-featureMIT License © 2026 Aiyu MultiAgent Contributors
