@zhixuan92/multi-model-agent-core
v3.12.1
Core library for multi-model-agent: provider runners (Claude, Codex, OpenAI-compatible), routing logic, config schema, and tool/sandbox primitives.
Runtime library for multi-model-agent. Import it to run multi-provider agent tasks directly from your own Node program — same routing, supervision, and review pipeline, without the HTTP server.
Want the standalone service instead? Install @zhixuan92/multi-model-agent: it wraps this library in a local HTTP daemon with client-installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.
Install
npm install @zhixuan92/multi-model-agent-core

Requires Node >= 22. ESM only.
Quick example
import { loadConfigFromFile } from '@zhixuan92/multi-model-agent-core/config/load';
import { runTasks } from '@zhixuan92/multi-model-agent-core/run-tasks';
// Uses the same ~/.multi-model/config.json as the standalone daemon —
// agents.standard, agents.complex, defaults.parentModel, etc.
const config = await loadConfigFromFile();
const results = await runTasks([
  { prompt: 'Refactor auth.ts to use JWT.', agentType: 'complex' },
  { prompt: 'Write unit tests for auth module.', agentType: 'standard' },
], config);
for (const r of results) {
  console.log(r.status, r.usage.costUSD, r.usage.costDeltaVsParentUSD, r.output);
}

costDeltaVsParentUSD is populated when defaults.parentModel is set in the config. It equals actualCost − parentCost, so a negative value means the worker was cheaper (savings). Use it to surface a $X saved (Y× ROI) figure in your own UI.
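One way to turn the delta into a "$X saved (Y× ROI)" string is sketched below. It is illustrative only: the `TaskUsage` shape is a hypothetical stand-in that carries just the two usage fields named above, the helper names are invented, and "ROI" is assumed to mean parent cost divided by actual cost, which the README does not define.

```typescript
// Hypothetical minimal shape: only the two usage fields named in the README.
interface TaskUsage {
  costUSD: number;
  // Assumed to be absent (or null) when defaults.parentModel is not configured.
  costDeltaVsParentUSD?: number | null;
}

interface SavingsSummary {
  actualUSD: number;
  parentUSD: number;
  savedUSD: number;
  roiMultiple: number | null;
}

// Sums actual and parent-equivalent cost over the results that carry a delta.
function summarizeSavings(usages: TaskUsage[]): SavingsSummary {
  let actualUSD = 0;
  let parentUSD = 0;
  for (const u of usages) {
    if (u.costDeltaVsParentUSD == null) continue; // no parent baseline for this task
    actualUSD += u.costUSD;
    // delta = actual - parent, therefore parent = actual - delta
    parentUSD += u.costUSD - u.costDeltaVsParentUSD;
  }
  const savedUSD = parentUSD - actualUSD;
  // Assumption: ROI multiple = what the parent model would have cost / what was paid.
  const roiMultiple = actualUSD > 0 ? parentUSD / actualUSD : null;
  return { actualUSD, parentUSD, savedUSD, roiMultiple };
}

// Renders the summary in the "$X saved (Y× ROI)" form mentioned above.
function formatSavings(s: SavingsSummary): string {
  const roi = s.roiMultiple === null ? 'n/a' : `${s.roiMultiple.toFixed(1)}×`;
  return `$${s.savedUSD.toFixed(2)} saved (${roi} ROI)`;
}
```

With real results you would pass `results.map((r) => r.usage)`; tasks without a delta are skipped rather than counted as zero savings.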
What's inside
- Provider runners: Claude, Codex, and any OpenAI-compatible endpoint
- Routing engine: capability filter → agent type → cheapest qualifier
- runTasks: parallel dispatch, returns per-task results with usage, cost, files touched, status, and escalation log
- Reviewed lifecycle: spec review + quality review by a different agent, auto-commit of file changes, file artifact verification
- Executors: pure execute<Tool>(ctx, input) functions for delegate, audit, review, verify, debug, execute-plan, retry, investigate, explore (used by the HTTP server package)
- Tool schemas: Zod-validated input shapes for each tool, exportable via ./tool-schemas/*
- BatchRegistry: server-wide state machine for pending / awaiting_clarification / complete / failed / expired batches with context-block refcount pinning
- Sandboxed tools: readFile, writeFile, grep, glob, listFiles, runShell with cwd-only confinement
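The three routing steps named in the list (capability filter, then agent type, then cheapest qualifier) could be sketched roughly as follows. This is not the package's implementation: `Candidate`, its fields, and `pickCheapest` are hypothetical names chosen for illustration, and the real engine may weigh cost differently.

```typescript
// Hypothetical types; the package's own shapes live under ./types and ./routing/*.
type AgentType = 'standard' | 'complex';

interface Candidate {
  name: string;
  agentType: AgentType;
  capabilities: string[];
  costPerMTokUSD: number; // assumed single blended price, for illustration only
}

function pickCheapest(
  candidates: Candidate[],
  agentType: AgentType,
  required: string[],
): Candidate | undefined {
  const qualified = candidates
    // 1. capability filter: keep candidates that offer every required capability
    .filter((c) => required.every((cap) => c.capabilities.includes(cap)))
    // 2. agent type: keep candidates registered for the requested type
    .filter((c) => c.agentType === agentType);

  // 3. cheapest qualifier: lowest price wins; undefined when nothing qualifies
  return qualified.reduce<Candidate | undefined>(
    (best, c) => (best === undefined || c.costPerMTokUSD < best.costPerMTokUSD ? c : best),
    undefined,
  );
}
```

Returning `undefined` when no candidate qualifies leaves the escalation or failure decision to the caller.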
Subpath exports
| Subpath | What |
|---|---|
| ./config/schema | parseConfig, multiModelConfigSchema, serverConfigSchema |
| ./config/load | loadConfigFromFile, loadAuthToken |
| ./routing/resolve-agent | resolveAgent — resolves agent type to provider |
| ./routing/model-profiles | Model cost/tier profiles |
| ./provider | createProvider factory |
| ./run-tasks | runTasks parallel dispatcher, RunTasksOptions |
| ./heartbeat | HeartbeatTimer — periodic progress heartbeat emitter |
| ./types | All shared types |
| ./executors | Pure execute<Tool>(ctx, input) functions and ExecutionContext type |
| ./tool-schemas | Zod input/output schemas for each tool |
| ./intake/pipeline | runIntakePipeline — compile → infer → classify → resolve |
| ./intake/types | DraftTask, Source, IntakeResult, ClarificationEntry |
| ./intake/classify | classifyDraft — deterministic classification heuristic |
| ./intake/confirm | processConfirmations — clarification resume processing |
| ./intake/clarification-store | ClarificationStore — TTL/LRU state for clarification sets |
| ./intake/compilers/* | Route compilers: delegate, review, debug, verify, audit, execute-plan, investigate, explore |
| ./reporting/parse-investigation-report | parseInvestigationReport, parseCitations, parseConfidence (3.4.0) |
| ./auto-commit | autoCommitFiles — git commit helper for worker file changes |
| ./file-artifact-check | partitionFilePaths, checkOutputTargets — output target verification |
| ./telemetry/types | TelemetryEvent, UploadBatch, InstallMetadata Zod schemas + SCHEMA_VERSION |
| ./telemetry/event-builder | buildTaskCompletedEvent, buildSessionStartedEvent, etc. — pure event constructors |
| ./telemetry/consent-rules | decideConsent — env / config / default precedence resolver |
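As an illustration of the "env / config / default" precedence listed for ./telemetry/consent-rules, here is a minimal sketch of such a resolver. It does not reproduce decideConsent: the function name, the accepted environment values, and the return shape are all assumptions.

```typescript
// Hypothetical return shape: the decision plus the layer that produced it.
interface ConsentDecision {
  enabled: boolean;
  source: 'env' | 'config' | 'default';
}

// Assumed env parsing: only a few explicit spellings count; anything else falls through.
function parseEnvFlag(value: string | undefined): boolean | undefined {
  if (value === undefined) return undefined;
  const v = value.trim().toLowerCase();
  if (v === '1' || v === 'true') return true;
  if (v === '0' || v === 'false') return false;
  return undefined;
}

// Precedence: environment beats config, config beats the built-in default.
function resolveConsent(
  envValue: string | undefined,
  configValue: boolean | undefined,
  defaultValue: boolean,
): ConsentDecision {
  const fromEnv = parseEnvFlag(envValue);
  if (fromEnv !== undefined) return { enabled: fromEnv, source: 'env' };
  if (configValue !== undefined) return { enabled: configValue, source: 'config' };
  return { enabled: defaultValue, source: 'default' };
}
```

Returning the source alongside the decision makes it easy to log why telemetry ended up on or off.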
Diagnostic logging
Diagnostic logging and verbose streaming are both OFF by default.
{
  "diagnostics": {
    "log": false,
    "verbose": false,
    "logDir": "/some/path"
  }
}

Two independent axes:

- diagnostics.log: when true, append JSONL records to mmagent-YYYY-MM-DD.jsonl under diagnostics.logDir (defaults to ~/.multi-model/logs/).
- diagnostics.verbose: when true, the server emits per-tool-call, per-LLM-turn, per-stage-transition, and per-batch-lifecycle events. If log is also true, they're persisted; otherwise they stream only to the server's stderr.
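To make the two axes concrete, the sketch below maps the flags to output sinks and builds the dated file name. Both helpers are hypothetical, not package exports, and using the UTC date for the file name is an assumption, since the README does not say which time zone applies.

```typescript
// Mirrors the "diagnostics" block above; logDir is optional because it has a default.
interface DiagnosticsConfig {
  log: boolean;
  verbose: boolean;
  logDir?: string;
}

// The two flags are independent: log controls the JSONL file, verbose controls stderr.
function resolveSinks(d: DiagnosticsConfig): { file: boolean; stderr: boolean } {
  return { file: d.log, stderr: d.verbose };
}

// Builds mmagent-YYYY-MM-DD.jsonl. Assumption: the date is taken in UTC.
function logFileName(date: Date): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return `mmagent-${yyyy}-${mm}-${dd}.jsonl`;
}
```

With both flags false, neither sink is active, which matches the OFF-by-default behaviour described above.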
CLI equivalents:
mmagent serve --verbose # stream events to stderr (no file written)
mmagent serve --log # persist to JSONL only (no stderr noise)
mmagent serve --verbose --log # both
mmagent logs --follow --batch=<id> # tail + filter

As of 3.4.0, every task-execution event the worker emits to the verbose stderr stream is also written to the JSONL log via a single emit(TaskEvent) writer, giving schema parity across both sinks. Crash/disconnect events (startup, request_start, request_complete, shutdown, error) are written unconditionally; per-task events (heartbeat, stage_change, tool_call, turn_complete, etc.) flow through the same writer.
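The single-writer idea can be sketched as a small fan-out: serialize each event once and hand the same line to every active sink, so the file and stderr cannot drift apart in shape. The `TaskEvent` fields and `createEmitter` below are hypothetical; the package's real event schema is not shown in this README, and the unconditional handling of crash/disconnect events is not modelled here.

```typescript
// Hypothetical event shape; real events carry more fields than this.
interface TaskEvent {
  event: string; // e.g. 'heartbeat', 'stage_change', 'tool_call'
  ts: string;    // ISO timestamp
  [key: string]: unknown;
}

// A sink receives one already-serialized JSONL line.
type Sink = (line: string) => void;

// Returns an emit function that writes the identical line to every sink.
function createEmitter(sinks: Sink[]): (e: TaskEvent) => void {
  return (e: TaskEvent): void => {
    const line = JSON.stringify(e); // serialize once: this is what gives schema parity
    for (const sink of sinks) {
      sink(line);
    }
  };
}
```

In a real process, one sink might append to the JSONL file and another write to `process.stderr`; enabling or disabling a sink then never changes what a record looks like.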
What's new
Latest: 3.12.1. Telemetry and state-machine correctness pass, diagnosed against 3.11.1 production data.

- Reviewer-implementer separation gate now uses tier (standard / complex / main), not model name.
- Cross-runner cached/input subset semantics (R6): the Claude runner constructs inputTokens = turnInputTokens + cache_read + cache_creation directly, and the cost calculation consumes cachedTokens so per-stage cost reflects the cache discount.
- Reviewer prompts are split into {systemPrefix, userBody} for cross-runner caching.
- A deferred finalizer ensures spec_review/quality_review stage entries persist on every early-exit path (round_cap, cost_ceiling, time_ceiling, all-tiers-unavailable).
- diffReviewStatus is plumbed through RunResult so diff_review.verdict reflects the lifecycle decision.
- terminationReason.cause = 'error' on diff-review reject + transport_failure, so R1 stops firing.
- validation_warnings are attached to events for backend storage.
- verifyCommand flows through intake.
- Adds tier/implementerTier to the wire schema, plus R16 (rework requires preceding review).
- 99 new tests (2786 total).

Full history: CHANGELOG.
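The R6 token arithmetic can be illustrated with a short sketch. Only the inputTokens formula comes from the notes above; everything else is assumed for illustration: that cachedTokens corresponds to cache_read, that cached tokens are billed at a separate lower rate, the field names, and the prices used in any example. Real provider pricing, including cache-creation pricing, may differ.

```typescript
// Hypothetical per-turn counters as a Claude-style runner might see them.
interface TurnCounters {
  turnInputTokens: number;
  cacheRead: number;     // stands in for cache_read
  cacheCreation: number; // stands in for cache_creation
  outputTokens: number;
}

// Hypothetical prices in USD per million tokens.
interface Rates {
  inputPerMTok: number;
  cachedInputPerMTok: number;
  outputPerMTok: number;
}

// inputTokens = turnInputTokens + cache_read + cache_creation (as stated in the notes).
// Assumption: cachedTokens is the cache_read portion, a subset of inputTokens.
function normalizeUsage(t: TurnCounters): { inputTokens: number; cachedTokens: number; outputTokens: number } {
  return {
    inputTokens: t.turnInputTokens + t.cacheRead + t.cacheCreation,
    cachedTokens: t.cacheRead,
    outputTokens: t.outputTokens,
  };
}

// Bills the uncached part of the input at full price and the cached subset at the discount.
function stageCostUSD(t: TurnCounters, r: Rates): number {
  const u = normalizeUsage(t);
  const uncached = u.inputTokens - u.cachedTokens;
  return (
    uncached * r.inputPerMTok +
    u.cachedTokens * r.cachedInputPerMTok +
    u.outputTokens * r.outputPerMTok
  ) / 1_000_000;
}
```

Because cachedTokens is a subset of inputTokens, subtracting it before applying the full input rate avoids charging those tokens twice.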
Full documentation
→ github.com/zhixuan312/multi-model-agent
License
MIT — Copyright (c) 2026 Zhang Zhixuan
