@zhixuan92/multi-model-agent-core

v3.12.1

Published

2 hours ago

Core library for multi-model-agent: provider runners (Claude, Codex, OpenAI-compatible), routing logic, config schema, and tool/sandbox primitives.

Vulnerabilities: 0 High, 0 Medium, 0 Low

gumiokane

mcp llm claude codex openai agent routing multi-model


Runtime library for multi-model-agent. Import it to run multi-provider agent tasks directly from your own Node program — same routing, supervision, and review pipeline, without the HTTP server.

Want the standalone service instead? Install @zhixuan92/multi-model-agent — it wraps this library in a local HTTP daemon with client-installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.

Install

npm install @zhixuan92/multi-model-agent-core

Requires Node >= 22. ESM only.

Quick example

import { loadConfigFromFile } from '@zhixuan92/multi-model-agent-core/config/load';
import { runTasks } from '@zhixuan92/multi-model-agent-core/run-tasks';

// Uses the same ~/.multi-model/config.json as the standalone daemon —
// agents.standard, agents.complex, defaults.parentModel, etc.
const config = await loadConfigFromFile();

const results = await runTasks([
  { prompt: 'Refactor auth.ts to use JWT.',         agentType: 'complex' },
  { prompt: 'Write unit tests for auth module.',    agentType: 'standard' },
], config);

for (const r of results) {
  console.log(r.status, r.usage.costUSD, r.usage.costDeltaVsParentUSD, r.output);
}

costDeltaVsParentUSD is populated when defaults.parentModel is set in the config. It equals actualCost − parentCost, so a negative value means the worker was cheaper than the parent model (a saving). Use it to surface a "$X saved (Y× ROI)" figure in your own UI.
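One way to turn those per-task deltas into a single savings figure is sketched below. It relies only on the two usage fields shown in the quick example; the UsageLike and SavingsSummary types, the summarizeSavings helper, and the reading of "ROI" as parentCost / actualCost are assumptions made for illustration, not part of the library.

```typescript
// Hypothetical minimal shape: only the two usage fields from the quick example.
interface UsageLike {
  costUSD: number;
  costDeltaVsParentUSD?: number; // assumed to be absent when defaults.parentModel is unset
}

interface SavingsSummary {
  actualUSD: number;          // what the workers cost
  parentUSD: number;          // what the parent model would have cost
  savedUSD: number;           // positive when the workers were cheaper
  roiMultiple: number | null; // parentUSD / actualUSD (assumed definition of "ROI")
}

// Aggregate only the tasks that carry a delta; return null if none do.
function summarizeSavings(usages: UsageLike[]): SavingsSummary | null {
  let counted = 0;
  let actualUSD = 0;
  let deltaUSD = 0;
  for (const u of usages) {
    if (typeof u.costDeltaVsParentUSD !== 'number') continue;
    counted += 1;
    actualUSD += u.costUSD;
    deltaUSD += u.costDeltaVsParentUSD;
  }
  if (counted === 0) return null;
  // delta = actual - parent, so parent = actual - delta and saved = -delta.
  const parentUSD = actualUSD - deltaUSD;
  return {
    actualUSD,
    parentUSD,
    savedUSD: -deltaUSD,
    roiMultiple: actualUSD > 0 ? parentUSD / actualUSD : null,
  };
}
```

With the quick example's results this would be called as summarizeSavings(results.map((r) => r.usage)), assuming r.usage matches the shape above.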

What's inside

Provider runners — Claude, Codex, and any OpenAI-compatible endpoint
Routing engine — capability filter → agent type → cheapest qualifier
runTasks — parallel dispatch, returns per-task results with usage, cost, files touched, status, and escalation log
Reviewed lifecycle — spec review + quality review by a different agent, auto-commit of file changes, file artifact verification
Executors — pure execute<Tool>(ctx, input) functions for delegate, audit, review, verify, debug, execute-plan, retry, investigate, explore (used by the HTTP server package)
Tool schemas — Zod-validated input shapes for each tool, exportable via ./tool-schemas/*
BatchRegistry — server-wide state machine for pending / awaiting_clarification / complete / failed / expired batches with context-block refcount pinning
Sandboxed tools — readFile, writeFile, grep, glob, listFiles, runShell with cwd-only confinement
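The routing order listed above (capability filter, then agent type, then cheapest qualifier) can be pictured with a small standalone sketch. Everything in it is hypothetical: the Candidate shape, the single costPerMTokUSD price, and the pickCheapest helper are illustrative stand-ins, not the library's actual routing types or API.

```typescript
// Hypothetical types for illustration only.
type AgentType = 'standard' | 'complex';

interface Candidate {
  name: string;
  agentType: AgentType;
  capabilities: string[];
  costPerMTokUSD: number; // assumed single blended price per million tokens
}

function pickCheapest(
  candidates: Candidate[],
  required: string[],
  agentType: AgentType,
): Candidate | undefined {
  const qualified = candidates
    // 1. Capability filter: drop anything missing a required capability.
    .filter((c) => required.every((cap) => c.capabilities.includes(cap)))
    // 2. Agent type: keep only candidates configured for the requested type.
    .filter((c) => c.agentType === agentType);
  // 3. Cheapest qualifier: lowest price wins; undefined if nothing qualifies.
  return qualified.reduce<Candidate | undefined>(
    (best, c) => (best === undefined || c.costPerMTokUSD < best.costPerMTokUSD ? c : best),
    undefined,
  );
}
```

Filtering before pricing means a cheap candidate that cannot do the job is never selected, which appears to be the intent of the ordering.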

Subpath exports

| Subpath | What |
|---|---|
| ./config/schema | parseConfig, multiModelConfigSchema, serverConfigSchema |
| ./config/load | loadConfigFromFile, loadAuthToken |
| ./routing/resolve-agent | resolveAgent: resolves agent type to provider |
| ./routing/model-profiles | Model cost/tier profiles |
| ./provider | createProvider factory |
| ./run-tasks | runTasks parallel dispatcher, RunTasksOptions |
| ./heartbeat | HeartbeatTimer: periodic progress heartbeat emitter |
| ./types | All shared types |
| ./executors | Pure execute<Tool>(ctx, input) functions and ExecutionContext type |
| ./tool-schemas | Zod input/output schemas for each tool |
| ./intake/pipeline | runIntakePipeline: compile → infer → classify → resolve |
| ./intake/types | DraftTask, Source, IntakeResult, ClarificationEntry |
| ./intake/classify | classifyDraft: deterministic classification heuristic |
| ./intake/confirm | processConfirmations: clarification resume processing |
| ./intake/clarification-store | ClarificationStore: TTL/LRU state for clarification sets |
| ./intake/compilers/* | Route compilers: delegate, review, debug, verify, audit, execute-plan, investigate, explore |
| ./reporting/parse-investigation-report | parseInvestigationReport, parseCitations, parseConfidence (3.4.0) |
| ./auto-commit | autoCommitFiles: git commit helper for worker file changes |
| ./file-artifact-check | partitionFilePaths, checkOutputTargets: output target verification |
| ./telemetry/types | TelemetryEvent, UploadBatch, InstallMetadata Zod schemas + SCHEMA_VERSION |
| ./telemetry/event-builder | buildTaskCompletedEvent, buildSessionStartedEvent, etc.: pure event constructors |
| ./telemetry/consent-rules | decideConsent: env / config / default precedence resolver |

Diagnostic logging

Diagnostic logging and verbose streaming are both OFF by default.

{
  "diagnostics": {
    "log": false,
    "verbose": false,
    "logDir": "/some/path"
  }
}

Two independent axes:

diagnostics.log — when true, append JSONL records to mmagent-YYYY-MM-DD.jsonl under diagnostics.logDir (defaults to ~/.multi-model/logs/).
diagnostics.verbose — when true, the server emits per-tool-call, per-LLM-turn, per-stage-transition, and per-batch-lifecycle events. If log is also true, they're persisted; otherwise they stream only to the server's stderr.
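The two axes can be read as a small decision function: verbose controls the stderr stream, log controls whether a dated JSONL file exists. The sketch below encodes that reading. The sinksFor helper and its types are hypothetical, and the use of a UTC date in the file name is an assumption; the source only gives the mmagent-YYYY-MM-DD.jsonl pattern and the ~/.multi-model/logs/ default.

```typescript
// Hypothetical mirror of the "diagnostics" config block; all fields optional.
interface DiagnosticsConfig {
  log?: boolean;
  verbose?: boolean;
  logDir?: string;
}

interface Sinks {
  stderr: boolean;         // verbose events stream to stderr
  filePath: string | null; // JSONL file to append to, or null when logging is off
}

function sinksFor(cfg: DiagnosticsConfig, now: Date, homeDir: string): Sinks {
  const log = cfg.log ?? false;         // OFF by default
  const verbose = cfg.verbose ?? false; // OFF by default
  // Default directory from the text above; trailing slashes are trimmed.
  const logDir = (cfg.logDir ?? `${homeDir}/.multi-model/logs`).replace(/\/+$/, '');
  // Assumption: the date stamp is taken in UTC.
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD
  return {
    stderr: verbose,
    filePath: log ? `${logDir}/mmagent-${day}.jsonl` : null,
  };
}
```

Under this reading, { log: true, verbose: false } yields a file path and a quiet stderr, matching the mmagent serve --log description.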

CLI equivalents:

mmagent serve --verbose   # stream events to stderr (no file written)
mmagent serve --log       # persist to JSONL only (no stderr noise)
mmagent serve --verbose --log   # both
mmagent logs --follow --batch=<id>   # tail + filter

As of 3.4.0 every task-execution event the worker emits to the verbose stderr stream is also written to the JSONL log via a single emit(TaskEvent) writer — schema parity across both sinks. Crash/disconnect events (startup, request_start, request_complete, shutdown, error) are written unconditionally; per-task events (heartbeat, stage_change, tool_call, turn_complete, etc.) flow through the same writer.
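The single-writer idea can be illustrated in a few lines: serialize each event once and hand the same line to every enabled sink, so the stderr stream and the JSONL file cannot drift apart. This is a sketch only; the TaskEvent shape, the Sink type, and createEmitter are hypothetical and do not reflect the library's internal writer.

```typescript
// Hypothetical event shape: a type tag plus arbitrary payload fields.
interface TaskEvent {
  type: string;
  batchId?: string;
  [key: string]: unknown;
}

// A sink is anything that accepts one serialized line (stderr, a file appender, ...).
type Sink = (line: string) => void;

function createEmitter(sinks: Sink[]): (event: TaskEvent) => void {
  return (event) => {
    // Serialize exactly once so every sink receives a byte-identical record.
    const line = JSON.stringify(event) + '\n';
    for (const write of sinks) write(line);
  };
}
```

In a real process the sinks might be (line) => process.stderr.write(line) and a function that appends to the day's JSONL file.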

What's new

Latest: 3.12.1. Telemetry & state-machine correctness pass diagnosed against 3.11.1 production data.

Reviewer-implementer separation gate now uses tier (standard / complex / main), not model name.
Cross-runner cached/input subset semantics (R6): the Claude runner constructs inputTokens = turnInputTokens + cache_read + cache_creation directly; the cost calculation consumes cachedTokens so per-stage cost reflects the cache discount.
Reviewer prompts split into {systemPrefix, userBody} for cross-runner caching.
Deferred-finalizer ensures spec_review/quality_review stage entries persist on every early-exit path (round_cap, cost_ceiling, time_ceiling, all-tiers-unavailable).
diffReviewStatus plumbed through RunResult so diff_review.verdict reflects the lifecycle decision.
terminationReason.cause = 'error' on diff-review reject + transport_failure so R1 stops firing.
validation_warnings attached to events for backend storage.
verifyCommand flows through intake.
Adds tier/implementerTier to wire schema and R16 (rework requires preceding review).
99 new tests (2786 total).

Full history: CHANGELOG.
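The cached/input subset semantics in the 3.12.1 notes can be made concrete with a worked calculation. The sketch below follows the stated formula for inputTokens. Everything else is an assumption for illustration: the TurnUsage and Rates types, the stageCost helper, treating cachedTokens as the cache_read count, pricing cache_creation at the normal input rate, and the example rates.

```typescript
// Hypothetical per-turn usage as a Claude-style runner might report it.
interface TurnUsage {
  turnInputTokens: number;     // fresh (uncached) input for this turn
  cacheReadTokens: number;     // cache_read
  cacheCreationTokens: number; // cache_creation
  outputTokens: number;
}

// Hypothetical prices in USD per million tokens.
interface Rates {
  inputPerMTokUSD: number;
  cachedInputPerMTokUSD: number; // discounted rate for cache reads
  outputPerMTokUSD: number;
}

function stageCost(u: TurnUsage, r: Rates) {
  // Formula from the release notes: total input includes both cache buckets.
  const inputTokens = u.turnInputTokens + u.cacheReadTokens + u.cacheCreationTokens;
  // Assumption: cachedTokens is the cache_read subset of inputTokens.
  const cachedTokens = u.cacheReadTokens;
  const uncachedTokens = inputTokens - cachedTokens;
  // tokens x (USD per million tokens) gives millionths of a dollar.
  const microUSD =
    uncachedTokens * r.inputPerMTokUSD +
    cachedTokens * r.cachedInputPerMTokUSD +
    u.outputTokens * r.outputPerMTokUSD;
  return { inputTokens, cachedTokens, costUSD: microUSD / 1_000_000 };
}
```

Because cachedTokens is a subset of inputTokens rather than an extra bucket, the cached portion is billed once at the discounted rate instead of being counted twice.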

Full documentation

→ github.com/zhixuan312/multi-model-agent
