@wasmagent/model-local

v1.0.3

tellerlin

Embedded local-LLM provider for wasmagent — node-llama-cpp adapter with grammar-constrained tool calling, multi-mirror downloads (HuggingFace / hf-mirror / ModelScope), and a certification harness for picking which models actually work in agent workflows.

The whole agent stack — model, code execution, state — runs on the user's machine. No cloud LLM, no API key, no telemetry.

Install

# Provider (small package, no native deps).
npm install @wasmagent/model-local

# Optional native peer — pre-built binaries for macOS/Linux/Windows + ARM/x64.
npm install node-llama-cpp

The native peer is optional: if you only want the registry/downloader/types (e.g. to ship a server that proxies models), you can skip it. LocalModel.generate() will throw a typed LocalModelDependencyError with an actionable install hint if it's missing.
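The optional-peer behaviour described above can be pictured with a small, self-contained sketch. The names MissingPeerError and requirePeer are hypothetical stand-ins chosen for illustration; they are not the package's LocalModelDependencyError or its actual loading code.

```typescript
// Hypothetical sketch: surface a missing optional peer as a typed error
// that carries an install hint, instead of a raw module-resolution failure.
class MissingPeerError extends Error {
  peer: string;
  installHint: string;

  constructor(peer: string, installHint: string) {
    super(`Optional peer "${peer}" is not installed. ${installHint}`);
    this.name = "MissingPeerError";
    this.peer = peer;
    this.installHint = installHint;
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, MissingPeerError.prototype);
  }
}

// `load` is whatever actually resolves the dependency (for example a
// require() call). It is injected so the sketch runs without the peer.
function requirePeer<T>(peer: string, load: () => T): T {
  try {
    return load();
  } catch {
    throw new MissingPeerError(peer, `Run: npm install ${peer}`);
  }
}

// Usage: callers branch on the error type and show the hint to the user.
function describeFailure(): string {
  try {
    requirePeer("node-llama-cpp", () => {
      throw new Error("Cannot find module");
    });
    return "loaded";
  } catch (err) {
    return err instanceof MissingPeerError ? err.installHint : "unexpected error";
  }
}
```

Branching on the error class rather than on the message text keeps the caller stable if the wording of the hint changes.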

Quick start

import { LocalModel, localFirst } from "@wasmagent/model-local";
import { AnthropicModel, CodeAgent } from "@wasmagent/core";

// Pick one of three sources:
const local = new LocalModel({ source: { model: "qwen2.5-1.5b" } });        // alias
// or:        new LocalModel({ source: { path: "./my-model.gguf" } });       // user GGUF
// or:        new LocalModel({ source: { url: "https://..." } });            // direct URL

// Use it directly:
const agent = new CodeAgent({ model: local, tools: [] });

// Or compose with a cloud fallback for prod:
const model = localFirst(
  local,
  new AnthropicModel("claude-haiku-4-5-20251001", process.env.ANTHROPIC_API_KEY),
);

Three model sources

| Source | Use when | Verification |
|---|---|---|
| { model: "alias" } | You want a maintained, vetted model | sha256 (registry-pinned) |
| { path: "./x.gguf" } | You have a self-trained or hand-downloaded GGUF | none (your file, your trust) |
| { url: "https://..." } | One-off pull from any URL | sha256 only if you supply expectedSha256 |
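One way to read the table is as a discriminated union with a verification policy per variant. The sketch below is an assumption about shape only: the type names are hypothetical, and placing expectedSha256 on the url variant is a guess rather than the package's documented signature.

```typescript
// Hypothetical shape of the three source options described in the table.
type ModelSource =
  | { model: string }                         // registry alias
  | { path: string }                          // user-supplied GGUF on disk
  | { url: string; expectedSha256?: string }; // direct URL (field placement assumed)

type Verification = "registry-sha256" | "caller-sha256" | "none";

// Map each source kind to the verification the table promises for it.
function verificationFor(source: ModelSource): Verification {
  if ("model" in source) return "registry-sha256";
  if ("path" in source) return "none";
  return source.expectedSha256 ? "caller-sha256" : "none";
}
```

The practical point is that only the alias source is verified without any effort from the caller; a URL pull is unverified unless a hash is supplied.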

Mirror selection (mainland-China friendly)

Three resolution layers, high → low precedence:

1. Programmatic: new LocalModel({ source: { model: "qwen2.5-1.5b" }, mirror: "modelscope" })
2. Environment: WASMAGENT_MODEL_MIRROR=hf-mirror (or modelscope, or any URL prefix)
3. Registry default: HuggingFace first, then mirrors

Built-in presets:

- huggingface: origin (sha256 anchor)
- hf-mirror: hf-mirror.com, community-run, URL-compatible with HF
- modelscope: modelscope.cn, ModelScope (魔搭) CDN inside mainland China

Custom CDN: pass any URL prefix as mirror, and the downloader will append the canonical filename and hit your CDN first, falling back to the registry chain if it fails.
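The precedence rules can be sketched as a function that turns the programmatic option and the environment into an ordered list of mirrors to try. This is an illustration under stated assumptions, not the downloader's code: mirrorChain is a hypothetical name, the relative order of hf-mirror and modelscope in the default chain is assumed, and falling back to the remaining registry mirrors is only confirmed by the text for custom prefixes.

```typescript
// Assumed default chain: HuggingFace first, then the two mirrors.
const REGISTRY_CHAIN: string[] = ["huggingface", "hf-mirror", "modelscope"];

// Returns mirrors (preset names or URL prefixes) in the order to try them.
// `env` is passed in rather than read from process.env so the sketch is testable.
function mirrorChain(
  programmatic: string | undefined,
  env: Record<string, string | undefined>,
): string[] {
  const fromEnv = env["WASMAGENT_MODEL_MIRROR"];
  // Programmatic option wins over the environment; empty strings count as unset.
  const preferred = programmatic || fromEnv || undefined;
  if (!preferred) return [...REGISTRY_CHAIN];
  // Try the preferred mirror first, then the rest of the chain without duplicates.
  return [preferred, ...REGISTRY_CHAIN.filter((m) => m !== preferred)];
}
```

For example, passing "modelscope" programmatically while WASMAGENT_MODEL_MIRROR is set to hf-mirror yields modelscope, huggingface, hf-mirror in this sketch, because the programmatic layer outranks the environment.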

# One-line CLI override:
WASMAGENT_MODEL_MIRROR=modelscope npx wasmagent model pull qwen2.5-1.5b

⚠️ Mirror trust model: every download is sha256-verified against the registry value (which is anchored to the HuggingFace original). Mirrors are transport channels, not trust roots.
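The trust model boils down to hashing whatever bytes arrived and comparing against a pinned value, regardless of which host served them. A minimal sketch using Node's built-in crypto module follows; sha256Hex and verifyDownload are hypothetical helper names, and a real downloader would more likely hash the file incrementally as it streams rather than hold it in memory.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded sha256 of a byte buffer or string.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Throws when the downloaded bytes do not match the pinned hash.
// The mirror that served the bytes plays no part in the decision.
function verifyDownload(data: Uint8Array | string, expectedSha256: string): void {
  const actual = sha256Hex(data);
  if (actual !== expectedSha256.toLowerCase()) {
    throw new Error(`sha256 mismatch: expected ${expectedSha256}, got ${actual}`);
  }
}
```

Because the comparison value comes from the registry and not from the mirror, a compromised or stale mirror can cause a failed download but not a silently substituted model.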

Grammar-constrained tool calling

Sub-1B models routinely emit malformed JSON when asked to call tools. LocalModel enables JSON-schema grammar in the sampler by default, so tool_use output is structurally legal 100% of the time. Semantic correctness still depends on the model.

const model = new LocalModel({
  source: { model: "qwen2.5-1.5b" },
  enableGrammar: true,  // default
});

Set enableGrammar: false to compare A/B against free-form sampling — useful for diffing on the cert harness.
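To make the form-versus-semantics distinction concrete, here is a small checker of the kind one might use when comparing grammar-on and grammar-off runs. Everything in it is an assumption for illustration: the { tool, arguments } shape is hypothetical, and checkToolCall is not part of the cert harness.

```typescript
interface ToolCallCheck {
  form: boolean;   // output parses as { tool: string, arguments: object }
  picked: boolean; // the named tool is one that was actually offered
}

// Checks structure first, then whether the chosen tool exists.
// Whether the arguments make sense for the task is out of scope here.
function checkToolCall(raw: string, offeredTools: string[]): ToolCallCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { form: false, picked: false }; // malformed JSON fails the form check
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { form: false, picked: false };
  }
  const call = parsed as { tool?: unknown; arguments?: unknown };
  const form =
    typeof call.tool === "string" &&
    typeof call.arguments === "object" &&
    call.arguments !== null &&
    !Array.isArray(call.arguments);
  const picked = form && offeredTools.includes(call.tool as string);
  return { form, picked };
}
```

A grammar-constrained sampler is meant to make the form check pass by construction; the picked check, and any deeper semantic check, can still fail.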

CLI

# Browse the registry.
wasmagent model list

# Pull (resumable, sha256-verified, multi-mirror).
wasmagent model pull qwen2.5-1.5b

# Force a mirror.
wasmagent model pull qwen2.5-1.5b --mirror modelscope

# Verify a cached file's sha256.
wasmagent model verify qwen2.5-1.5b

# Free up disk.
wasmagent model rm qwen2.5-1.5b

@wasmagent/cli declares @wasmagent/model-local as an optional peer: if this package is not installed, the CLI exits with a clean error message instead of crashing.

Routing presets

import { localFirst, offlineOnly, devLocalOr } from "@wasmagent/model-local";

// Try local; fall through to cloud on any error.
const a = localFirst(localModel, cloudModel);

// Loud "no cloud, ever" envelope (passthrough today; reserves a hook for
// future enforcement).
const b = offlineOnly(localModel);

// Dev convenience: WASMAGENT_DEV_LOCAL=1 → local; otherwise → cloud.
const c = devLocalOr(localModel, cloudModel);

These are documented combinations of the existing FallbackModel from @wasmagent/core, not a parallel routing mechanism. You get the same retry/failover semantics as everywhere else in the framework.
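The general idea behind two of these presets can be sketched without the framework. The TextModel interface and the *Sketch functions below are hypothetical and deliberately simplified: they ignore retries and whatever else FallbackModel does, and only show "try local, fall through on error" and "choose by environment flag".

```typescript
// Hypothetical minimal model interface, for illustration only.
interface TextModel {
  generate(prompt: string): Promise<string>;
}

// Try the local model; on any error, hand the same prompt to the cloud model.
function localFirstSketch(local: TextModel, cloud: TextModel): TextModel {
  return {
    async generate(prompt: string): Promise<string> {
      try {
        return await local.generate(prompt);
      } catch {
        return cloud.generate(prompt);
      }
    },
  };
}

// WASMAGENT_DEV_LOCAL=1 selects the local model; anything else selects cloud.
function devLocalOrSketch(
  local: TextModel,
  cloud: TextModel,
  env: Record<string, string | undefined>,
): TextModel {
  return env["WASMAGENT_DEV_LOCAL"] === "1" ? local : cloud;
}

// Stub models to exercise the sketch.
const failingLocal: TextModel = {
  generate: async () => {
    throw new Error("model not loaded");
  },
};
const cloudStub: TextModel = {
  generate: async (prompt: string) => `cloud:${prompt}`,
};
```

With these stubs, localFirstSketch(failingLocal, cloudStub).generate("hi") resolves through the cloud stub, which is the failover path the first preset describes.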

Recommended models — current registry

Every entry is under 1.5 GB on disk. Most ship at q4_k_m or a smaller quantisation; qwen3-0.6b is the exception, since only a q8_0 build is published. The recommended flag flips on once the cert harness publishes a passing score (see L4). Until then you can still run wasmagent model pull <alias> and self-evaluate.

| Alias | Best for | License | Size |
|---|---|---|---|
| qwen2.5-0.5b | Tool calling on tiny footprint; 3/3 form/picked/semantic on cert (real-machine, 2026-06-12) | Apache-2.0 | ~409 MB (q4_0, sha256 pinned 2026-06-13) |
| qwen3-0.6b | English/code, only Q8_0 quant published | Apache-2.0 | ~610 MB (q8_0, sha256 pinned 2026-06-13) |
| qwen2.5-1.5b | Chinese + English, 32K context (Stage-0 ≤2GB winner per evomerge GSM8K 70.5%) | Apache-2.0 | ~1.07 GB (q4_k_m, sha256 pinned 2026-06-13) |
| gemma-3-1b | English tasks, ggml-org mirror | Gemma ToU | ~769 MB (q4_k_m, sha256 pinned 2026-06-13) |
| llama-3.2-1b | English/code, 128K context, lmstudio-community mirror | Llama 3.2 Community | ~770 MB (q4_k_m, sha256 pinned 2026-06-13) |
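When choosing among these entries, disk budget and license are usually the first filters. The sketch below hard-codes the table's values into a hypothetical RegistryRow shape; it does not reflect the real MODEL_REGISTRY structure, and the 1.07 GB entry is approximated as 1070 MB.

```typescript
// Hypothetical row shape; values are copied from the table above.
interface RegistryRow {
  alias: string;
  license: string;
  quant: string;
  approxMB: number;
}

const ROWS: RegistryRow[] = [
  { alias: "qwen2.5-0.5b", license: "Apache-2.0", quant: "q4_0", approxMB: 409 },
  { alias: "qwen3-0.6b", license: "Apache-2.0", quant: "q8_0", approxMB: 610 },
  { alias: "qwen2.5-1.5b", license: "Apache-2.0", quant: "q4_k_m", approxMB: 1070 },
  { alias: "gemma-3-1b", license: "Gemma ToU", quant: "q4_k_m", approxMB: 769 },
  { alias: "llama-3.2-1b", license: "Llama 3.2 Community", quant: "q4_k_m", approxMB: 770 },
];

// Aliases that fit within maxMB and, if given, carry one of the allowed licenses.
function fitsBudget(rows: RegistryRow[], maxMB: number, allowedLicenses?: string[]): string[] {
  return rows
    .filter((r) => r.approxMB <= maxMB)
    .filter((r) => !allowedLicenses || allowedLicenses.includes(r.license))
    .map((r) => r.alias);
}
```

Filtering on license up front matters because the Gemma and Llama entries carry their own terms rather than Apache-2.0.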

See docs/reports/local-model-cert-2026-06-12.md in the wasmagent repo for the full real-machine baseline.

Run the cert harness on any of them (or your own GGUF):

node examples/benchmarks/local-model-cert.mjs --model qwen2.5-1.5b --kernel quickjs
node examples/benchmarks/local-model-cert.mjs --path ./my-model.gguf --out report.md

Honest caveats

Sub-1B models are not Claude/GPT-class. Complex tool routing, multi-step reasoning, and long-form synthesis are still cloud-class jobs. The local model is for high-frequency, lower-difficulty work — drafts, intent classification, summarisation, dev/CI runs.
Grammar guarantees form, not semantics. A grammar-clean output can still pick the wrong tool or wrong arguments. The cert harness's form rate and semantic rate are reported separately.
Native binding. node-llama-cpp brings prebuilt binaries but requires Node.js 20+ on a desktop/server platform. Cloudflare Workers cannot run this. Use localFirst with a cloud model if you deploy to edge runtimes.

License

Apache-2.0 — see LICENSE.

Model files have their own licenses; they are downloaded from the publisher's host on demand and never re-distributed by this package. See MODEL_REGISTRY (in src/registry.ts) for the license attribute on each entry.
