@qvac/opencode-plugin

v0.1.0

Published

9 days ago

OpenCode plugin that runs a local, managed QVAC serve so `opencode` works against on-device models with no second terminal


mafintosh

prdn

qvac tether opencode plugin local-ai on-device offline coding-agent llm

@qvac/opencode-plugin

Run OpenCode against a local, on-device QVAC model with no second terminal and no manual server. Add the plugin to a project's opencode.json and opencode brings up a managed qvac serve by itself, points OpenCode at it, and tears it down on exit.

{
  "$schema": "https://opencode.ai/config.json",
  "plugin": ["@qvac/opencode-plugin"]
}

opencode          # interactive — uses qvac/qwen3.5-9b by default
opencode run "…"  # one-shot — works too (no startup race)

That's it: no provider block, no second terminal, no QVAC_MODEL= prefix.

How it works

1. On startup the plugin spawns a host child process in a real node/bun runtime. (OpenCode runs plugins inside its own compiled binary, whose process.execPath is the editor, not a JS runtime, so managed mode can't spawn its detached supervisor from there. The host gives it a real runtime and ensures the serve is reaped even if OpenCode is killed hard.)
2. The host starts a small local proxy and immediately reports that it is listening, before the model downloads. The plugin injects an OpenAI-compatible qvac provider pointed at the proxy and returns, so opencode run never trips OpenCode's startup timeout. The model loads in the background; the first turn waits on it (a slow cold turn, not a failure).
3. The host runs createQvac({ mode: 'managed' }) from @qvac/ai-sdk-provider, which brings up a shared, idle-reaped serve on an auto-allocated port.
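
The "listen first, load later" behaviour of the proxy can be illustrated with a minimal sketch. It is not the plugin's actual code: the class name `DeferredProxy` and its methods are hypothetical, and a real proxy would sit on an HTTP server rather than on plain callbacks. The point is only that requests arriving before the backend exists are queued, not rejected.

```typescript
// Hypothetical sketch: a front end that accepts requests before its backend exists.
type Backend = (body: string) => string;
type Reply = (output: string) => void;

class DeferredProxy {
  private backend: Backend | null = null;
  private pending: Array<{ body: string; reply: Reply }> = [];

  // The proxy can report "listening" immediately; no backend is needed for that.
  readonly listening = true;

  // Forward right away if the backend is up, otherwise hold the request.
  request(body: string, reply: Reply): void {
    if (this.backend !== null) {
      reply(this.backend(body));
      return;
    }
    this.pending.push({ body, reply });
  }

  // Called once the model has finished loading: flush everything that waited.
  attachBackend(backend: Backend): void {
    this.backend = backend;
    for (const { body, reply } of this.pending.splice(0)) {
      reply(backend(body));
    }
  }

  get queued(): number {
    return this.pending.length;
  }
}
```

In this sketch a request made during a cold start is simply slow: it completes as soon as `attachBackend` runs, which mirrors the "slow cold turn, not a failure" behaviour described above.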

Multiple OpenCode windows share one serve (the provider's reuse default): the detached runner owns the loaded model and reaps it a few minutes after the last session leaves, so a second window doesn't reload the model.
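
The sharing and idle-reaping behaviour can be pictured as a session count plus an idle timestamp. The sketch below is an illustration only: `ServeLease`, its methods, and the five-minute `IDLE_MS` value are assumptions (the text says only "a few minutes"), and the real logic lives in the provider's detached runner.

```typescript
// Assumed idle window; the actual value used by the runner is not documented here.
const IDLE_MS = 5 * 60 * 1000;

// Hypothetical bookkeeping for one shared serve.
class ServeLease {
  private sessions = 0;
  private idleSince: number | null = null;

  // A new OpenCode window joins: the serve is in use, so cancel any idle countdown.
  attach(): void {
    this.sessions += 1;
    this.idleSince = null;
  }

  // A window leaves: start the idle countdown only when it was the last one.
  detach(nowMs: number): void {
    if (this.sessions === 0) return;
    this.sessions -= 1;
    if (this.sessions === 0) this.idleSince = nowMs;
  }

  // True once the serve has had no sessions for at least IDLE_MS.
  shouldReap(nowMs: number): boolean {
    return this.idleSince !== null && nowMs - this.idleSince >= IDLE_MS;
  }
}
```

Timestamps are passed in explicitly so the logic can be exercised without real timers. A window that attaches during the idle countdown resets it, which is why a second window can reuse the already loaded model.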

Model ids

You pick a friendly, models.dev-style id (qwen3.5-9b) and that exact id flows through the whole stack — OpenCode's model picker (qvac/qwen3.5-9b) and the request model field. The verbose QVAC constant (QWEN3_5_9B_MULTIMODAL_Q4_K_M) stays an internal detail of the serve; the friendly-id → constant mapping lives in @qvac/ai-sdk-provider's qvacCatalog, so every AI-SDK tool resolves the same ids.

| models.dev id | QVAC constant |
| -------------- | --------------------------------- |
| qwen3.5-0.8b | QWEN3_5_0_8B_MULTIMODAL_Q4_K_M |
| qwen3.5-2b | QWEN3_5_2B_MULTIMODAL_Q4_K_M |
| qwen3.5-4b | QWEN3_5_4B_MULTIMODAL_Q4_K_M |
| qwen3.5-9b | QWEN3_5_9B_MULTIMODAL_Q4_K_M |

Passing a raw constant also works (it normalizes back to the friendly id for display).
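
As a rough illustration of that two-way resolution, the sketch below accepts either form and returns both. The entries are copied from the table above; the names `catalogSketch` and `resolveModel` are hypothetical stand-ins and do not reflect the actual shape of qvacCatalog in @qvac/ai-sdk-provider.

```typescript
interface CatalogEntry {
  id: string;       // friendly, models.dev-style id
  constant: string; // verbose QVAC constant
}

// Stand-in for the real catalog; entries copied from the table above.
const catalogSketch: CatalogEntry[] = [
  { id: "qwen3.5-0.8b", constant: "QWEN3_5_0_8B_MULTIMODAL_Q4_K_M" },
  { id: "qwen3.5-2b", constant: "QWEN3_5_2B_MULTIMODAL_Q4_K_M" },
  { id: "qwen3.5-4b", constant: "QWEN3_5_4B_MULTIMODAL_Q4_K_M" },
  { id: "qwen3.5-9b", constant: "QWEN3_5_9B_MULTIMODAL_Q4_K_M" },
];

// Accept a friendly id or a raw constant; always hand back the matching pair.
function resolveModel(input: string): CatalogEntry {
  for (const entry of catalogSketch) {
    if (input === entry.id || input === entry.constant) return entry;
  }
  throw new Error("unknown model: " + input);
}
```

With this shape, the `id` field is what a model picker would display and the `constant` field is what the serve would load, regardless of which form the user typed.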

Options

Set from any of these sources (lowest to highest precedence): built-in defaults, a qvac.json in the project dir, the opencode.json plugin-tuple options, and QVAC_* environment variables.

| Option (qvac.json / plugin tuple) | Env | Default | Meaning |
| ----------------------------------- | ------------------------- | ------------ | ------- |
| model | QVAC_MODEL | qwen3.5-9b | friendly id or a raw QVAC constant |
| ctxSize | QVAC_CTX_SIZE | 32768 | serve context window (an agent's prompt + tool schemas need ≥ 32768) |
| reasoningBudget | QVAC_REASONING_BUDGET | -1 | -1 = reasoning on, 0 = off |
| tools | QVAC_TOOLS | true | enable the tool-calling chat template |
| shim | QVAC_SHIM | true | apply the OpenAI-compat transforms (see below) |
| runtime | QVAC_RUNTIME | auto | path to the node/bun runtime that hosts the serve |
| readyTimeoutMs | QVAC_READY_TIMEOUT_MS | 1800000 | budget for the serve to become healthy, incl. a cold model download |
| setDefaultModel | QVAC_SET_DEFAULT_MODEL | true | force qvac/<model> as the project default + small model |
| debug | QVAC_DEBUG | false | mirror host milestones + per-request traces to stderr |
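
The precedence order can be sketched as a layered merge in which later sources win. This is an illustration under stated assumptions, not the plugin's implementation: it covers only three of the options, the names `QvacOptions`, `readEnv`, and `resolveOptions` are hypothetical, and the boolean parsing (`"0"` or `"false"` means off) is a guess that is only known to hold for QVAC_SHIM=0.

```typescript
// Subset of the options, for illustration only.
interface QvacOptions {
  model: string;
  ctxSize: number;
  shim: boolean;
}

type Env = { [name: string]: string | undefined };

// Built-in defaults, taken from the Default column.
const DEFAULTS: QvacOptions = { model: "qwen3.5-9b", ctxSize: 32768, shim: true };

// Translate QVAC_* variables into typed options (parsing rules are assumed).
function readEnv(env: Env): Partial<QvacOptions> {
  const out: Partial<QvacOptions> = {};
  const model = env["QVAC_MODEL"];
  const ctxSize = env["QVAC_CTX_SIZE"];
  const shim = env["QVAC_SHIM"];
  if (model !== undefined) out.model = model;
  if (ctxSize !== undefined) out.ctxSize = Number(ctxSize);
  if (shim !== undefined) out.shim = shim !== "0" && shim !== "false";
  return out;
}

// Lowest to highest precedence: defaults, qvac.json, plugin tuple, environment.
function resolveOptions(
  fileConfig: Partial<QvacOptions>,
  pluginOptions: Partial<QvacOptions>,
  env: Env
): QvacOptions {
  return { ...DEFAULTS, ...fileConfig, ...pluginOptions, ...readEnv(env) };
}
```

For example, with `ctxSize` set in qvac.json, `model` set in the plugin tuple, and QVAC_SHIM=0 in the environment, each value comes from a different layer and anything left unset falls back to the default.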

Via the plugin tuple in opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "plugin": [["@qvac/opencode-plugin", { "model": "qwen3.5-2b" }]]
}

Or a qvac.json next to it:

{ "model": "qwen3.5-2b", "ctxSize": 32768 }

The `shim` option

@ai-sdk/openai-compatible (which OpenCode speaks) and QVAC serve disagree on two points today, so the host runs a small in-process proxy that bridges them:

- array content: the AI SDK sends content as an array of typed parts; serve currently accepts only a string, so the proxy flattens text parts.
- reasoning: with reasoning on, the model emits <think>…</think> inline on the content channel; the proxy re-routes that to reasoning_content so OpenCode shows a collapsed "Thought" block instead of raw tags.

Both are stopgaps for serve gaps. Set shim: false (or QVAC_SHIM=0) to turn the transforms off once serve closes those gaps; the proxy itself stays (it is what lets startup return before the model finishes loading).
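
The two transforms can be approximated as a pair of pure functions. This is a simplified sketch, not the proxy's code: `flattenContent` and `splitReasoning` are hypothetical names, text parts are assumed to be joined by plain concatenation, and the reasoning split handles only a complete message with a single think block, whereas a real proxy would also have to deal with streamed chunks.

```typescript
// One entry of an AI-SDK style content array; only text parts carry `text`.
type ContentPart = { type: string; text?: string };

// Request side: collapse array content into the plain string the serve accepts.
function flattenContent(content: string | ContentPart[]): string {
  if (typeof content === "string") return content;
  let out = "";
  for (const part of content) {
    if (part.type === "text" && typeof part.text === "string") out += part.text;
  }
  return out;
}

// Response side: move an inline <think>…</think> block to reasoning_content.
function splitReasoning(text: string): { content: string; reasoning_content: string } {
  const open = "<think>";
  const close = "</think>";
  const start = text.indexOf(open);
  const end = text.indexOf(close);
  // No well-formed block: leave the message untouched.
  if (start === -1 || end === -1 || end < start) {
    return { content: text, reasoning_content: "" };
  }
  return {
    reasoning_content: text.slice(start + open.length, end).trim(),
    content: (text.slice(0, start) + text.slice(end + close.length)).trim(),
  };
}
```

Non-text parts (images, for instance) are silently dropped by this sketch; whether the real shim drops, rejects, or forwards them is not stated here.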

Performance expectations

With the 9B model the agent's build prompt (~26k tokens with tool schemas) is re-prefilled each turn on a single local worker, so a tool-using turn is roughly 20–30s. A smaller model (qwen3.5-2b) is snappier but less capable for agentic work. Only one QVAC worker runs machine-wide; if the OpenCode desktop app is running it can hold locks the CLI needs — quit it (or isolate XDG_* dirs) when running opencode from the terminal.

Requirements

- @qvac/ai-sdk-provider@^0.2.2 for managed mode.
- @qvac/cli@^0.7.0 available so the host can run qvac serve (resolved by the provider's managed mode).
