qwen-agent-server

v0.11.10

Stateful MCP supervisor exposing Qwen Code as a multi-backend agent — see docs/rdr/RDR-001

Downloads

1,481

qwen-agent-server

Stateful MCP supervisor that exposes a local Qwen Code inference stack as a set of MCP tools:

- Stateful chat surface: qwen_spawn, qwen_poll, qwen_send, qwen_stop, qwen_sessions. Manages long-lived @qwen-code/sdk sessions per task.
- Stateless single-turn: qwen_oneshot, qwen_oneshot_vision. For operator-dispatch shapes.
- Non-chat: qwen_embed, qwen_rerank, qwen_tokenize. These POST directly to backend endpoints, bypassing the SDK pipeline, which is text-only.
- Lifecycle / introspection: qwen_backends, qwen_extensions, qwen_reload_extensions.

See the top-level README's MCP tools table for full per-tool detail.

The server is intentionally minimal: it is a thin supervisor layer, not a framework. All inference happens inside Qwen Code via the SDK; the server's job is session lifecycle, backend routing, and the canUseTool permission gate. See docs/rdr/RDR-001 for the full architecture rationale.


Quick start

Step 1 — start the inference backend

./scripts/start-stack.sh

This launches llama-server on localhost:8080 running qwen3.6-27b-instruct. The health endpoint at http://localhost:8080/health must return 200 before the server can route traffic.
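
The readiness requirement above can be checked programmatically. The following is a minimal sketch, not part of the package: waitForHealthy and FetchLike are hypothetical names, and the retry count and delay are arbitrary assumptions. The fetch function is injected so the sketch does not depend on a particular runtime.

```typescript
// Hypothetical helper: poll a health URL until it answers HTTP 200.
// FetchLike is a stand-in type so any fetch-compatible function can be passed.
type FetchLike = (url: string) => Promise<{ status: number }>;

async function waitForHealthy(
  fetchFn: FetchLike,
  url: string,
  attempts = 30, // assumption: give up after 30 tries
  delayMs = 1000, // assumption: one second between tries
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.status === 200) return true;
    } catch {
      // Connection errors are expected while the backend is still loading.
    }
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

On a runtime with a global fetch, a call such as waitForHealthy(fetch, "http://localhost:8080/health") would resolve to true once the backend is ready.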

Step 2 — build and install

./scripts/setup-qwen-agent-server.sh

Idempotent. Runs npm install + npm run build, creates the Qwen home directory (~/.qwen-agent-server-home by default), and prints the registration command.

Step 3 — register with Claude Code

Copy and run the registration command printed by the setup script:

claude mcp add --scope user qwen-agent-server \
  "node /path/to/repo/mcp-bridges/qwen-agent-server/dist/server.js"

After registration, qwen_spawn, qwen_poll, qwen_send, qwen_stop, and qwen_backends appear in Claude Code's MCP tool list.


Configuration

All configuration is via environment variables passed to the server process. Set them as a prefix on the setup script or on the registration command.

| Variable | Default | Description |
|---|---|---|
| QWEN_BACKENDS | [{"id":"local","url":"http://localhost:8080/v1","model":"qwen3.6-27b-instruct","tier":"local","capacity":"heavy"}] | JSON array of Backend objects (see src/types.ts). Each entry requires id, url, model, tier ("local" or "remote"), capacity ("fast" or "heavy"). Optional: weight (default 1). |
| QWEN_SUPERVISOR_MAX_SESSIONS | 3 | Maximum concurrent active sessions. qwen_spawn returns an error if the cap is reached. |
| QWEN_SUPERVISOR_IDLE_TTL_MS | 1800000 | Milliseconds before an idle session (no qwen_poll activity) is evicted. Default = 30 minutes. |
| ROUTER_HEAVY_THRESHOLD_TOKENS | 2000 | Estimated token count above which the router prefers a capacity:heavy backend. |
| ROUTER_HEAVY_KEYWORDS | prove,derive,architect,design | Comma-separated prompt keywords that trigger routing to a capacity:heavy backend regardless of token count. |
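
To make the QWEN_BACKENDS contract concrete, here is a sketch of how such a value could be parsed and validated. It only mirrors the fields listed in the table; the authoritative Backend type is the one in src/types.ts, and parseBackends is a hypothetical helper, not the server's actual code.

```typescript
// Shape taken from the table above; the real type in src/types.ts may differ.
interface Backend {
  id: string;
  url: string;
  model: string;
  tier: "local" | "remote";
  capacity: "fast" | "heavy";
  weight: number; // optional in the env value, defaults to 1
}

const DEFAULT_BACKENDS =
  '[{"id":"local","url":"http://localhost:8080/v1","model":"qwen3.6-27b-instruct","tier":"local","capacity":"heavy"}]';

// Hypothetical helper: parse the env value and reject malformed entries early.
function parseBackends(raw: string | undefined): Backend[] {
  const parsed: unknown = JSON.parse(raw ?? DEFAULT_BACKENDS);
  if (!Array.isArray(parsed)) {
    throw new Error("QWEN_BACKENDS must be a JSON array");
  }
  return parsed.map((entry: unknown, i: number): Backend => {
    const e = (entry ?? {}) as Record<string, unknown>;
    const { id, url, model, tier, capacity } = e;
    if (typeof id !== "string" || typeof url !== "string" || typeof model !== "string") {
      throw new Error(`QWEN_BACKENDS[${i}]: id, url and model must be strings`);
    }
    if (tier !== "local" && tier !== "remote") {
      throw new Error(`QWEN_BACKENDS[${i}]: tier must be "local" or "remote"`);
    }
    if (capacity !== "fast" && capacity !== "heavy") {
      throw new Error(`QWEN_BACKENDS[${i}]: capacity must be "fast" or "heavy"`);
    }
    const weight = e.weight === undefined ? 1 : e.weight;
    if (typeof weight !== "number") {
      throw new Error(`QWEN_BACKENDS[${i}]: weight must be a number`);
    }
    return { id, url, model, tier, capacity, weight };
  });
}
```

Calling parseBackends with the environment value (or undefined to get the documented default) yields a typed array, and a misspelled tier or capacity fails at startup instead of at first spawn.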

Example with a remote Strix Halo box (Tailscale-reachable) joined to the local Mac backend:

QWEN_BACKENDS='[
  {"id":"local-mac","url":"http://localhost:8080/v1","model":"qwen3.6-27b-instruct","tier":"local","capacity":"fast"},
  {"id":"strix","url":"http://your-strix-host:1234/v1","model":"qwen3.6-35b-a3b","tier":"remote","capacity":"heavy"}
]' \
  claude mcp add --scope user qwen-agent-server \
  "node /path/to/repo/mcp-bridges/qwen-agent-server/dist/server.js"

The router prefers capacity:heavy for prompts over ROUTER_HEAVY_THRESHOLD_TOKENS or containing ROUTER_HEAVY_KEYWORDS, falling back to capacity:fast. The model field must match what /v1/models returns from each backend (for llama-server it's the --alias value; for LM Studio it's the loaded GGUF's identifier).
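
The routing rule just described can be sketched as follows. This is an illustration under stated assumptions, not the router's implementation: the four-characters-per-token estimate, case-insensitive substring keyword matching, and the "first available backend" fallback are all guesses, weight is ignored, and estimateTokens, preferredCapacity and pickBackend are hypothetical names.

```typescript
type Capacity = "fast" | "heavy";

interface RouterConfig {
  heavyThresholdTokens: number; // ROUTER_HEAVY_THRESHOLD_TOKENS
  heavyKeywords: string[]; // ROUTER_HEAVY_KEYWORDS, split on commas
}

// Assumption: roughly four characters per token.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

// Heavy if any keyword appears or the estimate exceeds the threshold.
function preferredCapacity(prompt: string, cfg: RouterConfig): Capacity {
  const lower = prompt.toLowerCase();
  const hasKeyword = cfg.heavyKeywords.some(
    (k) => k !== "" && lower.includes(k.toLowerCase()),
  );
  const overThreshold = estimateTokens(prompt) > cfg.heavyThresholdTokens;
  return hasKeyword || overThreshold ? "heavy" : "fast";
}

// Pick a backend of the preferred capacity, else fall back to the first one.
function pickBackend<B extends { capacity: Capacity }>(
  prompt: string,
  backends: B[],
  cfg: RouterConfig,
): B | undefined {
  const want = preferredCapacity(prompt, cfg);
  return backends.find((b) => b.capacity === want) ?? backends[0];
}
```

With the default settings, a short prompt containing "derive" would prefer the heavy backend, while a short prompt with no keyword would prefer the fast one.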


Extensions

Per-spawn Qwen Code extension loadout (RDR-002). The orchestrator chooses which extensions are active for each session via qwen_spawn's opts.extensions field. The SDK doesn't expose extensions in QueryOptions directly — the supervisor bridges by setting pathToQwenExecutable to a wrapper script (scripts/qwen-extensions-wrapper.sh) that reads QWEN_AGENT_EXTENSIONS from env and prepends --extensions <list> to the CLI's argv.

Startup resolution. The supervisor resolves the real qwen binary once at startup. QWEN_REAL_BIN (env override, verified to exist and be executable) takes precedence; otherwise which qwen is consulted. A miss on either path makes the supervisor fail fast with a non-zero exit, because an operator who hasn't installed Qwen Code cannot recover at first spawn, only by fixing the install.

Per-spawn semantics. opts.extensions accepts three optional sub-fields:

| Field | Effect |
|---|---|
| only: ['a','b'] | Exact-set semantics. enable and disable are ignored in this branch. Empty only: [] disables all extensions for the spawn (--extensions none). |
| enable: ['c'] | Additively unions onto the session-default base. |
| disable: ['a'] | Subtractively removes from the session-default base after enable. disable wins on overlap. |

The session-default base is QWEN_DEFAULT_EXTENSIONS (a comma-list) when set, otherwise the CLI's defaults (all enabled per extension-enablement.json) — in which case the wrapper drops the --extensions flag and the CLI inherits its own behaviour. Because the supervisor cannot enumerate the implicit set, enable/disable without either QWEN_DEFAULT_EXTENSIONS or only is rejected with a spawn_error envelope rather than silently producing the wrong set.
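
The resolution rules above can be summarised in a short sketch. It is not the supervisor's code (RDR-002 has the actual algorithm): resolveExtensions is a hypothetical name, lower-casing names for comparison is an assumption, and the sketch throws a plain Error where the supervisor returns a spawn_error envelope.

```typescript
interface ExtensionOpts {
  only?: string[];
  enable?: string[];
  disable?: string[];
}

// Returns the extension list for one spawn.
// undefined means "omit --extensions and let the CLI apply its own defaults";
// an empty array corresponds to "--extensions none".
function resolveExtensions(
  opts: ExtensionOpts | undefined,
  defaultBase: string[] | undefined, // parsed QWEN_DEFAULT_EXTENSIONS, if set
): string[] | undefined {
  // Assumption: names are compared lower-cased and de-duplicated.
  const norm = (names: string[]): string[] =>
    Array.from(new Set(names.map((n) => n.trim().toLowerCase()).filter((n) => n !== "")));

  // Exact-set branch: enable and disable are ignored.
  if (opts?.only !== undefined) return norm(opts.only);

  const enable = norm(opts?.enable ?? []);
  const disable = new Set(norm(opts?.disable ?? []));

  if (defaultBase === undefined) {
    // The implicit CLI default set cannot be enumerated, so deltas are rejected.
    if (enable.length > 0 || disable.size > 0) {
      throw new Error("enable/disable require QWEN_DEFAULT_EXTENSIONS or only");
    }
    return undefined;
  }

  // Union enable onto the base, then subtract disable (disable wins on overlap).
  return norm([...defaultBase, ...enable]).filter((n) => !disable.has(n));
}
```

For example, with a base of a and b, enabling c and a while disabling a resolves to b and c, because disable is applied last.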

Example — pin a session to one extension:

// qwen_spawn input
{
  "task": "Refactor the auth module",
  "opts": { "extensions": { "only": ["serena"] } }
}

Names match config.name from each extension's qwen-extension.json, case-insensitively. Names that do not resolve to an installed extension produce a { error: { code: "spawn_error", message: "unknown extension(s): X" } } envelope, and no session is instantiated.
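
A sketch of the name check described above, assuming the installed names are already available as a string array. checkExtensionNames is a hypothetical helper, and joining several unknown names with a comma is an assumption; only the envelope shape comes from this README.

```typescript
type SpawnError = { error: { code: "spawn_error"; message: string } };

// Returns null when every requested name is installed (case-insensitive),
// otherwise an envelope listing the names that did not resolve.
function checkExtensionNames(requested: string[], installed: string[]): SpawnError | null {
  const known = new Set(installed.map((n) => n.toLowerCase()));
  const unknown = requested.filter((n) => !known.has(n.toLowerCase()));
  if (unknown.length === 0) return null;
  return {
    error: {
      code: "spawn_error",
      message: `unknown extension(s): ${unknown.join(", ")}`,
    },
  };
}
```

A caller would return the envelope as the tool result and skip session creation whenever the check is non-null.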

Cache + reload. The supervisor caches the installed-extension name list at startup by parsing qwen extensions list output. Drain semantics apply: in-flight sessions retain whatever set was resolved at their spawn time; cache reloads only affect future spawns. Operators who install or uninstall extensions while the supervisor is running can pick up the change via the qwen_reload_extensions MCP tool. (Pre-v0.3 this was gated behind QWEN_ADMIN_TOOLS=1; the gate was removed when the slash-command surface took over operator-facing privileged ops. The tool is now registered unconditionally whenever an extensions cache is wired, which is always the case in main().) See RDR-002 §Resolution-algorithm and §Installed-extensions cache for the full design.
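
The drain semantics can be illustrated with a small cache that hands out immutable snapshots. This is a sketch under assumptions: ExtensionsCache here is a hypothetical class, and the listInstalled callback stands in for whatever parses the qwen extensions list output.

```typescript
class ExtensionsCache {
  private names: readonly string[];
  private readonly listInstalled: () => string[];

  constructor(listInstalled: () => string[]) {
    this.listInstalled = listInstalled;
    // Populate once at startup.
    this.names = Object.freeze(listInstalled().slice());
  }

  // A session keeps the snapshot it received at spawn time.
  snapshot(): readonly string[] {
    return this.names;
  }

  // Replaces the cached list; snapshots already handed out are untouched.
  reload(): void {
    this.names = Object.freeze(this.listInstalled().slice());
  }
}
```

Because reload swaps in a new frozen array rather than mutating the old one, a session spawned before the reload still sees the list it was resolved against.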

| Variable | Default | Description |
|---|---|---|
| QWEN_REAL_BIN | (resolved via which qwen) | Override for the real Qwen Code binary path. Verified at startup. |
| QWEN_DEFAULT_EXTENSIONS | unset (CLI defaults apply) | Comma-list of extension names that the supervisor uses as the session-default base when opts.extensions.only is unset. |


SDK pin policy

@qwen-code/sdk is pinned to exactly 0.1.7 in package.json. This is intentional: do not bump it without running the integration test suite against a live backend.

Why exact? RDR-001 §Q1 documents that the deny-with-message path ({ behavior: 'deny', message: '<answer>' } in canUseTool) is the proven mechanism by which ask_user_question answers are delivered back to the model. This is empirically verified (see /tmp/qwen-sdk-probe/probe.mjs, Spike B, 2026-05-04) but is not part of the SDK's public API contract. A patch or minor release could silently change it.
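
The deny-with-message mechanism can be pictured with the sketch below. It uses local stand-in types rather than the SDK's own: the real canUseTool signature may take more parameters, the shape of the allow result is an assumption, and allowing every other tool is only a placeholder policy. makeCanUseTool and getAnswer are hypothetical names.

```typescript
// Stand-in result type; only the deny shape is taken from this README.
type PermissionResult =
  | { behavior: "allow" } // assumption: the SDK's allow result may carry more fields
  | { behavior: "deny"; message: string };

// Builds a gate that answers ask_user_question via the deny message.
function makeCanUseTool(getAnswer: (input: unknown) => Promise<string>) {
  return async (toolName: string, input: unknown): Promise<PermissionResult> => {
    if (toolName === "ask_user_question") {
      // The tool call is denied, but the answer travels back in `message`.
      return { behavior: "deny", message: await getAnswer(input) };
    }
    // Placeholder policy for the sketch: allow everything else.
    return { behavior: "allow" };
  };
}
```

Since this relies on behaviour outside the SDK's public contract, a test that exercises exactly this path against a live backend is what protects the pin.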

Similarly, KV-cache affinity depends on the SDK preserving context across turns within one query() call. The session layer pins session.backend at construction and never reassigns it (§Q3 KV-cache affinity) — but an SDK change to connection management could break cache locality invisibly.

Gate before bumping:

# 1. Ensure llama-server is running
curl -sf http://localhost:8080/health

# 2. Run the integration suite
cd mcp-bridges/qwen-agent-server
npm run test:integration

If any of the three SDK pin assertions fails, do not bump the SDK. File a report against RDR-001 and investigate whether the fallback paths documented there cover the regression before proceeding.

The three pin tests are in tests/integration/sdk-behavior.test.ts.


Development

cd mcp-bridges/qwen-agent-server

# Unit tests (no backend required)
npm test

# Integration tests (requires llama-server on :8080)
npm run test:integration

# Build
npm run build

# Run directly (after build)
node dist/server.js