
pi-qwen-mode-proxy v1.0.1

Sampling mode proxy for Qwen models served via llama.cpp — switch between thinking, coding, and instruct modes

Downloads: 280

Readme

pi-qwen-mode-proxy

Sampling mode proxy extension for pi that intercepts OpenAI-completions API requests to a llama.cpp server and injects mode-specific sampling parameters for Qwen models (tested with Qwen 3.6 27B). The parameters are taken from the recommendation note on the model page (https://huggingface.co/Qwen/Qwen3.6-27B).

Modes

| Parameter | 🧠 Thinking | 💻 Coding | 📝 Instruct |
|-----------|:-----------:|:---------:|:-----------:|
| temperature | 1.0 | 0.6 | 0.7 |
| top_p | 0.95 | 0.95 | 0.80 |
| top_k | 20 | 20 | 20 |
| min_p | 0.0 | 0.0 | 0.0 |
| presence_penalty | 0.0 | 0.0 | 1.5 |
| repetition_penalty | 1.0 | 1.0 | 1.0 |

  • Thinking — Creative, exploratory tasks. High temperature for diverse output.
  • Coding — Precise, deterministic coding tasks. Lower temperature for consistent results.
  • Instruct — Instruction-following with presence penalty to encourage topic variety.

Installation

npm

```
pi install npm:pi-qwen-mode-proxy
```

git

```
pi install git:github.com/YOUR_USERNAME/pi-qwen-mode-proxy
```

local

```
pi install /path/to/pi-qwen-mode-proxy
```

Usage

Requires a llama.cpp server serving a Qwen model, registered as the llamacpp provider in ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llamacpp": {
      "baseUrl": "http://10.10.10.11:8080/v1",
      "api": "openai-completions",
      "apiKey": "llamacpp",
      "compat": {
        "supportsDeveloperRole": true,
        "supportsReasoningEffort": true,
        "thinkingFormat": "qwen-chat-template"
      },
      "models": [
        {
          "id": "llamacpp",
          "name": "(Local AI)",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 16384,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
```

Commands

```
/mode              Show current mode and parameters
/mode thinking     Switch to thinking mode
/mode coding       Switch to coding mode
/mode instruct     Switch to instruct mode
```

The current mode is displayed in the status bar footer and persists across /reload via session storage.
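For illustration, here is a minimal sketch of how the `/mode` command handling could be wired up. The hook name `pi.registerCommand`, its signature, and the helper names below are assumptions made for this sketch, not pi's confirmed extension API; only the `/mode` behavior itself comes from this README.

```typescript
// Hypothetical command wiring. `pi.registerCommand` and its signature are
// assumed for illustration; the /mode behavior mirrors the commands above.
const MODES = ["thinking", "coding", "instruct"] as const;
type Mode = (typeof MODES)[number];

export function registerModeCommand(
  pi: any,
  getMode: () => Mode,
  setMode: (m: Mode) => void, // expected to persist via session storage
) {
  pi.registerCommand("mode", (args: string) => {
    const requested = args.trim();
    if (requested === "") {
      return `Current mode: ${getMode()}`; // bare /mode reports state
    }
    if (!(MODES as readonly string[]).includes(requested)) {
      return `Unknown mode "${requested}". Valid modes: ${MODES.join(", ")}`;
    }
    setMode(requested as Mode); // survives /reload per the note above
    return `Switched to ${requested} mode`;
  });
}
```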

How It Works

The extension hooks into pi's before_provider_request event, which fires after pi builds the OpenAI chat completions payload but before it's sent over the network. When the target model is llamacpp, the handler injects the six sampling parameters (temperature, top_p, top_k, min_p, presence_penalty, repetition_penalty) corresponding to the active mode.

No custom provider or streaming implementation is needed — the extension works as a lightweight interceptor on top of pi's built-in openai-completions provider.
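As a sketch, the hook could look roughly like this. The `before_provider_request` event, the `llamacpp` model filter, and the parameter values come from this README; the shape of the `pi` object and the event payload field names (`model`, `request`) are assumptions for illustration.

```typescript
// Hypothetical interceptor sketch. Event name, model filter, and parameter
// values come from the README; the payload field names are assumed.
type Mode = "thinking" | "coding" | "instruct";

const TARGET_MODEL = "llamacpp";
let currentMode: Mode = "thinking"; // updated by the /mode command

// Per-mode presets, mirroring the table in the Modes section.
const PARAMS: Record<Mode, object> = {
  thinking: { temperature: 1.0, top_p: 0.95, top_k: 20, min_p: 0.0, presence_penalty: 0.0, repetition_penalty: 1.0 },
  coding:   { temperature: 0.6, top_p: 0.95, top_k: 20, min_p: 0.0, presence_penalty: 0.0, repetition_penalty: 1.0 },
  instruct: { temperature: 0.7, top_p: 0.80, top_k: 20, min_p: 0.0, presence_penalty: 1.5, repetition_penalty: 1.0 },
};

export default function activate(pi: any) {
  // Runs after pi builds the chat-completions payload, before it is sent.
  pi.on("before_provider_request", (event: any) => {
    if (event.model?.id !== TARGET_MODEL) return; // leave other models untouched
    Object.assign(event.request, PARAMS[currentMode]); // inject the six sampling params
  });
}
```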

Configuration

The model ID filter defaults to llamacpp. If your provider/model uses a different ID, edit extensions/index.ts and change the TARGET_MODEL constant.
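For example, the default looks like this (the surrounding code in extensions/index.ts is not shown here):

```typescript
// extensions/index.ts
const TARGET_MODEL = "llamacpp"; // change this if your model is registered under a different ID
```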

License

MIT