localpi

v0.3.0

Pi-compatible local model launcher with managed llama-server support.

Localpi is a local Pi launcher for open-weight models.

By default, Localpi discovers available local providers, lets you choose when more than one model is loaded, points Pi at the selected model, and writes Pi config for the other discovered models so /model can switch among them during the session.

Localpi supports LM Studio, vLLM, custom OpenAI-compatible servers, and an optional managed llama-server fallback.

Localpi is intentionally generic. It does not contain classifier prompts, dataset workflows, GitHub routing logic, or final-schema output machinery. Structured classifier runs belong in caller tools such as localpager-agent.

Install

npm install -g localpi

During development:

npm run localpi -- --status

After build:

node dist/src/cli/main.js --status

Runtime Model

Target default:

localpi --model gemma-12b

This uses the default auto runtime. If exactly one model is loaded locally, Localpi selects it. If multiple models are loaded in an interactive terminal, Localpi boots Pi with a temporary default and opens Pi's native model selector. If no external model is loaded and llama-server is installed, Localpi can fall back to the managed llama-server default. Thinking starts from --thinking, LOCALPI_THINKING, the last saved Pi thinking level, or medium.
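
The auto-runtime rules above can be read as a small decision function. The following is a minimal Python sketch of that reading, not Localpi's actual code. The function name, the `Action` labels, and the `UNDECIDED` result for cases the text does not cover (for example, several loaded models in a non-interactive terminal) are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    SELECT_ONLY_MODEL = "select the single loaded model"
    OPEN_PICKER = "boot Pi with a temporary default and open the model selector"
    MANAGED_FALLBACK = "fall back to the managed llama-server default"
    UNDECIDED = "not specified by the README"


def choose_auto_action(loaded_models: list[str], interactive: bool,
                       llama_server_installed: bool) -> Action:
    """Hypothetical helper mirroring the three documented auto-runtime cases."""
    if len(loaded_models) == 1:
        return Action.SELECT_ONLY_MODEL
    if len(loaded_models) > 1 and interactive:
        return Action.OPEN_PICKER
    if not loaded_models and llama_server_installed:
        return Action.MANAGED_FALLBACK
    # Assumption: every other combination is left unspecified in this sketch.
    return Action.UNDECIDED
```

For instance, `choose_auto_action(["gemma-12b"], interactive=False, llama_server_installed=False)` yields `Action.SELECT_ONLY_MODEL`, because exactly one model is loaded.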

LM Studio is explicit:

localpi --runtime lmstudio --model gemma-4-e4b-it

vLLM is explicit:

localpi --runtime vllm --model qwen

Custom OpenAI-compatible endpoints are also supported:

localpi --runtime openai-compatible --base-url http://127.0.0.1:8000/v1 --model my-model

Use --provider <id> with --model <id> to select a catalog entry without opening the picker. --provider <id> by itself only scopes the available choices. Localpi avoids loading multiple heavyweight local runtimes at the same time. When using the managed llama-server runtime, it either stops its previous managed server or clearly reports what is already running before starting another model.
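
One way to picture how --provider and --model interact is as a filter over catalog entries. The sketch below is illustrative only: the `Entry` type, the `scope_choices` function, and the idea that the catalog is a flat list of provider/model pairs are assumptions, not Localpi's real data model.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    provider: str
    model: str


def scope_choices(catalog: list[Entry], provider: Optional[str] = None,
                  model: Optional[str] = None) -> tuple[list[Entry], bool]:
    """Return (remaining choices, picker_needed) for a hypothetical catalog."""
    # --provider alone only narrows the list of available choices.
    choices = [e for e in catalog if provider is None or e.provider == provider]
    if provider is not None and model is not None:
        # --provider plus --model names one entry, so no picker is opened.
        return [e for e in choices if e.model == model], False
    # Assumption: a picker is only useful when more than one choice remains.
    return choices, len(choices) > 1
```

With a catalog of two LM Studio models and one vLLM model, passing only `provider="lmstudio"` leaves two choices and still needs the picker, while adding `model=...` pins a single entry.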

Default Pi Behavior

Localpi launches Pi with:

  • default tools: read,bash,edit,write,grep,find,ls
  • a system prompt that explains local tool approval and local-model limits
  • an approval gate before every tool call
  • token speed and token count status while responses stream
  • bounded Gemma/llama-server reasoning controlled by --thinking
  • an in-session /thinking command for changing Pi's active thinking level
  • local state under ~/.local/state/localpi

The approval gate makes failed or denied tool calls explicit to the model so the model does not claim that a blocked command ran.
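
The idea behind the approval gate can be sketched as a wrapper that always returns an explicit outcome, so a denied or failed call is never mistaken for a successful one. This is a hypothetical Python illustration, not the extension Localpi ships; `ToolResult`, `gated_call`, and the message wording are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    ran: bool      # True only if the tool actually executed successfully
    message: str   # text handed back to the model


def gated_call(tool: str, args: str,
               approve: Callable[[str, str], bool],
               run: Callable[[str, str], str]) -> ToolResult:
    """Ask for approval first; report denials and failures explicitly."""
    if not approve(tool, args):
        return ToolResult(False, f"DENIED: {tool} was not run")
    try:
        return ToolResult(True, run(tool, args))
    except Exception as exc:  # surface the failure instead of hiding it
        return ToolResult(False, f"FAILED: {tool} raised {exc!r}")
```

Because the denial text goes back to the model as the tool result, the model has no basis for claiming that the blocked command ran.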

LM Studio Alternative

LM Studio exposes an OpenAI-compatible endpoint, usually:

http://127.0.0.1:1234/v1

Load Gemma in LM Studio:

~/.lmstudio/bin/lms server start
~/.lmstudio/bin/lms load gemma-4-e4b-it -y

Then run localpi against LM Studio explicitly:

localpi --runtime lmstudio --model gemma-4-e4b-it
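
Before launching, it can help to confirm that the endpoint is up and which model ids it reports. The sketch below queries the standard OpenAI-compatible `/v1/models` route with the Python standard library. It is a convenience check written for this README, not part of localpi, and the helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_models(base_url: str, timeout: float = 2.0) -> list[str]:
    """Fetch <base_url>/models; raises URLError if the server is unreachable."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

Calling `list_models("http://127.0.0.1:1234/v1")` while LM Studio's server is running should return the ids of the models it currently exposes.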

Usage

Run Pi interactively on the default local model:

localpi

Run a non-interactive Pi prompt:

localpi -p "summarize this repo"

Run an endless TUI demo:

localpi --demo --model gemma-e4b

Demo mode requires an explicit model and opens the normal Pi TUI. It keeps one live Pi session, so followup prompts continue from the first prompt, while Pi itself owns streaming, tok/s status, slash commands, and exit behavior.

Override the demo prompts:

localpi --demo --model gemma-e4b --demo-initial-prompt-file ./prompts/story.txt --demo-followup-prompt "Continue. Try to write as long as possible."

Pin a model alias:

localpi --model gemma-e4b -p "write a detailed implementation plan"

Use a bounded reasoning budget with managed llama-server:

localpi --model gemma-12b --thinking low -p "classify this item"

In an interactive session, use /thinking to pick a level or /thinking high to set one directly. This changes Pi's active thinking level for later turns and saves it for the next localpi launch. For managed llama-server, the server-side reasoning budget is still chosen at startup because changing it requires restarting the local server process.

For managed llama-server, thinking levels map to server-side reasoning:

| Level   | llama-server reasoning                 |
| ------- | -------------------------------------- |
| off     | --reasoning off                        |
| minimal | --reasoning on --reasoning-budget 32   |
| low     | --reasoning on --reasoning-budget 128  |
| medium  | --reasoning on --reasoning-budget 512  |
| high    | --reasoning on --reasoning-budget 2048 |
| xhigh   | --reasoning on --reasoning-budget 8192 |

The fallback default is medium.
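
The level-to-flag mapping and the choice of starting level can be sketched in a few lines of Python. The budgets come from the table above. The function names are hypothetical, and the precedence order (flag, then environment variable, then saved level, then medium) is an assumption based on the order in which this README lists those sources.

```python
from typing import Optional

# Reasoning budgets per thinking level, copied from the table above.
REASONING_BUDGETS = {"minimal": 32, "low": 128, "medium": 512,
                     "high": 2048, "xhigh": 8192}


def reasoning_flags(level: str) -> list[str]:
    """Translate a thinking level into llama-server reasoning arguments."""
    if level == "off":
        return ["--reasoning", "off"]
    return ["--reasoning", "on",
            "--reasoning-budget", str(REASONING_BUDGETS[level])]


def resolve_thinking(cli: Optional[str] = None, env: Optional[str] = None,
                     saved: Optional[str] = None) -> str:
    """Pick the starting level; assumed order: --thinking, env, saved, medium."""
    for candidate in (cli, env, saved):
        if candidate:
            return candidate
    return "medium"
```

Under these assumptions, `reasoning_flags(resolve_thinking(env="low"))` produces `["--reasoning", "on", "--reasoning-budget", "128"]`.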

Point at vLLM:

localpi --runtime vllm --model qwen -p "review the src directory"

Point at a different OpenAI-compatible local server:

localpi --runtime openai-compatible --base-url http://127.0.0.1:8000/v1 -p "review the src directory"

To pass Pi a flag that localpi also defines, place it after --:

localpi --model gemma-e4b -- --model some-pi-level-value

Stop the managed llama-server runtime:

localpi --stop

Options

  • --runtime <auto|llama-server|lmstudio|vllm|openai-compatible>: runtime backend. Default: auto
  • --provider <id>: catalog provider id to use, for example lmstudio or vllm
  • --model <alias|id|path|auto>: model alias, model id, GGUF path, or auto
  • --ctx <n> / --context-window <n>: model context window
  • --max-tokens <n>: generated model max output tokens
  • --base-url <url>: OpenAI-compatible endpoint for LM Studio or custom endpoints
  • --server-command <path>: llama-server executable path
  • --llama-server <path>: alias for --server-command
  • --host <host>: managed llama-server host. Default: 127.0.0.1
  • --port <n>: managed llama-server port. Default: 18194
  • --gpu-layers <n>: managed llama-server GPU layers. Default: 999
  • --parallel <n>: managed llama-server parallel slots. Default: 1
  • --chat-template <path>: optional llama.cpp chat template file
  • --state-dir <path>: runtime state directory. Default: ~/.local/state/localpi
  • --session-dir <path>: Pi session directory. Default: <state-dir>/sessions
  • --pi-command <command>: Pi launch command
  • --providers-file <path>: provider registry JSON
  • --model-profile <path>: local model capability profile JSON
  • --model-reasoning <bool>: override generated Pi reasoning capability
  • --model-thinking-format <deepseek|qwen-chat-template>: override generated Pi thinking format
  • --tools <list>: Pi tools allow list. Default: read,bash,edit,write,grep,find,ls
  • --thinking <off|minimal|low|medium|high|xhigh>: Pi thinking level and managed llama-server reasoning budget. Default: last saved level, then medium
  • --demo: endlessly run Pi prompts inside the normal Pi TUI until interrupted or Pi exits; requires an explicit non-auto model
  • --demo-initial-prompt <text>: first demo prompt
  • --demo-followup-prompt <text>: repeated demo prompt after the first run
  • --demo-initial-prompt-file <path>: UTF-8 file for the first demo prompt
  • --demo-followup-prompt-file <path>: UTF-8 file for repeated demo prompts
  • --no-approval: disable the tool approval gate
  • --no-token-status: disable the token status extension
  • --status: print runtime, model, and Pi config status
  • --stop: stop the managed llama-server process
  • --list: list configured model aliases

Environment

  • LOCALPI_RUNTIME
  • LOCALPI_MODEL
  • LOCALPI_PROVIDER
  • LOCALPI_BASE_URL
  • LOCALPI_PROVIDERS_FILE
  • LOCALPI_MODEL_PROFILE
  • LOCALPI_MODEL_REASONING
  • LOCALPI_MODEL_THINKING_FORMAT
  • LOCALPI_STATE_DIR
  • LOCALPI_SESSION_DIR
  • LOCALPI_PI_CMD
  • LOCALPI_CONTEXT_WINDOW
  • LOCALPI_MAX_TOKENS
  • LOCALPI_LLAMA_SERVER
  • LOCALPI_HOST
  • LOCALPI_PORT
  • LOCALPI_GPU_LAYERS
  • LOCALPI_PARALLEL
  • LOCALPI_CHAT_TEMPLATE
  • LOCALPI_TOOLS
  • LOCALPI_THINKING
  • LOCALPI_DEMO
  • LOCALPI_DEMO_INITIAL_PROMPT
  • LOCALPI_DEMO_FOLLOWUP_PROMPT
  • LOCALPI_DEMO_INITIAL_PROMPT_FILE
  • LOCALPI_DEMO_FOLLOWUP_PROMPT_FILE
  • LOCALPI_MODELS_FILE
  • LOCALPAGER_AGENT_PROFILE
  • LOCALPAGER_AGENT_REASONING
  • LOCALPAGER_AGENT_THINKING_FORMAT

LOCALPI_MODELS_FILE may point at a JSON file with this shape:

{
  "models": {
    "my-model": {
      "id": "my-model-id",
      "path": "/path/to/model.gguf",
      "contextWindow": 32768,
      "chatTemplate": "/path/to/template.jinja"
    }
  }
}
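
To show how an alias in this file relates to its settings, here is a minimal Python sketch that looks up one alias in a document of the shape above. The `resolve_alias` helper is hypothetical. It does not validate individual fields, because this README does not say which of them are required.

```python
import json


def resolve_alias(models_file_text: str, alias: str) -> dict:
    """Look up one alias in a LOCALPI_MODELS_FILE-shaped JSON document."""
    models = json.loads(models_file_text).get("models", {})
    if alias not in models:
        raise KeyError(f"unknown model alias: {alias}")
    return models[alias]
```

Given the example file, `resolve_alias(text, "my-model")["contextWindow"]` would be `32768`.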

Provider registries use the same file or LOCALPI_PROVIDERS_FILE:

{
  "providers": {
    "vllm-qwen": {
      "type": "openai-compatible",
      "name": "vLLM Qwen",
      "baseUrl": "http://127.0.0.1:8000/v1",
      "discover": true
    }
  }
}

Use discover: false for endpoints that should not be probed during startup. They can still be selected explicitly with --provider vllm-qwen --model <id>.
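
The effect of the discover flag can be illustrated with a short Python filter over a registry of the shape shown above. The function name is hypothetical, and treating a missing discover key as true is an assumption. This README only documents the explicit discover: false opt-out.

```python
def providers_to_probe(registry: dict) -> list[str]:
    """Return provider ids that have not opted out with discover: false."""
    providers = registry.get("providers", {})
    # Assumption: a missing "discover" key means the provider is probed.
    return [pid for pid, cfg in providers.items() if cfg.get("discover", True)]
```

A provider filtered out here is skipped only during startup probing; it stays in the registry and can still be named with --provider.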

Model capability profiles can fill in metadata that OpenAI-compatible servers do not expose through /v1/models, such as vLLM reasoning support:

{
  "id": "gemma4-26b-a4b-nvfp4",
  "model": "nvidia/Gemma-4-26B-A4B-NVFP4",
  "base_url": "http://127.0.0.1:8000/v1",
  "client": {
    "context_window": 32768,
    "max_tokens": 4096
  },
  "capabilities": {
    "reasoning": true,
    "thinking_format": "qwen-chat-template"
  }
}

LOCALPAGER_AGENT_PROFILE, LOCALPAGER_AGENT_REASONING, and LOCALPAGER_AGENT_THINKING_FORMAT are accepted as aliases so LocalPager Agent can pass the same profile metadata through to localpi.
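
As a rough illustration of how a capability profile and the --model-reasoning and --model-thinking-format overrides might combine, the Python sketch below starts from the profile's capabilities block and applies any explicit override on top. The function name is hypothetical, and the rule that an override replaces the profile value is an assumption drawn from the word "override" in the option descriptions.

```python
from typing import Optional


def effective_capabilities(profile: dict, reasoning: Optional[bool] = None,
                           thinking_format: Optional[str] = None) -> dict:
    """Merge profile capabilities with optional explicit overrides."""
    caps = dict(profile.get("capabilities", {}))  # copy; leave the profile intact
    if reasoning is not None:
        caps["reasoning"] = reasoning
    if thinking_format is not None:
        caps["thinking_format"] = thinking_format
    return caps
```

With the example profile and `thinking_format="deepseek"`, the result keeps `reasoning: true` from the profile and swaps only the thinking format.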

Development

npm run format
npm run lint
npm run typecheck
npm test
npm run build
npm run check