
@pioneur/llama-launch

v0.1.6

Launch Claude Code, Codex, and similar coding agents against local llama.cpp models.

Readme

llama-launch

Ollama-style launch control for coding-agent harnesses on top of local llama.cpp.

llama-launch is for the case where you want to keep the provider UX, but replace the hosted model with a local runtime. It starts the compatible backend, exposes the protocol shim the harness expects, and then launches the harness from your current project directory.

llama claude gemma4:31b
llama codex gemma4:31b

Overview

llama-launch gives you a short CLI for agent runtimes:

llama claude gemma4:31b
llama codex gemma4:31b
llama launch codex ggml-org/gemma-4-31b-it-GGUF:Q4_K_M

It handles:

  • starting llama-server when needed
  • reusing managed llama-server backends across runs
  • exposing a local gateway for Anthropic Messages, OpenAI Responses, and OpenAI Chat
  • launching the selected harness with the right environment variables
  • npm installation for use inside any project

Why this exists

Use llama-launch when you want local models to behave more like hosted coding providers.

Examples:

  • run Claude Code against a local gemma4:31b backend
  • run Codex against a local llama.cpp runtime
  • keep the harness inside the current project while the model runtime is managed separately

Install

Global install:

npm install -g @pioneur/llama-launch
llama claude gemma4:31b

Inside a project:

npm install --save-dev @pioneur/llama-launch
npx llama claude gemma4:31b

Local path install while developing:

npm install --save-dev /path/to/llama-launch
npx llama codex gemma4:31b

The npm package bootstraps its own private Python virtualenv during install, so users do not need to manually install the Python package first.
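The bootstrap step can be sketched in Python; the directory layout and helper name here are illustrative, not the package's actual install script.

```python
import sys
import venv
from pathlib import Path

def bootstrap_private_venv(package_root: Path) -> Path:
    """Create (or reuse) a private virtualenv under the package directory
    and return the path to its interpreter. Sketch only: the real install
    hook would also install the gateway's Python dependencies."""
    venv_dir = package_root / ".venv"
    if not venv_dir.exists():
        # with_pip omitted to keep the sketch fast; a real bootstrap
        # would enable pip and install requirements into this venv
        venv.EnvBuilder(with_pip=False).create(venv_dir)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return venv_dir / bin_dir / "python"

if __name__ == "__main__":
    import tempfile
    print(bootstrap_private_venv(Path(tempfile.mkdtemp())).exists())
```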

Requirements

  • Node.js 20+
  • npm 10+
  • Python 3
  • llama-server on PATH
  • the target harness installed if you want to launch it directly

Provider binaries:

  • claude for llama claude ...
  • codex for llama codex ...
  • opencode for llama opencode ...
  • openclaw for llama openclaw ...
  • hermes for llama hermes ...
  • pi for llama pi ...

If a provider binary is not on PATH, you can override it with --provider-bin.

Quick start

Dry-run the resolved launch plan:

llama claude gemma4:31b --dry-run --output json

Run Claude against the preferred local profile:

llama claude gemma4:31b \
  --provider-arg=--print \
  --provider-arg=--output-format \
  --provider-arg=json \
  --provider-arg=--dangerously-skip-permissions \
  --provider-arg='Use the Bash tool to run "pwd" and answer with the working directory only.'

By default, llama-launch uses a compact terminal UI and keeps backend and gateway logs quiet. This default is meant to feel closer to a hosted provider run: tool activity plus the final answer, without the raw transport chatter. To see the unfiltered provider and backend output, use:

llama claude gemma4:31b --ui raw

First-run downloads from Hugging Face can take several minutes for large models, so llama-launch waits up to 30 minutes for backend startup by default. If needed, override that with:

export LLAMA_LAUNCH_BACKEND_STARTUP_TIMEOUT_SECONDS=3600

Resident backend behavior:

  • the first run starts a managed llama-server
  • later runs on the same model and port reuse that backend instead of reloading the model
  • inspect running managed backends with llama ps
  • stop one explicitly with llama stop --base-url http://127.0.0.1:8080
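The reuse rule above (same model and port means reuse, anything else means start) can be sketched as a small decision function; the record shape is illustrative, not llama-launch's actual bookkeeping format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ManagedBackend:
    """Illustrative record of a managed llama-server instance."""
    model: str
    port: int

def resolve_backend(running, model: str, port: int):
    """Reuse a running backend when model and port both match;
    otherwise plan to start a fresh one."""
    for backend in running:
        if backend.model == model and backend.port == port:
            return ("reuse", backend)
    return ("start", ManagedBackend(model, port))

running = [ManagedBackend("gemma4:31b", 8080)]
print(resolve_backend(running, "gemma4:31b", 8080)[0])  # same model+port -> reuse
print(resolve_backend(running, "gemma4:31b", 8081)[0])  # different port  -> start
```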

Run Codex:

llama codex gemma4:31b

Run against a local GGUF:

llama codex ./models/gemma4-31b.gguf

Run against a Hugging Face GGUF repo:

llama codex ggml-org/gemma-4-31b-it-GGUF:Q4_K_M

Install by provider

Claude:

npm install -g @pioneur/llama-launch
llama claude gemma4:31b

Codex:

npm install -g @pioneur/llama-launch
llama codex gemma4:31b

Pi:

npm install -g @pioneur/llama-launch
llama pi gemma4:31b

Project-local install:

npm install --save-dev @pioneur/llama-launch
npx llama claude gemma4:31b

Model selection

The model selector is interpreted in three ways:

  • local .gguf path: uses llama-server -m
  • owner/repo[:quant]: uses llama-server -hf
  • runtime model id: assumes a compatible llama.cpp backend is already serving it

Examples:

llama codex ./models/qwen2.5-coder-7b.gguf
llama codex bartowski/Qwen2.5-Coder-7B-GGUF
llama codex ggml-org/gemma-4-31b-it-GGUF:Q4_K_M
llama claude gemma4:31b

The same selection can also be passed with --model:

llama claude --model gemma4:31b
llama codex --model ./models/model.gguf
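The three-way interpretation can be sketched as a classifier; the actual parsing logic inside llama-launch may differ, this just mirrors the rules listed above.

```python
def classify_selector(selector: str) -> str:
    """Illustrative split of a model selector into the three documented
    forms: local .gguf path, Hugging Face owner/repo[:quant], or a
    runtime model id an existing backend is assumed to serve."""
    if selector.endswith(".gguf"):
        return "local-gguf"    # launched via llama-server -m
    head = selector.split(":", 1)[0]
    if "/" in head:
        return "hf-repo"       # launched via llama-server -hf
    return "runtime-id"        # assume a compatible backend already serves it

print(classify_selector("./models/qwen2.5-coder-7b.gguf"))
print(classify_selector("ggml-org/gemma-4-31b-it-GGUF:Q4_K_M"))
print(classify_selector("gemma4:31b"))
```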

Choosing a model

Use the selector that matches how you want to source the model:

  • local GGUF file when you already have a model downloaded on disk
  • Hugging Face repo when you want llama-server to resolve the model from a GGUF repository
  • runtime model id when your local backend already exposes a model name you want to target

Known aliases such as gemma4:31b can also auto-resolve to a built-in profile and start the matching backend for you.

Examples by source:

# Local GGUF
llama codex ./models/gemma4-31b.gguf

# Hugging Face GGUF repo
llama codex bartowski/Qwen2.5-Coder-7B-GGUF

# Existing runtime model id
llama claude gemma4:31b

Examples by harness:

# Claude-oriented local profile
llama claude gemma4:31b

# Codex against a local GGUF
llama codex ./models/qwen2.5-coder-7b.gguf

# Codex against a Hugging Face GGUF repo
llama codex ggml-org/gemma-4-31b-it-GGUF:Q4_K_M

Current provider constraints:

  • claude is intentionally stricter and is aimed at stronger local tool-use profiles, with gemma4:31b as the primary supported target
  • codex is more permissive and can be used with a wider range of local llama.cpp models

Known-good starting points:

  • claude: gemma4:31b
  • codex: gemma4:31b or a local coder-focused GGUF
  • opencode: any solid OpenAI-compatible local model exposed through llama.cpp
  • openclaw: experimental, start with stronger instruction-following models
  • hermes: OpenAI-compatible local models are the intended path
  • pi: OpenAI-compatible local models are the intended path

What gets launched

When you run llama claude gemma4:31b, the launcher resolves the model selector, brings up the local backend if needed, exposes the right gateway shape, and then starts the provider CLI with the expected environment variables.

For Claude, that means Anthropic Messages compatibility on top of a local llama.cpp model. For Codex, that means the OpenAI-style path it expects.

Provider support

Current harness support level:

  • claude: supported through the Anthropic Messages adapter, tuned for gemma4:31b
  • codex: supported through the OpenAI Responses adapter
  • opencode: supported through the OpenAI Chat path
  • hermes: supported through the OpenAI Chat path
  • pi: supported through the OpenAI Chat path
  • openclaw: experimental

The launcher ships the protocol shims and process orchestration. Actual runtime quality still depends on the selected model and the provider CLI's tolerance for non-hosted backends.

Security and privacy

What the npm package contains:

  • the CLI wrapper
  • the Python gateway/runtime
  • the README and license

What it does not publish:

  • your local models
  • your ~/.claude directory
  • local API keys or tokens
  • shell history, logs, or project files
  • the private virtualenv created during npm install

At runtime, llama-launch passes environment variables to the selected provider process so the provider can talk to the local gateway. That is local process behavior, not published package content.

Vault-backed secrets

If you want one authenticated service to broker credentials at runtime, this repo includes a Bitwarden Secrets Manager wrapper.

Files:

  • scripts/run-with-vault.sh
  • scripts/publish-with-vault.sh
  • .vault-secrets.example.json

How it works:

  • the machine account authenticates once through BWS_ACCESS_TOKEN
  • run-with-vault.sh looks up a requested env var name in a local secret-id map
  • it fetches the live secret value from Bitwarden with bws
  • it exports that env var only for the target subprocess
  • the target command runs without storing the downstream secret in the repo
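The steps above can be sketched as follows. `fetch_secret` stands in for the real `bws` lookup and is stubbed here so the sketch is self-contained; the actual wrapper is a shell script.

```python
import json
import os
import subprocess
import sys

def fetch_secret(secret_id: str) -> str:
    # Stand-in for `bws secret get <secret-id>`; stubbed so the sketch
    # runs without a Bitwarden machine account.
    return f"value-for-{secret_id}"

def run_with_vault(secret_map_path: str, env_name: str, command: list) -> int:
    # 1. look up the requested env var name in the local secret-id map
    with open(secret_map_path) as f:
        secret_id = json.load(f)[env_name]
    # 2. fetch the live value and export it only for the child process
    child_env = dict(os.environ, **{env_name: fetch_secret(secret_id)})
    # 3. the downstream secret never touches the repo or the parent shell
    return subprocess.run(command, env=child_env).returncode

if __name__ == "__main__":
    import tempfile
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump({"NPM_TOKEN": "abc123"}, f)
    code = run_with_vault(
        f.name, "NPM_TOKEN",
        [sys.executable, "-c", "import os; print(os.environ['NPM_TOKEN'])"],
    )
    print(code)
```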

Setup:

cp .vault-secrets.example.json .vault-secrets.json
mkdir -p ~/.config/llama-launch
mv .vault-secrets.json ~/.config/llama-launch/vault-secrets.json

Populate ~/.config/llama-launch/vault-secrets.json with real Bitwarden secret IDs:

{
  "NPM_TOKEN": "your-secret-id",
  "ANTHROPIC_API_KEY": "your-secret-id",
  "OPENAI_API_KEY": "your-secret-id"
}

Make the machine account token available:

export BWS_ACCESS_TOKEN='your-machine-account-token'

Examples:

./scripts/run-with-vault.sh NPM_TOKEN -- npm whoami
./scripts/publish-with-vault.sh
./scripts/run-with-vault.sh ANTHROPIC_API_KEY -- claude --print 'hello'

By default the wrapper looks for the secret-id map in:

  • LLAMA_LAUNCH_VAULT_CONFIG
  • ./.vault-secrets.json
  • ~/.config/llama-launch/vault-secrets.json
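The lookup order can be sketched like this; the wrapper itself is a shell script, so this Python version is purely illustrative.

```python
import os
from pathlib import Path
from typing import Optional

def find_vault_config() -> Optional[Path]:
    """Resolve the secret-id map in the documented order: explicit
    LLAMA_LAUNCH_VAULT_CONFIG, then a project-local file, then the
    user config directory. Returns the first existing file, else None."""
    candidates = []
    override = os.environ.get("LLAMA_LAUNCH_VAULT_CONFIG")
    if override:
        candidates.append(Path(override))
    candidates.append(Path("./.vault-secrets.json"))
    candidates.append(Path.home() / ".config/llama-launch/vault-secrets.json")
    for path in candidates:
        if path.is_file():
            return path
    return None
```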

Claude profile

The intended local Claude profile is:

llama claude gemma4:31b

That profile is tuned around:

  • ggml-org/gemma-4-31b-it-GGUF:Q4_K_M
  • Anthropic Messages compatibility
  • local tool use as the primary target

Smaller Gemma profiles such as gemma3:1b are intentionally rejected for Claude launch because they are not reliable enough for Claude-style tool workflows.

Claude compatibility

The Claude Code path includes:

  • ANTHROPIC_BASE_URL pointed at the gateway root
  • ANTHROPIC_CUSTOM_MODEL_OPTION for local model ids
  • CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 for proxy mode
  • /v1/messages
  • /v1/messages/count_tokens
  • streamed tool_use translation over a chat-completions backend
  • --bare by default to avoid unrelated user plugins and hooks interfering with local-model runs
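The environment side of that list can be illustrated with a small sketch; the gateway URL and model id are placeholders, and the exact values the launcher exports are not spelled out here.

```python
def claude_env(gateway_base: str, model_id: str) -> dict:
    """Illustrative environment for launching Claude Code against the
    local gateway. Variable names follow the list above; values are
    placeholders, not the launcher's actual output."""
    return {
        "ANTHROPIC_BASE_URL": gateway_base,             # gateway root
        "ANTHROPIC_CUSTOM_MODEL_OPTION": model_id,      # local model id
        "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1",  # proxy mode
    }

env = claude_env("http://127.0.0.1:8080", "gemma4:31b")
print(env["ANTHROPIC_BASE_URL"])
```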

Commands

llama protocols
llama harnesses
llama backend-status
llama server-command --hf-model ggml-org/gemma-4-31b-it-GGUF:Q4_K_M
llama serve
llama launch codex gemma4:31b
llama claude gemma4:31b
llama codex gemma4:31b

If you prefer the long Python form during development:

python -m llama_launch.cli claude gemma4:31b

Testing

Run the automated suite:

.venv/bin/python -m unittest discover -s tests -v

Run the opt-in real Claude probe:

RUN_REAL_CLAUDE_PROBE=1 .venv/bin/python -m unittest \
  tests.test_runtime_e2e.RuntimeE2ETests.test_real_claude_completes_bash_tool_roundtrip_through_gateway -v

Status

Current focus:

  • Claude Code
  • Codex
  • OpenCode
  • OpenClaw
  • Hermes
  • Pi

The transport and launch stack is in place. Real-world harness quality still depends on the underlying model’s tool-use and instruction-following capability.