> **Note:** This is how I use pi.dev + llama.cpp on my local machine. I created this plugin so I can update my setup quickly.
# pi-llama-server

Pi extension that integrates a running llama-server instance with the Pi Coding Agent. Provides live model listing and the ability to load and unload models via the llama-server API.
## Prerequisites

- A running llama-server instance (from llama.cpp) in router mode (the default if you don't pass `-m`)
- Pi Coding Agent installed (`@mariozechner/pi-coding-agent`)
## Install

```shell
pi install npm:pi-llama-server
```

Or from git:

```shell
pi install git:github.com/user/pi-llama-server
```

Pi auto-discovers the extension via `pi.extensions` in `package.json`. No additional setup needed.
## Configuration

The llama-server URL is resolved in this order:

1. **Per-project config** — create `.pi/llama-server.json` in your project root:

   ```json
   { "url": "http://10.0.0.5:9090" }
   ```

2. **Environment variable** — set globally:

   ```shell
   export LLAMA_SERVER_URL=http://10.0.0.5:9090
   ```

3. **Default** — falls back to `http://127.0.0.1:8080`
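The resolution order above can be sketched as a small function. This is a minimal illustration, not the extension's actual internals; the function name and signature are made up for the example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const DEFAULT_URL = "http://127.0.0.1:8080";

// Resolve the llama-server URL in the documented order:
// per-project config file, then environment variable, then default.
function resolveLlamaServerUrl(
  projectRoot: string,
  env: Record<string, string | undefined>
): string {
  const configPath = path.join(projectRoot, ".pi", "llama-server.json");
  if (fs.existsSync(configPath)) {
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (typeof config.url === "string") return config.url; // per-project config wins
  }
  if (env.LLAMA_SERVER_URL) return env.LLAMA_SERVER_URL; // env var next
  return DEFAULT_URL; // otherwise fall back to the default
}
```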
## Usage

### Browse and manage models

Run the `/models` slash command inside Pi to see all models on the llama-server with live status:
| Status | Meaning |
|--------|---------|
| 🟢 loaded | Model is loaded and ready |
| 🟡 loading | Model is being loaded |
| 🔴 failed | Model failed to load |
| ⚪ other | Unknown state |
Select a model to load, unload, or switch to it.
### Switch models

Use `Ctrl+P` (or `/model`) in Pi to select any llama-server model for inference. The extension will automatically tell llama-server to load the chosen model.
## How it works

When Pi starts, the extension:

- Resolves the llama-server URL from config/env/default
- Queries `GET /models` to discover available GGUF models
- Registers each model as an OpenAI-compatible provider under `{url}/v1`
- Listens for model switch events and calls `POST /models/load` on the server
- Provides the `/models` interactive command for managing models
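The discovery-and-registration step can be sketched as a pure mapping from a model list to provider entries under `{url}/v1`. The response shape (`{ data: [{ id }] }`) follows the OpenAI-style list format and is an assumption here; Pi's real provider-registration API is not shown.

```typescript
// Assumed OpenAI-style model list response from GET /models.
interface ModelList {
  data: { id: string }[];
}

// Hypothetical provider entry; field names are illustrative, not Pi's API.
interface ProviderEntry {
  name: string;    // display name for the model picker
  baseUrl: string; // OpenAI-compatible root on the llama-server
  model: string;   // model id to send in completion requests
}

function providersFromModelList(serverUrl: string, list: ModelList): ProviderEntry[] {
  return list.data.map((m) => ({
    name: `llama-server/${m.id}`,
    baseUrl: `${serverUrl}/v1`,
    model: m.id,
  }));
}
```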
## llama-server endpoints used
| Endpoint | Method | Purpose |
|----------|--------|---------|
| /models | GET | List all models |
| /models/load | POST | Load a model |
| /models/unload | POST | Unload a model |
| /v1/... | POST | OpenAI-compatible completions (via Pi provider) |
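As a rough sketch, requests to these endpoints could be built like this. The paths come from the table above; the JSON body shape (`{ model: ... }`) for load/unload is an assumption, not confirmed by the source.

```typescript
interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  body?: string; // JSON payload for POST requests
}

function listModelsRequest(baseUrl: string): HttpRequest {
  return { method: "GET", url: `${baseUrl}/models` };
}

// Body shape { model } is assumed, not documented by the source.
function loadModelRequest(baseUrl: string, model: string): HttpRequest {
  return { method: "POST", url: `${baseUrl}/models/load`, body: JSON.stringify({ model }) };
}

function unloadModelRequest(baseUrl: string, model: string): HttpRequest {
  return { method: "POST", url: `${baseUrl}/models/unload`, body: JSON.stringify({ model }) };
}
```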
