> **Note:** This is how I use pi.dev + llama.cpp on my local machine. I created this plugin so I can update my setup quickly.
# pi-llama-server

Pi extension that integrates a running llama-server instance with the Pi Coding Agent. Provides live model listing and the ability to load and unload models via the llama-server API.
## Prerequisites

- A running llama-server instance (from llama.cpp) in router mode (the default if you don't pass `-m`)
- Pi Coding Agent installed (`@mariozechner/pi-coding-agent`)
## Install

```shell
pi install npm:pi-llama-server
```

Or from git:

```shell
pi install git:github.com/user/pi-llama-server
```

Pi auto-discovers the extension via `pi.extensions` in `package.json`. No additional setup needed.
## Configuration

The llama-server URL is resolved in this order:

1. **Per-project config** — create `.pi/llama-server.json` in your project root:

   ```json
   { "url": "http://10.0.0.5:9090" }
   ```

2. **Environment variable** — set globally:

   ```shell
   export LLAMA_SERVER_URL=http://10.0.0.5:9090
   ```

3. **Default** — falls back to `http://127.0.0.1:8080`
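The resolution order above can be sketched as a small function. This is a minimal illustration, not the extension's actual internals; the function name and signature are made up for the example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const DEFAULT_URL = "http://127.0.0.1:8080";

// Resolve the llama-server URL in the documented order:
// per-project config file, then environment variable, then default.
function resolveLlamaServerUrl(
  projectRoot: string,
  env: Record<string, string | undefined>
): string {
  const configPath = path.join(projectRoot, ".pi", "llama-server.json");
  if (fs.existsSync(configPath)) {
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (typeof config.url === "string") return config.url; // per-project config wins
  }
  if (env.LLAMA_SERVER_URL) return env.LLAMA_SERVER_URL; // env var next
  return DEFAULT_URL; // otherwise fall back to the default
}
```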
## Usage

### Browse and manage models

Run the `/models` slash command inside Pi to see all models on the llama-server with live status:
| Status | Meaning |
|--------|---------|
| 🟢 loaded | Model is loaded and ready |
| 🟡 loading | Model is being loaded |
| 🔴 failed | Model failed to load |
| ⚪ other | Unknown state |
Select a model to load, unload, or switch to it.
### Switch models

Use `Ctrl+P` (or `/model`) in Pi to select any llama-server model for inference. The extension will automatically tell llama-server to load the chosen model.
## How it works

When Pi starts, the extension:

- Resolves the llama-server URL from config/env/default
- Queries `GET /models` to discover available GGUF models
- Registers each model as an OpenAI-compatible provider under `{url}/v1`
- Listens for model switch events and calls `POST /models/load` on the server
- Provides the `/models` interactive command for managing models
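The discovery-and-registration step can be sketched as a pure mapping from a model list to provider entries under `{url}/v1`. The response shape (`{ data: [{ id }] }`) follows the OpenAI-style list format and is an assumption here; Pi's real provider-registration API is not shown.

```typescript
// Assumed OpenAI-style model list response from GET /models.
interface ModelList {
  data: { id: string }[];
}

// Hypothetical provider entry; field names are illustrative, not Pi's API.
interface ProviderEntry {
  name: string;    // display name for the model picker
  baseUrl: string; // OpenAI-compatible root on the llama-server
  model: string;   // model id to send in completion requests
}

function providersFromModelList(serverUrl: string, list: ModelList): ProviderEntry[] {
  return list.data.map((m) => ({
    name: `llama-server/${m.id}`,
    baseUrl: `${serverUrl}/v1`,
    model: m.id,
  }));
}
```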
## llama-server endpoints used
| Endpoint | Method | Purpose |
|----------|--------|---------|
| /models | GET | List all models |
| /models/load | POST | Load a model |
| /models/unload | POST | Unload a model |
| /v1/... | POST | OpenAI-compatible completions (via Pi provider) |
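As a rough sketch, requests to these endpoints could be built like this. The paths come from the table above; the JSON body shape (`{ model: ... }`) for load/unload is an assumption, not confirmed by the source.

```typescript
interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  body?: string; // JSON payload for POST requests
}

function listModelsRequest(baseUrl: string): HttpRequest {
  return { method: "GET", url: `${baseUrl}/models` };
}

// Body shape { model } is assumed, not documented by the source.
function loadModelRequest(baseUrl: string, model: string): HttpRequest {
  return { method: "POST", url: `${baseUrl}/models/load`, body: JSON.stringify({ model }) };
}

function unloadModelRequest(baseUrl: string, model: string): HttpRequest {
  return { method: "POST", url: `${baseUrl}/models/unload`, body: JSON.stringify({ model }) };
}
```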
