pi-ollama
v0.1.5
Native Ollama provider for the pi coding agent: fixes tool calling under streaming
Native Ollama provider extension for the pi coding agent.
Talks directly to Ollama's /api/chat endpoint, bypassing the OpenAI-compat shim at /v1/chat/completions that silently drops tool_calls from streamed responses.
Why this exists
I was simply trying to use the pi agent locally with Ollama when this can of worms opened up. It seemed like a good learning opportunity and a great way to start getting more involved in the community. I'm new to contributing to open source and to build-in-public etiquette, so feedback is genuinely welcome. I hope you find this useful!
Pi ships with an openai-completions adapter that routes Ollama traffic through Ollama's OpenAI-compat shim. The shim has a known streaming bug: tool_calls are dropped from the streamed deltas. Without those tool calls, pi's agent loop stalls on the first tool use: the model produces a tool call, the wire eats it, and pi never sees it.
Ollama's native /api/chat endpoint doesn't have this problem. This extension talks to /api/chat directly — routing around the shim, not patching it — so tool calls survive streaming and the agent loop completes through tool use, multi-turn workflows, and reasoning-heavy prompts.
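For illustration, here is roughly what a request to the native endpoint and a streamed tool-call chunk look like. This is a minimal sketch, not pi-ollama's code: `buildChatRequest` and `toolCallNames` are hypothetical helpers, and the types cover only the `/api/chat` fields used here (`model`, `messages`, `tools`, `stream`, `options.num_ctx` on the request, `message.tool_calls` on the response).

```typescript
// Hypothetical types: only the /api/chat fields this sketch uses.
interface OllamaTool {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

interface OllamaMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: OllamaMessage[];
  tools: OllamaTool[];
  stream: boolean;
  options: { num_ctx: number };
}

// Body for POST {baseUrl}/api/chat with streaming enabled.
function buildChatRequest(
  model: string,
  messages: OllamaMessage[],
  tools: OllamaTool[],
  numCtx: number,
): ChatRequest {
  return { model, messages, tools, stream: true, options: { num_ctx: numCtx } };
}

// Each streamed line is one JSON object; tool calls sit in message.tool_calls.
function toolCallNames(ndjsonLine: string): string[] {
  const chunk = JSON.parse(ndjsonLine) as {
    message?: { tool_calls?: { function: { name: string; arguments: Record<string, unknown> } }[] };
  };
  return (chunk.message?.tool_calls ?? []).map((call) => call.function.name);
}
```

In the native format assumed here, `arguments` is already a JSON object rather than the JSON-encoded string used by the OpenAI-compat format, and the tool call arrives inside a regular streamed chunk instead of being split across deltas.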
Other Ollama extensions for pi exist (linked in Related projects below) and they're solid for chat-style use. The architectural difference here is the API path: through the shim vs. around it. If you're using local Ollama specifically for the agentic tool-call workflows pi is designed around, that distinction is the whole point of this extension.
Install
```sh
pi install npm:pi-ollama
```

Or for local development:

```sh
git clone https://github.com/CaptCanadaMan/pi-ollama
cd pi-ollama
npm install
pi install /absolute/path/to/pi-ollama
```

Requires Ollama running locally (default http://localhost:11434) and at least one tool-capable model pulled.
Uninstall
```sh
pi uninstall npm:pi-ollama
```

This removes the on-disk package and the entry from ~/.pi/agent/settings.json. Pi won't auto-restore it on the next launch.
The bare form pi uninstall pi-ollama doesn't work — pi parses bare names as relative local paths rather than npm packages, so the npm: prefix is required for any npm-installed extension.
If you've already manually deleted the package directory (find it with npm root -g), pi will silently reinstall it on the next launch because npm:pi-ollama is still in ~/.pi/agent/settings.json. Run the uninstall command above to clear the settings entry — the disk side is already clean.
Optional cleanup of the model discovery cache:

```sh
rm -f ~/.pi/agent/cache/pi-ollama-models.json
```

Quick start
After installation, launch pi and run:

```
/ollama-status
```

You should see something like:

```
Ollama base URL: http://localhost:11434
✓ Ollama reachable — 3 model(s) registered
qwen2.5-coder:7b ctx:131,072 [tools]
gemma4:26b ctx:262,144 [tools, vision, reasoning]
llama3.1:8b ctx:131,072 [tools]
```

Switch to one of the discovered models and use pi normally; tool calls work end-to-end.
Slash commands
| Command | Description |
|---|---|
| /ollama-status | Show the Ollama base URL, registered models with capability flags, and currently loaded models. |
| /ollama-refresh | Re-discover models from /api/tags + /api/show and re-register the provider. Useful after ollama pull <model>. |
| /ollama-info [model-id] | Show capability details for a model. Omit the argument to pick from a list of currently registered models. |
| /ollama-context | Set the context length (num_ctx) pi-ollama sends to /api/chat. Picker with common presets + custom input. Persists across pi launches. |
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| OLLAMA_HOST | localhost:11434 | Ollama server host[:port]. May include or omit protocol. |
| OLLAMA_CONTEXT_LENGTH | unset | Override the num_ctx pi-ollama sends to /api/chat. Matches the env var Ollama itself respects, so a single setting works across tools. Superseded by /ollama-context if used. |
| OLLAMA_NATIVE_DEBUG | unset | Set to 1 to enable per-chunk debug logging. Writes to a file (see below) — not stderr, since stderr writes corrupt pi's TUI rendering. |
| OLLAMA_NATIVE_DEBUG_LOG | ~/.pi/agent/cache/pi-ollama-debug.log | Override the default debug log path. |
| OLLAMA_NATIVE_DUMP_DIR | unset | If set, writes paired req-*.json / res-*.ndjson files per request — exact replay artifacts for diagnostics. |
| OLLAMA_NATIVE_GHOST_RETRIES | 2 | Max retries when Ollama returns ghost-token responses (see Reliability below). |
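The `OLLAMA_HOST` row says the value may include or omit the protocol. One way to normalize it into a base URL is sketched below. `resolveBaseUrl` is a hypothetical helper, and its rules are assumptions rather than pi-ollama's documented behavior: port 11434 is filled in only when neither a scheme nor a port is given, and an explicit `http://` or `https://` URL without a port keeps that scheme's standard port.

```typescript
// Ollama's conventional port; filled in only when OLLAMA_HOST names neither scheme nor port.
const DEFAULT_PORT = "11434";

function resolveBaseUrl(host: string | undefined): string {
  const raw = (host ?? "").trim();
  if (raw === "") return `http://localhost:${DEFAULT_PORT}`;

  // "localhost:11434" has no scheme; "https://ollama.internal" does.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw);
  const url = new URL(hasScheme ? raw : `http://${raw}`);
  if (!hasScheme && url.port === "") url.port = DEFAULT_PORT;

  // Drop trailing slashes so callers can append "/api/chat" directly.
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.protocol}//${url.host}${path}`;
}
```

At startup this would be called as `resolveBaseUrl(process.env.OLLAMA_HOST)`; keeping any path segment lets a reverse-proxied Ollama under a sub-path still work.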
Context length and memory. By default pi-ollama caps num_ctx at 32,768 tokens, even when the model's discovered context window is much larger (some models report 262,144 or more). Without the cap, Ollama would try to allocate enough memory for the full trained context, which exceeds typical hardware budgets. Users on machines with headroom for more can raise the cap via the OLLAMA_CONTEXT_LENGTH env var or /ollama-context slash command. The slash command persists across restarts; the env var is read at startup.
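The precedence described above (a persisted `/ollama-context` value, then `OLLAMA_CONTEXT_LENGTH`, then the discovered window capped at 32,768) can be sketched as a small pure function. `resolveNumCtx` and its option names are hypothetical. The README doesn't say whether an explicit override is also clamped to the model's trained window, so this sketch doesn't clamp it.

```typescript
// Default cap from the README: avoids allocating memory for the full trained context.
const DEFAULT_NUM_CTX_CAP = 32_768;

// Parse a positive integer from an env-style string; anything else counts as unset.
function parsePositiveInt(value: string | undefined): number | undefined {
  if (value === undefined) return undefined;
  const n = Number(value.trim());
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

interface NumCtxInputs {
  persisted?: number; // hypothetical: value saved by /ollama-context
  envValue?: string; // raw OLLAMA_CONTEXT_LENGTH, read at startup
  discoveredContextWindow: number; // from /api/show model_info.*.context_length
}

function resolveNumCtx({ persisted, envValue, discoveredContextWindow }: NumCtxInputs): number {
  // 1. The slash-command setting supersedes everything else.
  if (persisted !== undefined && persisted > 0) return persisted;
  // 2. Then the environment variable Ollama itself also respects.
  const fromEnv = parsePositiveInt(envValue);
  if (fromEnv !== undefined) return fromEnv;
  // 3. Otherwise the model's window, capped to keep memory use reasonable.
  return Math.min(discoveredContextWindow, DEFAULT_NUM_CTX_CAP);
}
```

A malformed `OLLAMA_CONTEXT_LENGTH` (for example `"lots"`) falls through to the capped default here instead of being sent to Ollama.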
Live-tail the debug log from another terminal:

```sh
tail -f ~/.pi/agent/cache/pi-ollama-debug.log
```

How model discovery works
On extension load, the provider:
- Reads cached models from `~/.pi/agent/cache/pi-ollama-models.json` (instant startup, no network).
- Calls `GET /api/tags` to list pulled models.
- For each model, calls `POST /api/show` to extract:
  - Context window from `model_info.*.context_length`.
  - Tool support from the `capabilities` array, falling back to family-name heuristics for older Ollama versions.
  - Vision support from `capabilities` or `details.families` containing `clip`.
  - Reasoning/thinking support from `capabilities` or model-name patterns (`r1`, `deepseek`, `gemma4`, etc.).
- Caches the result for next startup.
If Ollama is unreachable at startup, the cached list is used as a fallback. Run /ollama-refresh once it's available to re-discover.
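The `/api/show` parsing steps above might look something like this. It is a sketch, not pi-ollama's implementation: `extractCaps` is hypothetical, the response type models only the fields it reads, it assumes the `capabilities` array uses the values `tools`, `vision`, and `thinking`, and the fallback family list and name patterns are illustrative guesses standing in for the extension's real heuristics.

```typescript
// Only the /api/show response fields this sketch reads.
interface ShowResponse {
  capabilities?: string[]; // absent on older Ollama versions
  details?: { family?: string; families?: string[] | null };
  model_info?: Record<string, unknown>;
}

interface ModelCaps {
  contextWindow?: number;
  tools: boolean;
  vision: boolean;
  reasoning: boolean;
}

// Illustrative stand-ins for the extension's real fallback heuristics.
const TOOL_CAPABLE_FAMILIES = new Set(["llama", "qwen2", "mistral"]);
const REASONING_NAME_PATTERNS = [/r1/i, /deepseek/i, /gemma4/i];

function extractCaps(modelName: string, show: ShowResponse): ModelCaps {
  // The key is prefixed with the architecture, e.g. "llama.context_length".
  let contextWindow: number | undefined;
  for (const [key, value] of Object.entries(show.model_info ?? {})) {
    if (key.endsWith(".context_length") && typeof value === "number") {
      contextWindow = value;
      break;
    }
  }

  const caps = show.capabilities;
  const families = show.details?.families ?? [];
  const family = show.details?.family ?? "";

  // Prefer the explicit capabilities array; fall back to heuristics only when it is missing.
  const tools = caps ? caps.includes("tools") : TOOL_CAPABLE_FAMILIES.has(family);
  const vision = caps?.includes("vision") === true || families.includes("clip");
  const reasoning =
    caps?.includes("thinking") === true ||
    REASONING_NAME_PATTERNS.some((re) => re.test(modelName));

  return { contextWindow, tools, vision, reasoning };
}
```

Tool support deliberately trusts `capabilities` when it is present, so a model whose array omits `tools` is not promoted by the family heuristic.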
Reliability features
Ollama's streaming has a few known edge cases. The provider handles them explicitly rather than letting them surface as silent stalls:
Ghost-token retry. Ollama occasionally generates output tokens but streams nothing visible (done:true, eval_count > 0, empty message). The provider reads the first NDJSON line of each attempt, detects this pattern, cancels the connection, and retries, up to OLLAMA_NATIVE_GHOST_RETRIES times (default 2, giving ≈99% success at typical failure rates).
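A sketch of the ghost check and retry loop described above. `isGhostChunk`, `withGhostRetries`, and the `Attempt` shape are hypothetical names; the `open` callback stands in for whatever opens the `/api/chat` stream and parses its first NDJSON line.

```typescript
// Only the /api/chat chunk fields this check looks at.
interface ChatChunk {
  done?: boolean;
  eval_count?: number;
  message?: { content?: string; thinking?: string; tool_calls?: unknown[] };
}

// Ghost pattern: the first chunk already says done with tokens evaluated,
// yet carries no content, thinking, or tool calls.
function isGhostChunk(chunk: ChatChunk): boolean {
  const m = chunk.message;
  const visible = Boolean(m?.content) || Boolean(m?.thinking) || (m?.tool_calls?.length ?? 0) > 0;
  return chunk.done === true && (chunk.eval_count ?? 0) > 0 && !visible;
}

// One streaming attempt, as seen after reading its first NDJSON line.
interface Attempt<T> {
  first: ChatChunk; // first parsed NDJSON line
  cancel: () => void; // aborts the underlying connection
  stream: T; // the rest of the stream, handed to the caller on success
}

async function withGhostRetries<T>(
  open: () => Promise<Attempt<T>>,
  maxRetries = 2, // mirrors the OLLAMA_NATIVE_GHOST_RETRIES default
): Promise<T> {
  for (let tries = 0; tries <= maxRetries; tries++) {
    const attempt = await open();
    if (!isGhostChunk(attempt.first)) return attempt.stream;
    attempt.cancel(); // drop the ghost response before trying again
  }
  throw new Error(`Ollama returned ${maxRetries + 1} ghost-token responses in a row`);
}
```

On the ≈99% figure: if each attempt independently ghosted with probability p, all 1 + N attempts would fail with probability p^(N+1). With N = 2 and p = 0.2, success is 1 - 0.2^3 = 0.992. Real failures may be correlated, so treat this as a rough model rather than a guarantee.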
Truncation detection. If the connection closes before any chunk with done:true arrives, the provider surfaces a clear error rather than silently treating the partial response as complete. The error explains this is an Ollama-side reliability issue and prompts a retry.
Empty-response detection. If the connection closes without sending any chunks at all, the provider raises a distinct error pointing at the most likely causes (model failed to load, Ollama crashed, network issue).
Post-stream ghost check. Belt-and-suspenders: if eval_count > 0 but no content, thinking, or tool calls landed in the parsed stream, the provider raises an error rather than reporting a successful empty turn.
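The three end-of-stream checks above can be expressed as one classification over what the parser saw before the connection closed. `classifyStreamEnd` and `StreamSummary` are hypothetical, and the messages paraphrase this README rather than quoting pi-ollama's actual errors.

```typescript
// Hypothetical summary of what the NDJSON parser saw by the time the connection closed.
interface StreamSummary {
  chunkCount: number; // NDJSON lines parsed
  sawDone: boolean; // any chunk with done:true
  evalCount: number; // eval_count from the final chunk (0 if absent)
  textChars: number;
  thinkingChars: number;
  toolCalls: number;
}

type StreamOutcome =
  | { ok: true }
  | { ok: false; kind: "empty" | "truncated" | "ghost"; message: string };

function classifyStreamEnd(s: StreamSummary): StreamOutcome {
  // Checked first: zero chunks also means no done:true, but the likely causes differ.
  if (s.chunkCount === 0) {
    return { ok: false, kind: "empty", message: "Ollama sent no chunks (model failed to load, Ollama crashed, or network issue?)" };
  }
  if (!s.sawDone) {
    return { ok: false, kind: "truncated", message: "Stream closed before done:true; the response is incomplete. Please retry." };
  }
  const produced = s.textChars + s.thinkingChars + s.toolCalls > 0;
  if (s.evalCount > 0 && !produced) {
    return { ok: false, kind: "ghost", message: "Ollama reported generated tokens, but none reached the stream." };
  }
  return { ok: true };
}
```

Ordering matters: the empty check must precede the truncation check, otherwise every empty response would be reported as truncated.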
Compatibility
- pi: Tested against `@earendil-works/pi-coding-agent` v0.75.x. Should work with any version exposing the standard `ExtensionAPI` (`registerProvider` with `streamSimple`, `registerCommand` with `ctx.ui.notify`).
- Ollama: Requires Ollama with `/api/chat` support (most versions). `/api/ps` is used opportunistically; the extension tolerates older versions that don't expose it.
- Node: Requires Node 22.19+ (matches pi-coding-agent 0.75.0's minimum).
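Treating `/api/ps` as optional can be as simple as mapping any failure to "unknown". In this sketch `loadedModelNames` and `MinimalResponse` are hypothetical; `get` is injected so the function can be tested without a server, and with Node's global `fetch`, `(url) => fetch(url)` fits the `MinimalResponse` shape.

```typescript
// Subset of the Fetch API Response that this sketch relies on.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Returns loaded model names, or undefined when /api/ps is unavailable.
async function loadedModelNames(
  baseUrl: string,
  get: (url: string) => Promise<MinimalResponse>,
): Promise<string[] | undefined> {
  try {
    const res = await get(`${baseUrl}/api/ps`);
    if (!res.ok) return undefined; // e.g. 404 on an Ollama build without /api/ps
    const body = (await res.json()) as { models?: { name: string }[] };
    return (body.models ?? []).map((m) => m.name);
  } catch {
    return undefined; // network failure: "loaded models" is unknown, not fatal
  }
}
```

Returning `undefined` rather than `[]` lets a status command distinguish "nothing loaded" from "couldn't ask".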
Architecture (one paragraph)
The extension registers an ollama provider with a custom streamSimple handler. Pi calls streamSimple(model, context, options) for every turn; the handler converts pi's internal message format to Ollama's /api/chat wire format, opens an NDJSON stream, parses chunks into pi's AssistantMessageEventStream events (text deltas, thinking deltas, tool-call bursts, done), and surfaces errors with explanatory messages. No core pi changes required — streamSimple fully replaces the built-in handler for the registered API string.
See src/ for the implementation. Each file has a header comment explaining its role.
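To make the parsing step concrete, here is a sketch of turning the `/api/chat` NDJSON byte stream into events. The `StreamEvent` union and both generator names are hypothetical and much simpler than pi's `AssistantMessageEventStream`; the sketch shows only line buffering across network reads and the mapping from chunk fields (`message.thinking`, `message.content`, `message.tool_calls`, `done`) to events.

```typescript
// Hypothetical event union; pi's real AssistantMessageEventStream events differ.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "thinking"; delta: string }
  | { type: "toolCall"; name: string; args: Record<string, unknown> }
  | { type: "done"; evalCount: number };

interface ChatChunk {
  done?: boolean;
  eval_count?: number;
  message?: {
    content?: string;
    thinking?: string;
    tool_calls?: { function: { name: string; arguments: Record<string, unknown> } }[];
  };
}

// Split a byte stream into NDJSON lines, buffering partial lines across reads.
async function* ndjsonLines(source: AsyncIterable<Uint8Array>): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const bytes of source) {
    buffer += decoder.decode(bytes, { stream: true });
    let nl: number;
    while ((nl = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) yield line;
    }
  }
  buffer += decoder.decode(); // flush any bytes held back by the decoder
  if (buffer.trim()) yield buffer.trim();
}

// Map each parsed chunk to zero or more events.
async function* chatEvents(source: AsyncIterable<Uint8Array>): AsyncGenerator<StreamEvent> {
  for await (const line of ndjsonLines(source)) {
    const chunk = JSON.parse(line) as ChatChunk;
    const m = chunk.message;
    if (m?.thinking) yield { type: "thinking", delta: m.thinking };
    if (m?.content) yield { type: "text", delta: m.content };
    for (const call of m?.tool_calls ?? []) {
      yield { type: "toolCall", name: call.function.name, args: call.function.arguments };
    }
    if (chunk.done) yield { type: "done", evalCount: chunk.eval_count ?? 0 };
  }
}
```

Line buffering matters because a network read can end mid-line, and parsing each read directly would hit truncated JSON. A `fetch` response body in Node is async-iterable, so it can usually be passed to `chatEvents` (possibly with a type cast, depending on your TypeScript lib settings).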
Limitations / not yet implemented
- Ollama Cloud (`https://ollama.com`). This extension targets local Ollama. Cloud requires different auth (`OLLAMA_API_KEY`) and a different base URL; see `fgrehm/pi-ollama-cloud` if you want cloud-only.
- Per-model `temperature`/`top_p` defaults. Sampling parameters are passed through from pi's options when set, but there's no extension-level config for per-model default values. Open an issue if you need this.
- Auto-pull. If you select a model that isn't pulled, you'll get an error from Ollama. The extension doesn't offer to `ollama pull` it for you.
Related projects
- pi-mono: the pi coding agent itself
- ollama#12557: the upstream tool-calling streaming bug this extension routes around
- pi-mono#3357: the open issue requesting an official local-LLM extension
- `@0xkobold/pi-ollama`: alternative extension covering local + cloud via the OpenAI-compat shim
- `fgrehm/pi-ollama-cloud`: cloud-only Ollama extension
License
MIT © 2026 CaptCanadaMan
