pi-ollama

v0.1.5

Native Ollama provider for pi coding agent — fixes tool calling under streaming

Native Ollama provider extension for the pi coding agent.

Talks directly to Ollama's /api/chat endpoint, bypassing the OpenAI-compat shim at /v1/chat/completions that silently drops tool_calls from streamed responses.


Why this exists

I was simply trying to use the pi agent locally with Ollama when this can of worms opened up. It seemed like a good learning opportunity and a great way to start getting more involved in the community. I'm new to contributing to open source and to build-in-public etiquette, so feedback is genuinely welcome. I hope you find this useful!

Pi ships with an openai-completions adapter that routes Ollama traffic through Ollama's OpenAI-compat shim. The shim has a known streaming bug: tool_calls are dropped from the streamed deltas. Without those tool calls, pi's agent loop stalls on the first tool use — the model produces a tool call, the wire eats it, pi never sees it.

Ollama's native /api/chat endpoint doesn't have this problem. This extension talks to /api/chat directly — routing around the shim, not patching it — so tool calls survive streaming and the agent loop completes through tool use, multi-turn workflows, and reasoning-heavy prompts.
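To make the difference concrete, here is a minimal sketch of reading a native /api/chat stream, where each NDJSON line is a complete JSON object and tool calls arrive under message.tool_calls. The helper name collectTurn is hypothetical, and the interfaces cover only the fields used here; this is not the extension's actual code.

```typescript
// Subset of one NDJSON line from Ollama's /api/chat stream (only fields used here).
interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface OllamaChatChunk {
  message?: { role?: string; content?: string; tool_calls?: OllamaToolCall[] };
  done: boolean;
}

// Hypothetical helper: fold a whole NDJSON payload into final text and tool calls.
function collectTurn(ndjson: string): { text: string; toolCalls: OllamaToolCall[] } {
  let text = "";
  const toolCalls: OllamaToolCall[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // skip blank separators
    const chunk = JSON.parse(line) as OllamaChatChunk;
    text += chunk.message?.content ?? "";
    // Tool calls are kept as they arrive instead of being dropped.
    toolCalls.push(...(chunk.message?.tool_calls ?? []));
  }
  return { text, toolCalls };
}
```

A real provider would process lines incrementally as they arrive rather than buffering the whole payload; the point here is only that the native wire format carries tool calls alongside text.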

Other Ollama extensions for pi exist (linked in Related projects below) and they're solid for chat-style use. The architectural difference here is the API path: through the shim vs. around it. If you're using local Ollama specifically for the agentic tool-call workflows pi is designed around, that distinction is the whole point of this extension.


Install

pi install npm:pi-ollama

Or for local development:

git clone https://github.com/CaptCanadaMan/pi-ollama
cd pi-ollama
npm install
pi install /absolute/path/to/pi-ollama

Requires Ollama running locally (default http://localhost:11434) and at least one tool-capable model pulled.


Uninstall

pi uninstall npm:pi-ollama

This removes the on-disk package and the entry from ~/.pi/agent/settings.json. Pi won't auto-restore it on the next launch.

The bare form pi uninstall pi-ollama doesn't work — pi parses bare names as relative local paths rather than npm packages, so the npm: prefix is required for any npm-installed extension.

If you've already manually deleted the package directory (find it with npm root -g), pi will silently reinstall it on the next launch because npm:pi-ollama is still in ~/.pi/agent/settings.json. Run the uninstall command above to clear the settings entry — the disk side is already clean.

Optional cleanup of the model discovery cache:

rm -f ~/.pi/agent/cache/pi-ollama-models.json

Quick start

After installation, launch pi and run:

/ollama-status

You should see something like:

Ollama base URL: http://localhost:11434
✓ Ollama reachable — 3 model(s) registered
  qwen2.5-coder:7b               ctx:131,072  [tools]
  gemma4:26b                     ctx:262,144  [tools, vision, reasoning]
  llama3.1:8b                    ctx:131,072  [tools]

Switch to one of the discovered models and use pi normally — tool calls work end-to-end.


Slash commands

| Command | Description |
|---|---|
| /ollama-status | Show the Ollama base URL, registered models with capability flags, and currently loaded models. |
| /ollama-refresh | Re-discover models from /api/tags + /api/show and re-register the provider. Useful after ollama pull <model>. |
| /ollama-info [model-id] | Show capability details for a model. Omit the argument to pick from a list of currently registered models. |
| /ollama-context | Set the context length (num_ctx) pi-ollama sends to /api/chat. Picker with common presets + custom input. Persists across pi launches. |


Environment variables

| Variable | Default | Purpose |
|---|---|---|
| OLLAMA_HOST | localhost:11434 | Ollama server host[:port]. May include or omit protocol. |
| OLLAMA_CONTEXT_LENGTH | unset | Override the num_ctx pi-ollama sends to /api/chat. Matches the env var Ollama itself respects, so a single setting works across tools. Superseded by /ollama-context if used. |
| OLLAMA_NATIVE_DEBUG | unset | Set to 1 to enable per-chunk debug logging. Writes to a file (see below) rather than stderr, since stderr writes corrupt pi's TUI rendering. |
| OLLAMA_NATIVE_DEBUG_LOG | ~/.pi/agent/cache/pi-ollama-debug.log | Override the default debug log path. |
| OLLAMA_NATIVE_DUMP_DIR | unset | If set, writes paired req-*.json / res-*.ndjson files per request, which serve as exact replay artifacts for diagnostics. |
| OLLAMA_NATIVE_GHOST_RETRIES | 2 | Max retries when Ollama returns ghost-token responses (see Reliability below). |
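Since OLLAMA_HOST may include or omit the protocol, the value has to be normalized before paths like /api/chat are appended. The following is a rough sketch of one way to do that, not the extension's own logic. The function name resolveBaseUrl is hypothetical, and a host given without a port is passed through unchanged, because the README does not say whether a port is filled in.

```typescript
// Hypothetical helper: turn an OLLAMA_HOST-style value into a base URL.
function resolveBaseUrl(host: string | undefined): string {
  const DEFAULT_HOST = "localhost:11434"; // default listed in the table above
  const raw = (host ?? "").trim() || DEFAULT_HOST;
  // Add a scheme only when the value does not already carry one.
  const withScheme = /^https?:\/\//i.test(raw) ? raw : `http://${raw}`;
  // Drop trailing slashes so "/api/chat" can be appended directly.
  return withScheme.replace(/\/+$/, "");
}
```

Taking the raw value as a parameter (instead of reading the environment inside the function) keeps the sketch easy to test; a caller would pass in the OLLAMA_HOST value.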

Context length and memory. By default pi-ollama caps num_ctx at 32,768 tokens, even when the model's discovered context window is much larger (some models report 262,144 or more). Without the cap, Ollama would try to allocate enough memory for the full trained context, which exceeds typical hardware budgets. Users on machines with headroom for more can raise the cap via the OLLAMA_CONTEXT_LENGTH env var or /ollama-context slash command. The slash command persists across restarts; the env var is read at startup.
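The precedence described above can be sketched as a small pure function. This is an illustration under assumptions: resolveNumCtx, persisted (standing in for the value saved by /ollama-context), and envValue (the raw OLLAMA_CONTEXT_LENGTH string) are hypothetical names, and the sketch does not clamp explicit overrides to the model's window, since the README does not say whether the extension does.

```typescript
const DEFAULT_NUM_CTX_CAP = 32_768; // default cap stated in the README

// Hypothetical resolution order: persisted slash-command value, then env var,
// then the discovered context window capped at the default.
function resolveNumCtx(
  modelContextWindow: number,
  envValue?: string,
  persisted?: number,
): number {
  const isValid = (n: number | undefined): n is number =>
    n !== undefined && Number.isInteger(n) && n > 0;

  // 1. The /ollama-context setting supersedes the env var.
  if (isValid(persisted)) return persisted;

  // 2. Otherwise use the env var when it parses as a positive integer.
  const fromEnv = envValue === undefined ? undefined : Number(envValue);
  if (isValid(fromEnv)) return fromEnv;

  // 3. Otherwise cap the discovered window at the default.
  return Math.min(modelContextWindow, DEFAULT_NUM_CTX_CAP);
}
```

With a model reporting a 262,144-token window and no overrides, this returns 32,768; a model with an 8,192-token window keeps its own smaller value.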

Live-tail the debug log from another terminal:

tail -f ~/.pi/agent/cache/pi-ollama-debug.log

How model discovery works

On extension load, the provider:

  1. Reads cached models from ~/.pi/agent/cache/pi-ollama-models.json (instant startup, no network).
  2. Calls GET /api/tags to list pulled models.
  3. For each model, calls POST /api/show to extract:
    • Context window from model_info.*.context_length.
    • Tool support from capabilities array, falling back to family-name heuristics for older Ollama versions.
    • Vision support from capabilities or details.families containing clip.
    • Reasoning/thinking support from capabilities or model-name patterns (r1, deepseek, gemma4, etc.).
  4. Caches the result for next startup.

If Ollama is unreachable at startup, the cached list is used as a fallback. Run /ollama-refresh once it's available to re-discover.
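Step 3 of the list above can be sketched as a pure function over a /api/show response. This is a simplified illustration, not the extension's code: extractCaps and ModelCaps are hypothetical names, the capability strings "tools", "vision", and "thinking" are assumed to match what Ollama reports, the name patterns are only the README's examples, and the family-name fallback for tool support is left out.

```typescript
// Subset of the POST /api/show response used for discovery.
interface ShowResponse {
  model_info?: Record<string, unknown>;
  capabilities?: string[];
  details?: { families?: string[] | null };
}

interface ModelCaps {
  contextLength: number | undefined;
  tools: boolean;
  vision: boolean;
  reasoning: boolean;
}

function extractCaps(modelId: string, show: ShowResponse): ModelCaps {
  // model_info keys are architecture-prefixed, e.g. "llama.context_length".
  let contextLength: number | undefined;
  for (const [key, value] of Object.entries(show.model_info ?? {})) {
    if (key.endsWith(".context_length") && typeof value === "number") {
      contextLength = value;
      break;
    }
  }
  const caps = show.capabilities ?? [];
  const families = show.details?.families ?? [];
  const name = modelId.toLowerCase();
  return {
    contextLength,
    tools: caps.includes("tools"),
    // Older responses may lack capabilities; fall back to the clip family.
    vision: caps.includes("vision") || families.includes("clip"),
    // Crude name match using the README's example patterns only.
    reasoning: caps.includes("thinking") || /r1|deepseek|gemma4/.test(name),
  };
}
```

A result with contextLength undefined signals that no context_length key was found, leaving the caller to decide on a fallback.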


Reliability features

Ollama's streaming has a few known edge cases. The provider handles them explicitly rather than letting them surface as silent stalls:

Ghost-token retry. Ollama occasionally generates output tokens but streams nothing visible (done:true, eval_count > 0, empty message). The provider reads the first NDJSON line of each attempt, detects this pattern, cancels the connection, and retries, up to OLLAMA_NATIVE_GHOST_RETRIES times. The default of 2 retries gives ≈99% success at typical failure rates.
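As a sketch of the detection rule, the check on the first NDJSON line might look like the function below. The names isGhostResponse and successProbability are hypothetical, and the second function assumes each attempt fails independently with the same probability, which the README does not claim.

```typescript
interface FirstChunk {
  done: boolean;
  eval_count?: number;
  message?: { content?: string; thinking?: string; tool_calls?: unknown[] };
}

// True when the first line already says "done" and reports generated tokens,
// yet carries no content, thinking, or tool calls.
function isGhostResponse(firstLine: string): boolean {
  const chunk = JSON.parse(firstLine) as FirstChunk;
  const msg = chunk.message;
  const empty =
    !msg?.content && !msg?.thinking && (msg?.tool_calls?.length ?? 0) === 0;
  return chunk.done === true && (chunk.eval_count ?? 0) > 0 && empty;
}

// Chance that at least one of (retries + 1) attempts is not a ghost,
// assuming independent attempts with per-attempt ghost probability p.
function successProbability(p: number, retries: number): number {
  return 1 - Math.pow(p, retries + 1);
}
```

Under that independence assumption, an illustrative (not measured) per-attempt ghost rate of 0.2 with 2 retries gives 1 - 0.2³ = 0.992.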

Truncation detection. If the connection closes before any chunk with done:true arrives, the provider surfaces a clear error rather than silently treating the partial response as complete. The error explains this is an Ollama-side reliability issue and prompts a retry.

Empty-response detection. If the connection closes without sending any chunks at all, the provider raises a distinct error pointing at the most likely causes (model failed to load, Ollama crashed, network issue).

Post-stream ghost check. Belt-and-suspenders: if eval_count > 0 but no content, thinking, or tool calls landed in the parsed stream, the provider raises an error rather than reporting a successful empty turn.
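The three end-of-stream checks above can be summarized as one classification over a few counters. This is an illustrative sketch with hypothetical names (StreamSummary, classifyStream); the extension's real bookkeeping and error messages are not shown in this README.

```typescript
type StreamOutcome = "ok" | "empty-response" | "truncated" | "ghost";

// Hypothetical counters a stream parser could keep while reading chunks.
interface StreamSummary {
  chunksSeen: number;    // NDJSON lines parsed
  sawDone: boolean;      // a chunk with done:true arrived
  evalCount: number;     // eval_count from the final chunk, 0 if absent
  contentChars: number;  // visible text received
  thinkingChars: number; // thinking text received
  toolCalls: number;     // tool calls received
}

function classifyStream(s: StreamSummary): StreamOutcome {
  // Connection closed without sending anything at all.
  if (s.chunksSeen === 0) return "empty-response";
  // Connection closed before any done:true chunk.
  if (!s.sawDone) return "truncated";
  // Tokens were reportedly generated but nothing landed in the parsed stream.
  const produced = s.contentChars + s.thinkingChars + s.toolCalls > 0;
  if (s.evalCount > 0 && !produced) return "ghost";
  return "ok";
}
```

Ordering matters here: the empty-response check runs first so that a connection with zero chunks is not reported as a truncation.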


Compatibility

  • pi: Tested against @earendil-works/pi-coding-agent v0.75.x. Should work with any version exposing the standard ExtensionAPI (registerProvider with streamSimple, registerCommand with ctx.ui.notify).
  • Ollama: Requires Ollama with /api/chat support (most versions). /api/ps is used opportunistically and tolerates older versions that don't expose it.
  • Node: Requires Node 22.19+ (matches pi-coding-agent 0.75.0's minimum).

Architecture (one paragraph)

The extension registers an ollama provider with a custom streamSimple handler. Pi calls streamSimple(model, context, options) for every turn; the handler converts pi's internal message format to Ollama's /api/chat wire format, opens an NDJSON stream, parses chunks into pi's AssistantMessageEventStream events (text deltas, thinking deltas, tool-call bursts, done), and surfaces errors with explanatory messages. No core pi changes required — streamSimple fully replaces the built-in handler for the registered API string.
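The stream-parsing half of that paragraph can be sketched in two pieces: a splitter that reassembles NDJSON lines from arbitrary network fragments, and a mapper from one parsed line to events. The event shape SketchEvent is a stand-in invented for this example; pi's real AssistantMessageEventStream types are not reproduced here, and NdjsonSplitter and toEvents are hypothetical names.

```typescript
// Hypothetical event shape, NOT pi's actual event types.
type SketchEvent =
  | { type: "text"; delta: string }
  | { type: "thinking"; delta: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "done" };

// Reassembles complete lines from fragments that may split a line anywhere.
class NdjsonSplitter {
  private buffer = "";

  push(fragment: string): string[] {
    this.buffer += fragment;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the unfinished tail for next time
    return lines.filter((l) => l.trim() !== "");
  }
}

// Maps one /api/chat NDJSON line to zero or more events.
function toEvents(line: string): SketchEvent[] {
  const chunk = JSON.parse(line) as {
    done?: boolean;
    message?: {
      content?: string;
      thinking?: string;
      tool_calls?: { function: { name: string; arguments: unknown } }[];
    };
  };
  const events: SketchEvent[] = [];
  const msg = chunk.message;
  if (msg?.thinking) events.push({ type: "thinking", delta: msg.thinking });
  if (msg?.content) events.push({ type: "text", delta: msg.content });
  for (const call of msg?.tool_calls ?? []) {
    events.push({ type: "tool_call", name: call.function.name, args: call.function.arguments });
  }
  if (chunk.done) events.push({ type: "done" });
  return events;
}
```

Buffering the unfinished tail is what lets a JSON object that straddles two network reads be parsed exactly once, after its newline arrives.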

See src/ for the implementation. Each file has a header comment explaining its role.


Limitations / not yet implemented

  • Ollama Cloud (https://ollama.com). This extension targets local Ollama. Cloud requires different auth (OLLAMA_API_KEY) and a different base URL — see fgrehm/pi-ollama-cloud if you want cloud-only.
  • Per-model temperature / top_p defaults. Sampling parameters are passed through from pi's options when set, but there's no extension-level config for default values per model. Open an issue if you need this.
  • Auto-pull. If you select a model that isn't pulled, you'll get an error from Ollama. The extension doesn't offer to ollama pull it for you.

Related projects


License

MIT © 2026 CaptCanadaMan