pi-ollama-api

v1.3.0

Published

16 days ago

Ollama Cloud provider extension for Pi — connect to Ollama Cloud models via the OpenAI-compatible API

Downloads

590

0High
0Medium
0Low

mercuriusdream

pi-package pi-extension ollama ollama-cloud ai llm openai-compatible chat embeddings

pi-ollama-api

Ollama Cloud provider extension for Pi — connect your terminal coding agent to 200+ models on Ollama Cloud via the OpenAI-compatible API.

Features

Native Ollama API discovery — Queries /api/tags and /api/show for real model metadata (context windows, capabilities, parameter sizes, quantization)
Actual context windows — No hardcoded defaults. Every model reports its real context length from Ollama's API (e.g., 1M for DeepSeek V4, 262K for Kimi K2, 128K for GPT-OSS)
Capability detection — Vision, reasoning, tools detected from Ollama's capabilities array
OpenAI-compatible API — Uses openai-completions streaming (works with all Pi features)
Embeddings tool — Generate embeddings via /v1/embeddings for RAG and similarity search
Direct chat tool — Send one-off completions for model comparison or testing

Supported Model Families

| Family | Models | Highlights | |--------|--------|------------| | Llama | 3.3, 3.2, 3.1, 3, 2 | 70B frontier, Vision variants, 405B | | Qwen | 3, 2.5, 2, VL, Coder, Math | 128K context, Vision, Code, Math variants | | DeepSeek | R1, V3, V2, Coder V2 | Reasoning (R1), 671B total | | Mistral | Codestral, Mistral, Nemo, Large, Mixtral | 256K context Codestral | | Gemma | 3, 2, CodeGemma, ShieldGemma | Vision support, 128K context | | Phi | 4, 3.5, 3 | Microsoft models, 128K context | | IBM | Granite 3.x, Granite Code | MoE variants, 128K context | | Cohere | Command R, Aya, Aya Expanse | Multilingual, 128K context | | GPT-OSS | 120B, 20B (Cloud) | Cloud-hosted OSS models | | + 30+ more | Yi, Falcon, GLM, InternLM, SOLAR, etc. | See full list in source |

Installation

# Install via pi
pi install npm:pi-ollama-api

# Or install locally
pi install npm:pi-ollama-api -l

Setup

Get an API key from ollama.com/settings
Start Pi and run:
```
/ollama-cloud-login
```
Paste your API key when prompted. It is stored in Pi's ~/.pi/agent/auth.json (same place as /login credentials).
Select a model with /model → pick any ollama-cloud/* model

Authentication

| Method | How | Where stored | |--------|-----|-------------| | Pi /login (recommended) | Run /login in Pi → select "Use an API key" | ~/.pi/agent/auth.json | | Environment variable | export OLLAMA_API_KEY=... | Shell env |

Pi's AuthStorage is used natively — API keys are checked in auth.json first, then the env var is used as a fallback.

Environment Variables

| Variable | Default | Description | |----------|---------|-------------| | OLLAMA_API_KEY | — | Fallback API key (used if auth.json has no key) | | OLLAMA_CLOUD_BASE_URL | https://ollama.com/v1 | Override endpoint (for proxies or self-hosted) | | OLLAMA_CLOUD_MODELS | — | Comma-separated list to skip discovery and use static models | | OLLAMA_CLOUD_TIMEOUT | 30000 | Model discovery timeout in ms |

Usage

Select a model

/model

Then pick any ollama-cloud/* model. Examples:

ollama-cloud/llama3.3 — Llama 3.3 70B
ollama-cloud/qwen3 — Qwen 3 with vision
ollama-cloud/deepseek-r1 — DeepSeek R1 with reasoning
ollama-cloud/gemma3:27b — Gemma 3 27B with vision

Commands

| Command | Description | |---------|-------------| | /ollama-cloud-status | Check API key status and model count | | /ollama-cloud-refresh | Re-fetch live model list from Ollama Cloud API | | /ollama-cloud-list | Pretty-print all models with 🧠/🖼️/💬 badges | | /ollama-cloud-pull <id> | Show the ollama pull command for a model |

Tools (LLM-callable)

| Tool | Purpose | |------|---------| | ollama_list_models | Filter models by family, vision, or reasoning | | ollama_embeddings | Generate embeddings via /v1/embeddings | | ollama_chat | Direct chat completion via /v1/chat/completions | | ollama_model_info | Get detailed metadata for a specific model |

Quick Examples

# Check what models are available
Use ollama_list_models to show all available models

# Get embeddings for a document
Use ollama_embeddings with model "nomic-embed-text" and input "The quick brown fox"

# Compare model outputs
Use ollama_chat with model "llama3.3" and messages [{role: "user", content: "Hello"}]
Use ollama_chat with model "qwen3" and messages [{role: "user", content: "Hello"}]

API Compatibility

This extension uses Ollama's OpenAI-compatible API (/v1/chat/completions), which supports:

Chat completions with streaming
Vision (multimodal) inputs
Tool calling
JSON mode
Reasoning/thinking control
Embeddings (/v1/embeddings)

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

pi-ollama-api

Features

Supported Model Families

Installation

Setup

Authentication

Environment Variables

Usage

Select a model

Commands

Tools (LLM-callable)

Quick Examples

API Compatibility

License