pi-ollama-api
v1.3.0
Published
Ollama Cloud provider extension for Pi — connect to Ollama Cloud models via the OpenAI-compatible API
Downloads
590
Maintainers
Readme
pi-ollama-api
Ollama Cloud provider extension for Pi — connect your terminal coding agent to 200+ models on Ollama Cloud via the OpenAI-compatible API.
Features
- Native Ollama API discovery — Queries
/api/tagsand/api/showfor real model metadata (context windows, capabilities, parameter sizes, quantization) - Actual context windows — No hardcoded defaults. Every model reports its real context length from Ollama's API (e.g., 1M for DeepSeek V4, 262K for Kimi K2, 128K for GPT-OSS)
- Capability detection — Vision, reasoning, tools detected from Ollama's
capabilitiesarray - OpenAI-compatible API — Uses
openai-completionsstreaming (works with all Pi features) - Embeddings tool — Generate embeddings via
/v1/embeddingsfor RAG and similarity search - Direct chat tool — Send one-off completions for model comparison or testing
Supported Model Families
| Family | Models | Highlights | |--------|--------|------------| | Llama | 3.3, 3.2, 3.1, 3, 2 | 70B frontier, Vision variants, 405B | | Qwen | 3, 2.5, 2, VL, Coder, Math | 128K context, Vision, Code, Math variants | | DeepSeek | R1, V3, V2, Coder V2 | Reasoning (R1), 671B total | | Mistral | Codestral, Mistral, Nemo, Large, Mixtral | 256K context Codestral | | Gemma | 3, 2, CodeGemma, ShieldGemma | Vision support, 128K context | | Phi | 4, 3.5, 3 | Microsoft models, 128K context | | IBM | Granite 3.x, Granite Code | MoE variants, 128K context | | Cohere | Command R, Aya, Aya Expanse | Multilingual, 128K context | | GPT-OSS | 120B, 20B (Cloud) | Cloud-hosted OSS models | | + 30+ more | Yi, Falcon, GLM, InternLM, SOLAR, etc. | See full list in source |
Installation
# Install via pi
pi install npm:pi-ollama-api
# Or install locally
pi install npm:pi-ollama-api -lSetup
- Get an API key from ollama.com/settings
- Start Pi and run:
Paste your API key when prompted. It is stored in Pi's/ollama-cloud-login~/.pi/agent/auth.json(same place as/logincredentials). - Select a model with
/model→ pick anyollama-cloud/*model
Authentication
| Method | How | Where stored |
|--------|-----|-------------|
| Pi /login (recommended) | Run /login in Pi → select "Use an API key" | ~/.pi/agent/auth.json |
| Environment variable | export OLLAMA_API_KEY=... | Shell env |
Pi's AuthStorage is used natively — API keys are checked in auth.json first, then the env var is used as a fallback.
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_API_KEY | — | Fallback API key (used if auth.json has no key) |
| OLLAMA_CLOUD_BASE_URL | https://ollama.com/v1 | Override endpoint (for proxies or self-hosted) |
| OLLAMA_CLOUD_MODELS | — | Comma-separated list to skip discovery and use static models |
| OLLAMA_CLOUD_TIMEOUT | 30000 | Model discovery timeout in ms |
Usage
Select a model
/modelThen pick any ollama-cloud/* model. Examples:
ollama-cloud/llama3.3— Llama 3.3 70Bollama-cloud/qwen3— Qwen 3 with visionollama-cloud/deepseek-r1— DeepSeek R1 with reasoningollama-cloud/gemma3:27b— Gemma 3 27B with vision
Commands
| Command | Description |
|---------|-------------|
| /ollama-cloud-status | Check API key status and model count |
| /ollama-cloud-refresh | Re-fetch live model list from Ollama Cloud API |
| /ollama-cloud-list | Pretty-print all models with 🧠/🖼️/💬 badges |
| /ollama-cloud-pull <id> | Show the ollama pull command for a model |
Tools (LLM-callable)
| Tool | Purpose |
|------|---------|
| ollama_list_models | Filter models by family, vision, or reasoning |
| ollama_embeddings | Generate embeddings via /v1/embeddings |
| ollama_chat | Direct chat completion via /v1/chat/completions |
| ollama_model_info | Get detailed metadata for a specific model |
Quick Examples
# Check what models are available
Use ollama_list_models to show all available models
# Get embeddings for a document
Use ollama_embeddings with model "nomic-embed-text" and input "The quick brown fox"
# Compare model outputs
Use ollama_chat with model "llama3.3" and messages [{role: "user", content: "Hello"}]
Use ollama_chat with model "qwen3" and messages [{role: "user", content: "Hello"}]API Compatibility
This extension uses Ollama's OpenAI-compatible API (/v1/chat/completions), which supports:
- Chat completions with streaming
- Vision (multimodal) inputs
- Tool calling
- JSON mode
- Reasoning/thinking control
- Embeddings (
/v1/embeddings)
License
MIT
