@0xkobold/pi-ollama
v0.5.0
Ollama extension for pi-coding-agent. Unified local + cloud Ollama support with model management
Pi Ollama Extension
Ollama integration for pi-coding-agent with accurate model details from /api/show.
Changelog
v0.5.0
- Fix: DeepSeek models now return correct context lengths: `deepseek-v4` → 1M tokens, other `deepseek` → 163,840 tokens (was 4,096). Closes #4.
- Fix: Default context fallback raised from 4,096 → 131,072 (128k). Per Ollama docs, cloud models default to their maximum context; 128k is a conservative floor for unknowns.
- Fix: `hasReasoningCapability()` no longer flags `instruct`, `chat`, or `code`/`coder` models as reasoning. Only models with actual thinking capability (DeepSeek, R1, QwQ, GPT-OSS, Phi) or a `thinking`/`reason` capability from `/api/show` are marked as reasoning. See Ollama thinking docs.
- Fix: `createModel()` now sets `maxTokens` to `min(contextWindow, 16384)` instead of hardcoded 8192. `chat()`/`chatStream()` no longer send `max_tokens: 4096` by default; the field is omitted unless explicitly set.
- Fix: Cloud model deduplication now properly strips the `:cloud` suffix before comparing model names.
- Fix: `loadConfigFromSettingsFiles()` now uses async dynamic `import()` instead of `require()`, fixing ESM compatibility.
- New: `hasReasoningCapability()` accepts an optional `modelInfo` parameter and checks the `capabilities` array for `thinking`/`reason`.
- New: Error classification. `chat()` and `chatStream()` now throw typed errors: `OllamaAuthError` (401/403), `OllamaRateLimitError` (429), `OllamaModelError` (400/404), `OllamaServerError` (500/502).
- New: Request timeout. `chat()` and `chatStream()` now apply a 120s timeout via `AbortController`.
- New: Exported constants `DEFAULT_CONTEXT_LENGTH` (131072), `DEFAULT_MAX_TOKENS` (8192), `DEFAULT_REQUEST_TIMEOUT_MS` (120000).
- New: `stripProviderPrefix()` now exported and tested.
- Docs: Comprehensive JSDoc comments referencing Ollama API docs.
- Docs: README updated with all context length tables, error types, and API reference.
v0.4.1
- Fix: Cloud models now correctly use the `/v1` endpoint. Previously, `ollama-cloud` was registered with `baseUrl: "https://ollama.com"`, causing pi to hit `https://ollama.com/chat/completions` (HTML homepage) instead of `https://ollama.com/v1/chat/completions`.
- Fix: Trailing slashes in the `cloudUrl` config are now properly stripped before appending `/v1`.
Installation
# Via pi CLI
pi install npm:@0xkobold/pi-ollama
# Or in pi-config.ts
{
  extensions: [
    'npm:@0xkobold/pi-ollama'
  ]
}
# Or temporary (testing)
pi -e npm:@0xkobold/pi-ollama
Features
- 🦙 Local Ollama: Connect to localhost:11434
- ☁️ Ollama Cloud: Use ollama.com with API key
- 📊 Accurate Details: Uses `/api/show` for real context length
- 👁️ Vision Detection: Detects vision from the `capabilities` array
- 🧠 Reasoning Detection: Detects thinking models from `capabilities` and name patterns
- 🔍 Model Info: Query specific model parameters
- 🛡️ Error Classification: Typed errors for auth, rate limits, model errors, server errors
- ⏱️ Request Timeouts: 120s default timeout on all HTTP calls
Commands
| Command | Description |
|---------|-------------|
| /ollama-status | Check connection status |
| /ollama-models | List models with context length |
| /ollama-info MODEL | Show model details from /api/show |
| /ollama status\|info\|models | Shortcuts |
How It Works
The extension uses Ollama's /api/show endpoint to get accurate model information:
curl http://localhost:11434/api/show -d '{
  "model": "gemma3",
  "verbose": true
}'
Response includes (per Ollama docs):
- `model_info.<arch>.context_length`: Accurate context window
- `capabilities`: `["completion", "vision", "thinking"]`
- `details.parameter_size`: "4.3B", "70B", etc.
- `details.family`: "gemma3", "llama", etc.
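As a rough sketch of how such a response might be consumed, the snippet below pulls the context length and capabilities out of an `/api/show` payload. The `ShowResponse` type, the `summarizeShowResponse` helper, and the sample payload values are illustrative assumptions, not part of this package; only the field names come from the list above.

```typescript
// Illustrative shape of an /api/show response; only the fields listed above are modeled.
interface ShowResponse {
  model_info?: Record<string, unknown>;
  capabilities?: string[];
  details?: { parameter_size?: string; family?: string };
}

interface ModelSummary {
  contextLength: number | undefined;
  capabilities: string[];
  parameterSize: string | undefined;
  family: string | undefined;
}

// Hypothetical helper: picks the first `<arch>.context_length` entry in model_info.
function summarizeShowResponse(res: ShowResponse): ModelSummary {
  let contextLength: number | undefined;
  for (const [key, value] of Object.entries(res.model_info ?? {})) {
    if (key.endsWith(".context_length") && typeof value === "number") {
      contextLength = value;
      break;
    }
  }
  return {
    contextLength,
    capabilities: res.capabilities ?? [],
    parameterSize: res.details?.parameter_size,
    family: res.details?.family,
  };
}

// Made-up sample payload, for demonstration only.
const sample: ShowResponse = {
  model_info: { "general.architecture": "gemma3", "gemma3.context_length": 131072 },
  capabilities: ["completion", "vision"],
  details: { parameter_size: "4.3B", family: "gemma3" },
};

const summary = summarizeShowResponse(sample);
console.log(summary.contextLength, summary.capabilities.includes("vision"));
```

Keeping the parsing in a pure function like this makes it testable without a running Ollama server.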
Context Length Resolution
Context length is resolved in this order:
1. `model_info.*.context_length`: from `/api/show` (most accurate)
2. Top-level keys: `context_length`, `max_position_embeddings`, `max_sequence_length`, `n_ctx`
3. Parameter-size heuristic: small models → smaller context
4. Name-based lookup: for cloud models without `/api/show`
5. Default fallback: 131,072 (128k tokens)
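The resolution order above can be sketched roughly as follows. This is not the package's `getContextLength()` implementation: `resolveContextLength`, `ModelDetails`, and `NAME_TABLE` are hypothetical names, the name table is only a subset of the full one, and the parameter-size heuristic is left as a caller-supplied function because its thresholds are not documented here.

```typescript
const FALLBACK_CONTEXT = 131072; // default fallback from step 5

// Subset of the name-based table; first match wins, so specific names come first.
const NAME_TABLE: Array<[string, number]> = [
  ["deepseek-v4", 1048576],
  ["deepseek", 163840],
  ["qwen3", 262144],
  ["llama3.1", 128000],
  ["llama3", 8192],
];

const TOP_LEVEL_KEYS = ["context_length", "max_position_embeddings", "max_sequence_length", "n_ctx"];

// Assumed shape of the details object (hypothetical, for this sketch only).
interface ModelDetails {
  model_info?: Record<string, unknown>;
  details?: { parameter_size?: string };
  [key: string]: unknown;
}

function asPositiveNumber(value: unknown): number | undefined {
  return typeof value === "number" && value > 0 ? value : undefined;
}

function resolveContextLength(
  modelName: string,
  details?: ModelDetails,
  sizeHeuristic?: (parameterSize: string) => number | undefined,
): number {
  // 1. model_info.<arch>.context_length
  for (const [key, value] of Object.entries(details?.model_info ?? {})) {
    const n = key.endsWith(".context_length") ? asPositiveNumber(value) : undefined;
    if (n !== undefined) return n;
  }
  // 2. Top-level keys
  for (const key of TOP_LEVEL_KEYS) {
    const n = asPositiveNumber(details?.[key]);
    if (n !== undefined) return n;
  }
  // 3. Parameter-size heuristic (supplied by the caller in this sketch)
  const size = details?.details?.parameter_size;
  const guessed = size !== undefined ? sizeHeuristic?.(size) : undefined;
  if (guessed !== undefined) return guessed;
  // 4. Name-based lookup
  const lower = modelName.toLowerCase();
  for (const [pattern, length] of NAME_TABLE) {
    if (lower.includes(pattern)) return length;
  }
  // 5. Default fallback
  return FALLBACK_CONTEXT;
}

console.log(resolveContextLength("deepseek-v4:cloud")); // 1048576
console.log(resolveContextLength("llama3:8b")); // 8192
console.log(resolveContextLength("mystery-model")); // 131072
console.log(resolveContextLength("gemma3", { model_info: { "gemma3.context_length": 32768 } })); // 32768
```

Ordering the name table from most to least specific matters: `deepseek-v4` must be checked before `deepseek`, and `llama3.1` before `llama3`, or the broader pattern would shadow the narrower one.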
Name-Based Context Length Table
| Model Family | Context Length | Source |
|-------------|---------------|--------|
| deepseek-v4 | 1,048,576 (1M) | Ollama library |
| kimi | 262,144 (256k) | Ollama library |
| qwen3 | 262,144 (256k) | Ollama library |
| minimax | 204,800 (200k) | Ollama library |
| glm | 202,752 (~198k) | Ollama library |
| llama3.1/3.2/3.3 | 128,000 (128k) | Ollama library |
| deepseek (non-v4) | 163,840 (160k) | Ollama library |
| gpt-oss | 128,000 (128k) | Ollama library |
| qwen/qwen2.5 | 32,768 (32k) | Ollama library |
| mistral/mixtral | 32,768 (32k) | Ollama library |
| llama3 | 8,192 | Ollama library |
| Unknown | 131,072 (128k) | Conservative default per Ollama context docs |
Reasoning Capability Detection
Per Ollama thinking docs, reasoning/thinking is detected by:
1. `capabilities` array from `/api/show`: if it includes `"thinking"` or `"reason"`
2. Name-based heuristic (for cloud models):
   - ✅ DeepSeek models (have think mode)
   - ✅ `r1` models (word boundary match)
   - ✅ QwQ, GPT-OSS, Phi
   - ✅ Models containing "reason"
   - ❌ `instruct`, `chat`, `code`/`coder`: these are format tags, NOT reasoning
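A minimal sketch of these two checks is shown below. It is not the package's `hasReasoningCapability()`: the function name `looksLikeReasoningModel` and the exact regular expressions are assumptions, and the sketch assumes the name heuristic is consulted only when no `capabilities` array is available.

```typescript
// Hypothetical patterns mirroring the list above; the real ones may differ.
const REASONING_NAME_PATTERNS: RegExp[] = [
  /deepseek/i,
  /\br1\b/i, // word-boundary match, so "r1" inside a longer token is ignored
  /qwq/i,
  /gpt-oss/i,
  /\bphi/i, // anchored at a word start so names like "dolphin" do not match
  /reason/i,
];

function looksLikeReasoningModel(name: string, capabilities?: string[]): boolean {
  // 1. Trust the capabilities array when /api/show provided one (assumption).
  if (capabilities !== undefined) {
    return capabilities.some((c) => c === "thinking" || c === "reason");
  }
  // 2. Otherwise fall back to the name-based heuristic.
  return REASONING_NAME_PATTERNS.some((re) => re.test(name));
}

console.log(looksLikeReasoningModel("qwq:32b")); // true
console.log(looksLikeReasoningModel("llama3.1:8b-instruct")); // false
console.log(looksLikeReasoningModel("some-model", ["completion", "thinking"])); // true
```

Note that `instruct`, `chat`, and `coder` simply have no pattern here, which is how the sketch avoids treating format tags as reasoning markers.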
Error Handling
All HTTP calls classify errors into typed classes:
| Class | Status Codes | Meaning |
|-------|-------------|---------|
| OllamaAuthError | 401, 403 | Invalid API key |
| OllamaRateLimitError | 429 | Rate limit exceeded |
| OllamaModelError | 400, 404 | Bad request or model not found |
| OllamaServerError | 500, 502 | Server/gateway error |
| OllamaError | Other | Catch-all |
Per Ollama error docs.
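To illustrate the mapping in the table, here is a self-contained sketch of status-code classification. The classes below are local stand-ins that reuse the table's names, and `classifyStatus` is a hypothetical function; the signature of the package's own `classifyHttpError` is not shown in this README, so it is not reproduced here.

```typescript
// Local stand-in classes named after the table above (not imported from the package).
class OllamaError extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.status = status;
    this.name = new.target.name;
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class OllamaAuthError extends OllamaError {}
class OllamaRateLimitError extends OllamaError {}
class OllamaModelError extends OllamaError {}
class OllamaServerError extends OllamaError {}

// Hypothetical classifier: maps an HTTP status to the matching error class.
function classifyStatus(status: number, body = ""): OllamaError {
  const message = `Ollama request failed with HTTP ${status}${body ? `: ${body}` : ""}`;
  switch (status) {
    case 401:
    case 403:
      return new OllamaAuthError(message, status);
    case 429:
      return new OllamaRateLimitError(message, status);
    case 400:
    case 404:
      return new OllamaModelError(message, status);
    case 500:
    case 502:
      return new OllamaServerError(message, status);
    default:
      return new OllamaError(message, status); // catch-all
  }
}

const err = classifyStatus(429, "slow down");
console.log(err.name, err.status, err instanceof OllamaRateLimitError);
```

Because every class extends `OllamaError`, callers can handle specific cases first and still catch everything else with a single `instanceof OllamaError` check.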
Configuration
Configuration is loaded with the following precedence (highest to lowest):
1. Environment variables (override everything)
2. `pi.settings` (runtime API, when available)
3. `.pi/settings.json` (project-local settings)
4. `~/.pi/agent/settings.json` (global user settings)
Environment Variables
export OLLAMA_HOST="http://localhost:11434" # Local base URL
export OLLAMA_HOST_CLOUD="https://ollama.com" # Cloud base URL
export OLLAMA_API_KEY="your-api-key" # Cloud API key
Settings File
Add to your global settings (~/.pi/agent/settings.json):
{
  "ollama": {
    "baseUrl": "http://localhost:11434",
    "cloudUrl": "https://ollama.com",
    "apiKey": "your-ollama-cloud-api-key"
  }
}
Per Ollama cloud docs.
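The precedence order described under Configuration could be modeled along these lines. This is a sketch, not the package's loader: `OllamaConfig`, `fromEnv`, `mergeByPrecedence`, and the `DEFAULTS` values are assumptions, and the settings objects are hard-coded here rather than read from disk.

```typescript
interface OllamaConfig {
  baseUrl: string;
  cloudUrl: string;
  apiKey?: string;
}

// Assumed defaults, taken from the example values in this README.
const DEFAULTS: OllamaConfig = {
  baseUrl: "http://localhost:11434",
  cloudUrl: "https://ollama.com",
};

// Maps the documented environment variables onto config fields.
function fromEnv(env: Record<string, string | undefined>): Partial<OllamaConfig> {
  const cfg: Partial<OllamaConfig> = {};
  if (env.OLLAMA_HOST) cfg.baseUrl = env.OLLAMA_HOST;
  if (env.OLLAMA_HOST_CLOUD) cfg.cloudUrl = env.OLLAMA_HOST_CLOUD;
  if (env.OLLAMA_API_KEY) cfg.apiKey = env.OLLAMA_API_KEY;
  return cfg;
}

// Sources are passed highest precedence first; they are applied in reverse
// so that earlier sources overwrite later ones.
function mergeByPrecedence(defaults: OllamaConfig, ...sources: Partial<OllamaConfig>[]): OllamaConfig {
  const merged: OllamaConfig = { ...defaults };
  for (const source of [...sources].reverse()) {
    if (source.baseUrl !== undefined) merged.baseUrl = source.baseUrl;
    if (source.cloudUrl !== undefined) merged.cloudUrl = source.cloudUrl;
    if (source.apiKey !== undefined) merged.apiKey = source.apiKey;
  }
  return merged;
}

// Stand-ins for process.env, pi.settings, .pi/settings.json, ~/.pi/agent/settings.json.
const env = { OLLAMA_HOST: "http://10.0.0.5:11434" };
const runtimeSettings: Partial<OllamaConfig> = {};
const projectSettings: Partial<OllamaConfig> = { apiKey: "project-key" };
const globalSettings: Partial<OllamaConfig> = { baseUrl: "http://global:11434", apiKey: "global-key" };

const config = mergeByPrecedence(DEFAULTS, fromEnv(env), runtimeSettings, projectSettings, globalSettings);
console.log(config);
// baseUrl comes from the environment, apiKey from project settings, cloudUrl from defaults
```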
API Reference
import {
  fetchModelDetails,
  getContextLength,
  hasVisionCapability,
  hasReasoningCapability,
  createClients,
  classifyHttpError,
  OllamaError,
  OllamaAuthError,
  OllamaRateLimitError,
  OllamaModelError,
  OllamaServerError,
  DEFAULT_CONTEXT_LENGTH,
  DEFAULT_MAX_TOKENS,
} from '@0xkobold/pi-ollama/shared';
// Get model details from local Ollama
// (`client` is assumed to have been created beforehand; its setup is not shown here)
const details = await fetchModelDetails(client, 'gemma3');
// Extract context length (with name-based fallback)
const ctx = getContextLength(details, 'gemma3'); // 131072
// Check capabilities
const hasVision = hasVisionCapability(details); // true/false
const hasReasoning = hasReasoningCapability('deepseek-r1', details); // true
// Classify HTTP errors
// (`chat` is assumed to be in scope; its import is not shown in this snippet)
try {
  await chat(client, { model: 'gemma3', messages: [{ role: 'user', content: 'Hello' }] });
} catch (err) {
  if (err instanceof OllamaRateLimitError) {
    // Handle rate limit
  } else if (err instanceof OllamaAuthError) {
    // Handle auth failure
  }
}
License
MIT © 0xKobold
