@vtstech/pi-model-test
v1.2.0
Model benchmark/testing extension for Pi Coding Agent
Model benchmark extension for the Pi Coding Agent.
Test any model for reasoning, tool usage, and instruction following — works with Ollama and cloud providers.
Install
```
pi install "npm:@vtstech/pi-model-test"
```

Commands

```
/model-test               Test current Pi model (auto-detects provider)
/model-test qwen3:0.6b    Test a specific Ollama model
/model-test --all         Test every Ollama model
```

Test Suites
Ollama (6 tests)

| Test | Scoring |
|------|---------|
| Reasoning (snail puzzle) | STRONG / MODERATE / WEAK / FAIL |
| Thinking token support | SUPPORTED / NOT SUPPORTED |
| Tool usage (native + text) | STRONG / MODERATE / WEAK / FAIL |
| ReAct parsing | STRONG / MODERATE / WEAK / FAIL |
| Instruction following (JSON) | STRONG / MODERATE / WEAK / FAIL |
| Tool support detection | NATIVE / REACT / NONE |
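The ReAct parsing test checks whether a model can express a tool call as plain text when native tool calling is unavailable. A minimal sketch of one way to extract and grade such a call follows. It assumes the widely used `Action:` / `Action Input:` line convention; the function names `parseReAct` and `scoreReAct` and the score thresholds are hypothetical and not taken from the extension's source.

```typescript
// Hypothetical sketch: extract a tool call from a ReAct-style text response.
// Assumes the common "Action: <tool>" / "Action Input: <json>" convention.
type Score = "STRONG" | "MODERATE" | "WEAK" | "FAIL";

interface ReActCall {
  action: string;
  input: unknown; // parsed JSON, or the raw string when it is not valid JSON
  inputIsJson: boolean;
}

function parseReAct(text: string): ReActCall | null {
  const actionMatch = text.match(/^\s*Action:\s*([A-Za-z_][\w-]*)\s*$/m);
  if (!actionMatch) return null;
  const inputMatch = text.match(/^\s*Action Input:\s*(.+)$/m);
  const rawInput = inputMatch ? inputMatch[1].trim() : "";
  try {
    return { action: actionMatch[1], input: JSON.parse(rawInput), inputIsJson: true };
  } catch {
    return { action: actionMatch[1], input: rawInput, inputIsJson: false };
  }
}

// Hypothetical grading: expected tool with JSON input is STRONG, expected tool
// with unparseable input is MODERATE, a different tool is WEAK, no call is FAIL.
function scoreReAct(text: string, expectedTool: string): Score {
  const call = parseReAct(text);
  if (!call) return "FAIL";
  if (call.action !== expectedTool) return "WEAK";
  return call.inputIsJson ? "STRONG" : "MODERATE";
}
```

Only single-line `Action Input` values are captured here; a model that spreads its JSON over several lines would be graded MODERATE by this sketch.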
Cloud Providers (4 tests)

| Test | Scoring |
|------|---------|
| Connectivity | OK / FAIL |
| Reasoning | STRONG / MODERATE / WEAK / FAIL |
| Instruction following | STRONG / MODERATE / WEAK / FAIL |
| Tool usage (function calling) | STRONG / MODERATE / WEAK / FAIL |
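For providers that expose an OpenAI-compatible chat completions API, the function-calling test could be judged by inspecting `choices[0].message.tool_calls` in the response, where each call's `function.arguments` is a JSON-encoded string. The sketch below assumes that response shape (providers with their own native APIs, for example Anthropic and Google, return tool calls in different shapes), and the function `scoreToolCall` and its thresholds are hypothetical.

```typescript
// Hypothetical sketch: grade an OpenAI-compatible chat completion response for
// the function-calling test. Only the fields used here are modeled.
type Score = "STRONG" | "MODERATE" | "WEAK" | "FAIL";

interface ToolCall {
  type: string;
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ChatCompletion {
  choices: { message: { content: string | null; tool_calls?: ToolCall[] } }[];
}

function scoreToolCall(res: ChatCompletion, expectedName: string, requiredArgs: string[]): Score {
  const call = res.choices[0]?.message.tool_calls?.[0];
  if (!call) return "FAIL"; // the model answered in prose instead of calling a tool
  if (call.function.name !== expectedName) return "WEAK";

  let parsed: unknown;
  try {
    parsed = JSON.parse(call.function.arguments);
  } catch {
    return "WEAK"; // right tool, but the arguments are not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return "WEAK";

  const args = parsed as Record<string, unknown>;
  const hasAll = requiredArgs.every((key) => key in args);
  return hasAll ? "STRONG" : "MODERATE"; // MODERATE: right tool, missing arguments
}
```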
Features
- Auto-detects Ollama vs cloud provider (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere)
- Uses native `fetch()` for all HTTP communication (no shell subprocess or curl dependency)
- Streaming Ollama chat: uses `/api/chat` with `stream: true` for earlier timeout detection and reduced memory
- Automatic remote Ollama URL resolution (reads from `models.json` on every call, so config changes are picked up immediately)
- Timeout resilience with exponential backoff retry on connection failures
- Configurable test parameters: override timeouts, delays, and temperature via `~/.pi/agent/model-test-config.json`
- Test history with regression detection: tracks results at `~/.pi/agent/cache/model-test-history.json` and flags score degradation
- Rate limit delay between tests (configurable)
- Thinking model fallback (retries with `think: true`)
- Tool support cache (`~/.pi/agent/cache/tool_support.json`)
- JSON repair for truncated output (stack-based nesting-aware parser)
- Tab-completion for model names
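Streaming helps with timeouts because Ollama's `/api/chat` with `stream: true` returns newline-delimited JSON: one object per line, each carrying a fragment in `message.content`, with the last object marked `done: true`. Every arriving chunk proves the model is still alive. A minimal sketch of reading such a stream with an idle timer that restarts on each chunk follows; the helper names `withIdleTimeout` and `readChatStream` and the 30-second default are assumptions, not the extension's code.

```typescript
// Hypothetical sketch: read an Ollama /api/chat response sent with stream: true.
// The body is NDJSON; each line has a message.content fragment and the last line
// has done: true. The idle timer restarts per chunk, so a stalled model is caught
// without waiting for a whole-response timeout.
interface ChatChunk {
  message?: { role: string; content: string };
  done: boolean;
}

// Rejects if `p` does not settle within `ms`; the timer is always cleared.
function withIdleTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const idle = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`no data for ${ms} ms`)), ms);
  });
  return Promise.race([p, idle]).finally(() => clearTimeout(timer));
}

async function readChatStream(body: ReadableStream<Uint8Array>, idleTimeoutMs = 30_000): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";

  // Appends one NDJSON line's content; returns true on the final (done) chunk.
  const handleLine = (line: string): boolean => {
    if (!line.trim()) return false;
    const chunk = JSON.parse(line) as ChatChunk;
    text += chunk.message?.content ?? "";
    return chunk.done;
  };

  for (;;) {
    const result = await withIdleTimeout(reader.read(), idleTimeoutMs).catch(async (err) => {
      await reader.cancel().catch(() => undefined); // abort the stalled stream
      throw err;
    });
    if (result.done) break;
    buffer += decoder.decode(result.value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep a partial line until its newline arrives
    for (const line of lines) {
      if (handleLine(line)) return text;
    }
  }
  buffer += decoder.decode(); // flush bytes still held by the decoder
  handleLine(buffer); // a last line that arrived without a trailing newline
  return text;
}
```

In use, `body` would be `res.body` from a `fetch()` POST to `http://localhost:11434/api/chat` whose JSON payload sets `stream: true`. Because only one line is buffered at a time, memory stays flat however long the reply runs.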
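The "exponential backoff retry on connection failures" item can be illustrated with a small wrapper around `fetch()`. In this sketch the helper name `fetchWithRetry`, the retry count, the base delay, and the rule that only thrown errors (network failures, aborts) are retried while HTTP error statuses are returned unchanged are all assumptions; the `fetchFn` parameter exists only so the wrapper can be tested without a network.

```typescript
// Hypothetical sketch: retry fetch() with exponential backoff on connection failures.
// Only thrown errors are retried; an HTTP error status is returned to the caller as-is.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  retries = 3,
  baseDelayMs = 500,
  fetchFn: typeof fetch = fetch,
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fetchFn(url, init);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Delay doubles each time: base, 2 * base, 4 * base, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError; // all attempts failed; surface the last connection error
}
```

Doubling the delay gives a briefly unreachable remote Ollama host time to come back without hammering it, while keeping the first retry fast.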
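Regression detection amounts to comparing each test's latest score with its previous one on the four-level STRONG / MODERATE / WEAK / FAIL scale. The sketch below assumes a simple map from test name to score for each run; the real layout of `~/.pi/agent/cache/model-test-history.json` is not documented here, so the record shape and the `findRegressions` helper are hypothetical.

```typescript
// Hypothetical sketch: flag tests whose score dropped since the previous run.
// The per-run shape (test name -> score) is an assumption, not the history file format.
type Score = "STRONG" | "MODERATE" | "WEAK" | "FAIL";
const RANK: Record<Score, number> = { FAIL: 0, WEAK: 1, MODERATE: 2, STRONG: 3 };

interface Regression {
  test: string;
  before: Score;
  after: Score;
}

function findRegressions(previous: Record<string, Score>, current: Record<string, Score>): Regression[] {
  const regressions: Regression[] = [];
  for (const [test, after] of Object.entries(current)) {
    const before = previous[test];
    // Tests with no earlier result cannot regress; improvements are ignored.
    if (before !== undefined && RANK[after] < RANK[before]) {
      regressions.push({ test, before, after });
    }
  }
  return regressions;
}
```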
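A stack-based, nesting-aware JSON repair works by scanning the text while tracking whether it is inside a string (so brackets in string values are ignored), pushing the closer each `{` or `[` needs, and appending the unmatched closers in reverse order at the end. The sketch below is an illustration of that technique under assumptions, not the extension's parser: `repairTruncatedJson` is a hypothetical name, and it handles truncation inside a string value, after a comma, or after a colon, but not a key cut off before its colon or a partial literal such as `tru`.

```typescript
// Hypothetical sketch of stack-based JSON repair for output truncated mid-object.
// String and escape state are tracked so brackets inside strings are ignored; each
// unmatched opener records the closer it needs, appended innermost-first at the end.
function repairTruncatedJson(input: string): string {
  const closers: string[] = []; // expected closing brackets, innermost last
  let inString = false;
  let escaped = false;

  for (const ch of input) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let out = input;
  if (inString) {
    if (escaped) out = out.slice(0, -1); // drop a dangling backslash
    out += '"'; // close the unterminated string
  }
  out = out.replace(/\s+$/, "");
  if (out.endsWith(",")) out = out.slice(0, -1); // '[1, 2,' becomes '[1, 2'
  else if (out.endsWith(":")) out += " null"; // '{"a":' becomes '{"a": null'

  return out + closers.reverse().join("");
}
```

For example, a reply cut off as `{"answer": "The snail clim` comes back as `{"answer": "The snail clim"}`, which `JSON.parse` accepts, so the instruction-following test can still inspect whatever the model produced before truncation.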
Links
License
MIT — VTSTech
