@vtstech/pi-model-test

v1.3.3, published a month ago

Model benchmark/testing extension for Pi Coding Agent

Vulnerabilities: 0 high, 0 medium, 0 low

vtstech

pi-package pi pi-coding-agent pi-extensions

Model benchmark extension for the Pi Coding Agent.

Test any model for reasoning, tool usage, and instruction following. Works with Ollama and cloud providers.

```bash
# Install as part of the bundle
pi install git:github.com/VTSTech/pi-coding-agent

# Or install individually
pi install "npm:@vtstech/pi-model-test"
```

Commands

```
/model-test                     Test current Pi model (auto-detects provider)
/model-test qwen3:0.6b          Test a specific Ollama model
/model-test --all               Test every Ollama model
```

Test Suites

Ollama (6 tests)

| Test | Scoring |
|------|---------|
| Reasoning (snail puzzle) | STRONG / MODERATE / WEAK / FAIL |
| Thinking token support | SUPPORTED / NOT SUPPORTED |
| Tool usage (native + text) | STRONG / MODERATE / WEAK / FAIL |
| ReAct parsing | STRONG / MODERATE / WEAK / FAIL |
| Instruction following (JSON) | STRONG / MODERATE / WEAK / FAIL |
| Tool support detection | NATIVE / REACT / NONE |
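
To illustrate what the ReAct parsing test is looking for, here is a minimal sketch of extracting a tool call from text-mode output. It assumes the common `Thought:` / `Action:` / `Action Input:` layout; the `ReActStep` type and `parseReAct` function are hypothetical names for this example and are not taken from the extension's source.

```typescript
// Hypothetical result shape for one parsed ReAct step (an assumption for this sketch).
interface ReActStep {
  thought?: string;
  action: string;
  actionInput: unknown;
}

// Extract the first "Action:" / "Action Input:" pair from plain model output.
// Returns null when the text contains no recognizable tool call.
function parseReAct(text: string): ReActStep | null {
  const thought = /(?:^|\n)Thought:\s*([^\n]+)/.exec(text);
  const action = /(?:^|\n)Action:\s*([^\n]+)/.exec(text);
  // The input may span several lines, so capture up to the next ReAct label or end of text.
  const input =
    /(?:^|\n)Action Input:\s*([\s\S]+?)(?=\nObservation:|\nThought:|\nAction:|$)/.exec(text);
  if (!action || !input) return null;

  const raw = (input[1] ?? "").trim();
  let actionInput: unknown = raw;
  try {
    actionInput = JSON.parse(raw); // prefer structured arguments when the model emits JSON
  } catch {
    // Not valid JSON: keep the raw string as the input.
  }
  return {
    thought: thought?.[1]?.trim(),
    action: (action[1] ?? "").trim(),
    actionInput,
  };
}
```

A model that supports neither native tool calls nor this text format would produce `null` here, which is the kind of signal a NATIVE / REACT / NONE classification could be built on.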

Cloud Providers (4 tests)

| Test | Scoring |
|------|---------|
| Connectivity | OK / FAIL |
| Reasoning | STRONG / MODERATE / WEAK / FAIL |
| Instruction following | STRONG / MODERATE / WEAK / FAIL |
| Tool usage (function calling) | STRONG / MODERATE / WEAK / FAIL |

Features

- Auto-detects Ollama vs. cloud providers (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere)
- Uses native `fetch()` for all HTTP communication (no shell subprocess or curl dependency)
- Streaming Ollama chat: uses `/api/chat` with `stream: true` for earlier timeout detection and reduced memory use
- Automatic remote Ollama URL resolution (reads `models.json` on every call, so config changes take effect immediately)
- Timeout resilience: retries with exponential backoff on connection failures
- Configurable test parameters: override timeouts, delays, and temperature via `~/.pi/agent/model-test-config.json`
- Test history with regression detection: tracks results in `~/.pi/agent/cache/model-test-history.json` and flags score degradation
- Rate limit delay between tests (configurable)
- Thinking model fallback (retries with `think: true`)
- Tool support cache (`~/.pi/agent/cache/tool_support.json`)
- JSON repair for truncated output (stack-based, nesting-aware parser)
- Tab completion for model names
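
The exponential backoff retry mentioned above can be sketched roughly as follows. This is an illustration of the general technique, not the extension's actual code: `RetryOptions`, `backoffDelay`, and `withRetry` are hypothetical names, and the real retry counts and delays are not documented here.

```typescript
// Hypothetical options for this sketch; the extension's real settings may differ.
interface RetryOptions {
  retries: number;     // how many times to retry after the first failure
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay doubles with each attempt (0-based) and is capped at maxDelayMs.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

// Run fn, retrying on rejection with exponentially growing pauses.
async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === opts.retries) break; // out of retries
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

Because `fetch()` rejects on network-level failures, a call such as `withRetry(() => fetch(url), { retries: 3, baseDelayMs: 500, maxDelayMs: 4000 })` would pause 500 ms, 1000 ms, and 2000 ms between attempts before giving up; `url` is a placeholder here.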
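
As an illustration of the stack-based, nesting-aware JSON repair listed above, the sketch below closes whatever strings, arrays, and objects are still open when output is cut off. The function name `repairTruncatedJson` is an assumption for this example, and the extension's own parser may handle more cases than this one does.

```typescript
// Close any unterminated string, array, or object in truncated JSON text.
// A stack records the closer each opener still needs, so nesting order is preserved.
function repairTruncatedJson(input: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (inString) {
      // Brackets inside strings are literal text, so only track escapes and the end quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let out = input;
  if (inString) {
    if (escaped) out = out.slice(0, -1); // drop a dangling backslash
    out += '"';
  } else {
    out = out.replace(/[,\s]+$/, ""); // drop a trailing comma left by the cut
  }
  // Emit closers innermost first.
  return out + closers.reverse().join("");
}
```

For example, `repairTruncatedJson('{"name": "qwen", "tags": ["a", "b')` yields `{"name": "qwen", "tags": ["a", "b"]}`. Output cut immediately after a key or a colon is not handled by this sketch, so `JSON.parse` can still throw on the result.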

License

MIT — VTSTech
