ModelPool
Free-first model routing with transparent fallback, local model support, and respect for provider limits.
Why ModelPool?
ModelPool is designed for AI agents and applications that need to maximize free/low-cost model usage, respect provider limits, and protect user privacy. It routes requests to local or remote models based on policy, provider health, and privacy requirements, always preferring free or local options when possible.
Features
- Free-first routing: always tries free/low-cost providers first for eligible requests
- Transparent fallback: all fallback decisions are visible and explainable
- Local model support: privacy-sensitive or secret data is routed to local models (Ollama) by default
- Strict provider limit respect: no quota bypass, no hidden retries, no key/account rotation
- Configurable profiles and routing policies
- OpenAI-compatible gateway and CLI
- Experimental/configurable OpenCode Go/Zen support (v0.1)
Quickstart
- Install dependencies and build the local CLI:

  ```sh
  pnpm install
  pnpm build
  ```

- Initialize config:

  ```sh
  pnpm modelpool init
  ```

- Edit `.modelpool/config.yaml` for one of the live paths below.
- Start the gateway server:

  ```sh
  pnpm modelpool serve --port 4545
  ```

- Send your first request through the routing alias:

  ```sh
  curl -s http://127.0.0.1:4545/v1/chat/completions \
    -H 'content-type: application/json' \
    -d '{"model":"modelpool/free","messages":[{"role":"user","content":"Reply with: hello from modelpool"}],"max_tokens":80}'
  ```
`modelpool/free` means "let ModelPool choose from the active profile". v0.2 also exposes `modelpool/fast`, `modelpool/balanced`, and `modelpool/capable` for health-aware route groups. Use a concrete provider model ID only when you want exact-model routing.
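For example, passing the concrete Groq model ID configured in the setup below pins the request to that exact model instead of letting the router choose:

```sh
# Exact-model routing: bypasses group selection.
# llama-3.3-70b-versatile comes from the Groq config in the next section.
curl -s http://127.0.0.1:4545/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"Reply with: hello from modelpool"}],"max_tokens":80}'
```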
Fastest Live Setup: Groq
Use this when you want a cloud live test without running local models.
- Export a Groq key in your shell:

  ```sh
  export GROQ_API_KEY=...
  ```

- Use a config like this:

  ```yaml
  ledgerPath: .modelpool/ledger.sqlite
  server:
    port: 4545
  providers:
    groq:
      enabled: true
      apiKey: ${GROQ_API_KEY}
      models:
        - llama-3.3-70b-versatile
  profiles:
    default:
      description: Groq live profile
      providers:
        - groq
      model: llama-3.3-70b-versatile
      fallbackModels: []
  routing:
    defaultProfile: default
    maxAttempts: 1
  privacy:
    sensitivity: public
    allowLogging: false
  modelRegistry:
    - id: llama-3.3-70b-versatile
      name: Llama 3.3 70B Versatile
      provider: groq
      capabilities:
        - chat
      experimental: false
  ```

- Verify the route before sending prompts:

  ```sh
  pnpm modelpool route explain --model modelpool/free --privacy public
  pnpm modelpool run --model modelpool/free "Reply with exactly: live-ok"
  ```
OpenCode Zen Live Setup
OpenCode Zen support is configurable and experimental in v0.1; the OpenAI-compatible base URL verified during development is `https://opencode.ai/zen/v1`.
- Export an OpenCode key in your shell:

  ```sh
  export OPENCODE_API_KEY=...
  ```

- Use a config like this:

  ```yaml
  ledgerPath: .modelpool/ledger.sqlite
  server:
    port: 4545
  providers:
    opencode:
      enabled: true
      experimental: true
      apiKey: ${OPENCODE_API_KEY}
      baseUrl: https://opencode.ai/zen/v1
      models:
        - big-pickle
  profiles:
    default:
      description: OpenCode Zen live profile
      providers:
        - opencode
      model: big-pickle
      fallbackModels: []
  routing:
    defaultProfile: default
    maxAttempts: 1
  privacy:
    sensitivity: public
    allowLogging: false
  modelRegistry:
    - id: big-pickle
      name: Big Pickle
      provider: opencode
      capabilities:
        - chat
      experimental: true
  ```

- Give reasoning-capable models enough output budget through the HTTP API:

  ```sh
  curl -s http://127.0.0.1:4545/v1/chat/completions \
    -H 'content-type: application/json' \
    -d '{"model":"modelpool/free","messages":[{"role":"user","content":"Reply with exactly: opencode-ok"}],"max_tokens":80,"temperature":0}'
  ```
Note: reasoning-capable OpenCode Zen models can spend early tokens on reasoning before assistant content appears. If `max_tokens` is too low, the upstream response may contain reasoning but no visible assistant text.
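If that happens, retry with a larger output budget; the 512 below is an illustrative value, not a project recommendation:

```sh
# Same request as above, with more room for reasoning plus the final answer.
curl -s http://127.0.0.1:4545/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"modelpool/free","messages":[{"role":"user","content":"Reply with exactly: opencode-ok"}],"max_tokens":512,"temperature":0}'
```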
Ollama Local Setup
Use this when prompts are private/sensitive or you want local-only behavior.
- Install and run Ollama, then pull a model:

  ```sh
  ollama serve
  ollama pull llama3.2
  ```

- Keep the default local-first config generated by `pnpm modelpool init`, or ensure your profile uses Ollama:

  ```yaml
  providers:
    ollama:
      enabled: true
      baseUrl: http://127.0.0.1:11434
      models:
        - llama3.2
  profiles:
    default:
      providers:
        - ollama
      model: llama3.2
      fallbackModels: []
  privacy:
    sensitivity: private
    allowLogging: false
  ```

- Run:

  ```sh
  pnpm modelpool run "Reply with exactly: local-ok"
  ```
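If the run fails, confirm Ollama itself is reachable before debugging ModelPool; `/api/tags` is Ollama's own model-list endpoint, not part of ModelPool:

```sh
# llama3.2 should appear in the returned model list.
curl -s http://127.0.0.1:11434/api/tags

# Then re-check ModelPool's configuration and passive provider health.
pnpm modelpool doctor
```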
Configuration
- Main config: `.modelpool/config.yaml`
- Ledger (usage log): `.modelpool/ledger.sqlite`
- Override config/ledger paths with the `MODELPOOL_CONFIG` and `MODELPOOL_LEDGER` env vars (see the example after this list)
- Keep provider keys in environment variables; do not write raw keys into committed config files
- CLI/server commands, run locally with `pnpm modelpool ...` after `pnpm build`:
  - `modelpool init [--config path] [--force]`
  - `modelpool serve [--port number] [--config path]`
  - `modelpool run [--model id] [--prompt text] <prompt>`
  - `modelpool status`
  - `modelpool models`
  - `modelpool doctor [--probe]`
  - `modelpool usage [--json]`
  - `modelpool route explain [--model id] [--privacy public|private|sensitive|secret]`
  - `modelpool scan <file>`
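For example, to run the gateway against a non-default config and ledger (the paths here are illustrative):

```sh
# Point ModelPool at alternate config/ledger locations for this invocation.
MODELPOOL_CONFIG=./staging/config.yaml \
MODELPOOL_LEDGER=./staging/ledger.sqlite \
pnpm modelpool serve --port 4545
```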
Profiles
Profiles define routing order and fallback for different use cases:
- default: Local-first with cloud fallback (Ollama -> Groq)
- public: Free/low-cost providers first (OpenCode Go/Zen -> Groq -> Ollama)
- private/sensitive/secret: Local only (Ollama), unless explicitly allowed
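To confirm how a privacy level resolves under your active config, `route explain` shows the decision without sending a prompt:

```sh
# Public prompts may use free/low-cost cloud providers.
pnpm modelpool route explain --model modelpool/free --privacy public

# Secret prompts should resolve to local Ollama with no cloud fallback.
pnpm modelpool route explain --model modelpool/free --privacy secret
```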
Supported Providers (MVP)
- Ollama (local, default for private/sensitive/secret)
- Groq (cloud, free/low-cost, OpenAI-compatible)
- OpenCode Go/Zen (configurable/experimental in v0.1; verified against `https://opencode.ai/zen/v1` with configured model IDs)
Note: OpenCode Go/Zen support remains experimental/configurable in v0.1. Model availability and response behavior can vary by account and model. Do not assume hardcoded OpenCode model IDs outside your own config.
Privacy Policy
- Private, sensitive, or secret data is routed to local models (Ollama) by default
- No prompt/completion content or credentials are stored in the ledger
- No external provider fallback for secret-classified requests
- `modelpool scan <file>` and `POST /v1/modelpool/policy/check` redact supported secret patterns before returning findings (example after this list)
- Local privacy support is a first-order product goal
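A quick way to exercise the scanner locally; the file name is illustrative, and the JSON body for the policy endpoint is an assumed shape, so check docs/policy.md for the actual schema:

```sh
# Scan a file for secret patterns; findings come back redacted.
pnpm modelpool scan .env

# Hypothetical request body; the real schema is in docs/policy.md.
curl -s http://127.0.0.1:4545/v1/modelpool/policy/check \
  -H 'content-type: application/json' \
  -d '{"content":"GITHUB_TOKEN=ghp_exampleexampleexample1234"}'
```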
Secret Scanner
The v0.1 scanner is deterministic and regex-based. It detects and redacts .env-style key assignments, OpenAI-style keys, GitHub tokens, JWTs, AWS access key IDs, and SSH/private key blocks. Findings expose redacted matches plus location metadata only; raw matched secret values are not returned. Known limitation: this is not full DLP or entropy analysis, so unusual credential formats may require explicit policy classification.
Server Mode & Endpoints
- Start with `modelpool serve` (default port 4545)
- Endpoints (smoke-test sketch after this list):
  - `GET /v1/models`
  - `POST /v1/chat/completions`
  - `GET /v1/modelpool/status`
  - `POST /v1/modelpool/route/explain`
  - `POST /v1/modelpool/policy/check`
- Unsupported OpenAI endpoints return 501 Not Implemented
- Streaming is not supported in v0.1 (returns 501)
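The two GET endpoints make a quick smoke test once the server is up:

```sh
# List the models the gateway currently exposes.
curl -s http://127.0.0.1:4545/v1/models

# Check gateway and provider status.
curl -s http://127.0.0.1:4545/v1/modelpool/status
```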
Route Explain and Usage
- Use `modelpool route explain` or `POST /v1/modelpool/route/explain` to see provider selection, fallback reasons, and policy decisions for a given request (HTTP variant sketched after this list)
- Use `modelpool/free`, `modelpool/fast`, `modelpool/balanced`, or `modelpool/capable` when you want ModelPool to choose from the active profile instead of requiring an exact provider model ID
- Use `modelpool usage --json` for privacy-safe aggregate metadata; prompts, completions, credentials, headers, and request bodies are not stored
- `modelpool doctor --probe` is opt-in and can consume provider quota; `doctor` without `--probe` reports configuration and passive health only
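A sketch of the HTTP variant, assuming the request body mirrors the CLI's `--model` and `--privacy` flags (the exact schema is not documented here, so verify against your build):

```sh
# Assumed body shape mirroring the CLI flags; not a confirmed schema.
curl -s http://127.0.0.1:4545/v1/modelpool/route/explain \
  -H 'content-type: application/json' \
  -d '{"model":"modelpool/free","privacy":"public"}'

# Aggregate usage metadata only; no prompt or completion content.
pnpm modelpool usage --json
```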
Roadmap
- v0.1: MVP with Ollama, Groq, and experimental OpenCode Go/Zen support
- v0.2: No-Ink free-first routing UX with route groups, passive health, cooldowns, and usage metadata
- v0.3+: Streaming, Anthropic-compatible endpoint if useful, expanded provider support if verified, and advanced policy/routing
Non-goals & Forbidden Behaviors
- No OpenRouter, Anthropic, Gemini, Fireworks, DeepInfra, or other non-MVP providers
- No dashboard, billing UI, teams, or LiteLLM replacement
- No quota bypass, unlimited access, key/account rotation, or hidden retries
- No claims of unlimited free usage or provider-limit evasion
For provider setup and policy details, see docs/provider-setup.md and docs/policy.md.
