ModelPool
Free-first model routing with transparent fallback, local model support, and respect for provider limits.
Why ModelPool?
ModelPool is designed for AI agents and applications that need to maximize free/low-cost model usage, respect provider limits, and protect user privacy. It routes requests to local or remote models based on policy, provider health, and privacy requirements, always preferring free or local options when possible.
Features
- Free-first routing: always tries free/low-cost providers first for eligible requests
- Transparent fallback: all fallback decisions are visible and explainable
- Local model support: privacy-sensitive or secret data is routed to local models (Ollama) by default
- Strict provider limit respect: no quota bypass, no hidden retries, no key/account rotation
- Configurable profiles and routing policies
- OpenAI-compatible gateway and CLI
- Experimental/configurable OpenCode Go/Zen support (v0.1)
Quickstart
- Install dependencies and build the local CLI:

  ```sh
  pnpm install
  pnpm build
  ```

- Initialize config:

  ```sh
  pnpm modelpool init
  ```

- Edit `.modelpool/config.yaml` for one of the live paths below.
- Start the gateway server:

  ```sh
  pnpm modelpool serve --port 4545
  ```

- Send your first request through the routing alias:

  ```sh
  curl -s http://127.0.0.1:4545/v1/chat/completions \
    -H 'content-type: application/json' \
    -d '{"model":"modelpool/free","messages":[{"role":"user","content":"Reply with: hello from modelpool"}],"max_tokens":80}'
  ```
`modelpool/free` means "let ModelPool choose from the active profile". v0.2 also exposes `modelpool/fast`, `modelpool/balanced`, and `modelpool/capable` for health-aware route groups. Use a concrete provider model ID only when you want exact-model routing.
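For example, passing the concrete Groq model ID configured in the setup below pins the request to that exact model instead of letting the router choose:

```sh
# Exact-model routing: bypasses group selection.
# llama-3.3-70b-versatile comes from the Groq config in the next section.
curl -s http://127.0.0.1:4545/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"Reply with: hello from modelpool"}],"max_tokens":80}'
```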
Fastest Live Setup: Groq
Use this when you want a cloud live test without running local models.
- Export a Groq key in your shell:

  ```sh
  export GROQ_API_KEY=...
  ```

- Use a config like this:

  ```yaml
  ledgerPath: .modelpool/ledger.sqlite
  server:
    port: 4545
  providers:
    groq:
      enabled: true
      apiKey: ${GROQ_API_KEY}
      models:
        - llama-3.3-70b-versatile
  profiles:
    default:
      description: Groq live profile
      providers:
        - groq
      model: llama-3.3-70b-versatile
      fallbackModels: []
  routing:
    defaultProfile: default
    maxAttempts: 1
  privacy:
    sensitivity: public
    allowLogging: false
  modelRegistry:
    - id: llama-3.3-70b-versatile
      name: Llama 3.3 70B Versatile
      provider: groq
      capabilities:
        - chat
      experimental: false
  ```

- Verify the route before sending prompts:

  ```sh
  pnpm modelpool route explain --model modelpool/free --privacy public
  pnpm modelpool run --model modelpool/free "Reply with exactly: live-ok"
  ```
OpenCode Zen Live Setup
OpenCode Zen support is configurable and experimental in v0.1; the OpenAI-compatible base URL verified during development is `https://opencode.ai/zen/v1`.
- Export an OpenCode key in your shell:

  ```sh
  export OPENCODE_API_KEY=...
  ```

- Use a config like this:

  ```yaml
  ledgerPath: .modelpool/ledger.sqlite
  server:
    port: 4545
  providers:
    opencode:
      enabled: true
      experimental: true
      apiKey: ${OPENCODE_API_KEY}
      baseUrl: https://opencode.ai/zen/v1
      models:
        - big-pickle
  profiles:
    default:
      description: OpenCode Zen live profile
      providers:
        - opencode
      model: big-pickle
      fallbackModels: []
  routing:
    defaultProfile: default
    maxAttempts: 1
  privacy:
    sensitivity: public
    allowLogging: false
  modelRegistry:
    - id: big-pickle
      name: Big Pickle
      provider: opencode
      capabilities:
        - chat
      experimental: true
  ```

- Give reasoning-capable models enough output budget through the HTTP API:

  ```sh
  curl -s http://127.0.0.1:4545/v1/chat/completions \
    -H 'content-type: application/json' \
    -d '{"model":"modelpool/free","messages":[{"role":"user","content":"Reply with exactly: opencode-ok"}],"max_tokens":80,"temperature":0}'
  ```
Note: reasoning-capable OpenCode Zen models can spend early tokens on reasoning before assistant content appears. If `max_tokens` is too low, the upstream response may contain reasoning but no visible assistant text.
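If that happens, retry with a larger output budget; the 512 below is an illustrative value, not a project recommendation:

```sh
# Same request as above, with more room for reasoning plus the final answer.
curl -s http://127.0.0.1:4545/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"modelpool/free","messages":[{"role":"user","content":"Reply with exactly: opencode-ok"}],"max_tokens":512,"temperature":0}'
```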
Ollama Local Setup
Use this when prompts are private/sensitive or you want local-only behavior.
- Install and run Ollama, then pull a model:

  ```sh
  ollama serve
  ollama pull llama3.2
  ```

- Keep the default local-first config generated by `pnpm modelpool init`, or ensure your profile uses Ollama:

  ```yaml
  providers:
    ollama:
      enabled: true
      baseUrl: http://127.0.0.1:11434
      models:
        - llama3.2
  profiles:
    default:
      providers:
        - ollama
      model: llama3.2
      fallbackModels: []
  privacy:
    sensitivity: private
    allowLogging: false
  ```

- Run:

  ```sh
  pnpm modelpool run "Reply with exactly: local-ok"
  ```
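If the run fails, confirm Ollama itself is reachable before debugging ModelPool; `/api/tags` is Ollama's own model-list endpoint, not part of ModelPool:

```sh
# llama3.2 should appear in the returned model list.
curl -s http://127.0.0.1:11434/api/tags

# Then re-check ModelPool's configuration and passive provider health.
pnpm modelpool doctor
```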
Configuration
- Main config: `.modelpool/config.yaml`
- Ledger (usage log): `.modelpool/ledger.sqlite`
- Override config/ledger paths with the `MODELPOOL_CONFIG` and `MODELPOOL_LEDGER` env vars (see the example after this list)
- Keep provider keys in environment variables; do not write raw keys into committed config files
- CLI/server commands, run locally with `pnpm modelpool ...` after `pnpm build`:
  - `modelpool init [--config path] [--force]`
  - `modelpool serve [--port number] [--config path]`
  - `modelpool run [--model id] [--prompt text] <prompt>`
  - `modelpool status`
  - `modelpool models`
  - `modelpool doctor [--probe]`
  - `modelpool usage [--json]`
  - `modelpool route explain [--model id] [--privacy public|private|sensitive|secret]`
  - `modelpool scan <file>`
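For example, to run the gateway against a non-default config and ledger (the paths here are illustrative):

```sh
# Point ModelPool at alternate config/ledger locations for this invocation.
MODELPOOL_CONFIG=./staging/config.yaml \
MODELPOOL_LEDGER=./staging/ledger.sqlite \
pnpm modelpool serve --port 4545
```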
Profiles
Profiles define routing order and fallback for different use cases:
- default: Local-first with cloud fallback (Ollama -> Groq)
- public: Free/low-cost providers first (OpenCode Go/Zen -> Groq -> Ollama)
- private/sensitive/secret: Local only (Ollama), unless explicitly allowed
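To confirm how a privacy level resolves under your active config, `route explain` shows the decision without sending a prompt:

```sh
# Public prompts may use free/low-cost cloud providers.
pnpm modelpool route explain --model modelpool/free --privacy public

# Secret prompts should resolve to local Ollama with no cloud fallback.
pnpm modelpool route explain --model modelpool/free --privacy secret
```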
Supported Providers (MVP)
- Ollama (local, default for private/sensitive/secret)
- Groq (cloud, free/low-cost, OpenAI-compatible)
- OpenCode Go/Zen (configurable/experimental in v0.1; verified against `https://opencode.ai/zen/v1` with configured model IDs)
Note: OpenCode Go/Zen support remains experimental/configurable in v0.1. Model availability and response behavior can vary by account and model. Do not assume hardcoded OpenCode model IDs outside your own config.
Privacy Policy
- Private, sensitive, or secret data is routed to local models (Ollama) by default
- No prompt/completion content or credentials are stored in the ledger
- No external provider fallback for secret-classified requests
- `modelpool scan <file>` and `POST /v1/modelpool/policy/check` redact supported secret patterns before returning findings (example after this list)
- Local privacy support is a first-order product goal
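A quick way to exercise the scanner locally; the file name is illustrative, and the JSON body for the policy endpoint is an assumed shape, so check docs/policy.md for the actual schema:

```sh
# Scan a file for secret patterns; findings come back redacted.
pnpm modelpool scan .env

# Hypothetical request body; the real schema is in docs/policy.md.
curl -s http://127.0.0.1:4545/v1/modelpool/policy/check \
  -H 'content-type: application/json' \
  -d '{"content":"GITHUB_TOKEN=ghp_exampleexampleexample1234"}'
```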
Secret Scanner
The v0.1 scanner is deterministic and regex-based. It detects and redacts .env-style key assignments, OpenAI-style keys, GitHub tokens, JWTs, AWS access key IDs, and SSH/private key blocks. Findings expose redacted matches plus location metadata only; raw matched secret values are not returned. Known limitation: this is not full DLP or entropy analysis, so unusual credential formats may require explicit policy classification.
Server Mode & Endpoints
- Start with `modelpool serve` (default port 4545)
- Endpoints (smoke-test sketch after this list):
  - `GET /v1/models`
  - `POST /v1/chat/completions`
  - `GET /v1/modelpool/status`
  - `POST /v1/modelpool/route/explain`
  - `POST /v1/modelpool/policy/check`
- Unsupported OpenAI endpoints return 501 Not Implemented
- Streaming is not supported in v0.1 (returns 501)
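The two GET endpoints make a quick smoke test once the server is up:

```sh
# List the models the gateway currently exposes.
curl -s http://127.0.0.1:4545/v1/models

# Check gateway and provider status.
curl -s http://127.0.0.1:4545/v1/modelpool/status
```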
Route Explain and Usage
- Use `modelpool route explain` or `POST /v1/modelpool/route/explain` to see provider selection, fallback reasons, and policy decisions for a given request (HTTP variant sketched after this list)
- Use `modelpool/free`, `modelpool/fast`, `modelpool/balanced`, or `modelpool/capable` when you want ModelPool to choose from the active profile instead of requiring an exact provider model ID
- Use `modelpool usage --json` for privacy-safe aggregate metadata; prompts, completions, credentials, headers, and request bodies are not stored
- `modelpool doctor --probe` is opt-in and can consume provider quota; `doctor` without `--probe` reports configuration and passive health only
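A sketch of the HTTP variant, assuming the request body mirrors the CLI's `--model` and `--privacy` flags (the exact schema is not documented here, so verify against your build):

```sh
# Assumed body shape mirroring the CLI flags; not a confirmed schema.
curl -s http://127.0.0.1:4545/v1/modelpool/route/explain \
  -H 'content-type: application/json' \
  -d '{"model":"modelpool/free","privacy":"public"}'

# Aggregate usage metadata only; no prompt or completion content.
pnpm modelpool usage --json
```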
Roadmap
- v0.1: MVP with Ollama, Groq, and experimental OpenCode Go/Zen support
- v0.2: No-Ink free-first routing UX with route groups, passive health, cooldowns, and usage metadata
- v0.3+: Streaming, Anthropic-compatible endpoint if useful, expanded provider support if verified, and advanced policy/routing
Non-goals & Forbidden Behaviors
- No OpenRouter, Anthropic, Gemini, Fireworks, DeepInfra, or other non-MVP providers
- No dashboard, billing UI, teams, or LiteLLM replacement
- No quota bypass, unlimited access, key/account rotation, or hidden retries
- No claims of unlimited free usage or provider-limit evasion
For provider setup and policy details, see docs/provider-setup.md and docs/policy.md.
