@iistools/provider
v0.2.0
AI provider router — smart routing across local and cloud LLM/TTS/embedding providers with account-aware metering
What it does
Routes AI requests to the best available provider based on health, latency, cost, and capability. Supports:
- Local inference: Ollama, vLLM, llama.cpp server, LM Studio
- Cloud APIs: OpenAI, Anthropic, Google Gemini, ElevenLabs
- Free fallbacks: Edge TTS (Microsoft, local binary), Whisper local transcription
- IIS managed pools: Shared API keys via IIS account service
Requests fall through the tier hierarchy automatically. If Ollama is running locally, it gets picked first. If it goes down mid-session, the next healthy provider takes over without any intervention.
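The fall-through behavior described above can be pictured with a short sketch. This is not the package's router: `Candidate`, `Routed`, and `routeWithFallthrough` are hypothetical names, and the candidates are assumed to arrive already ranked. The only rules taken from this README are that unhealthy and recovering providers are excluded and that a failed call hands over to the next provider.

```typescript
// Hypothetical shapes for illustration; not the package's real types.
type Health = 'healthy' | 'suspect' | 'unhealthy' | 'recovering';

interface Candidate {
  name: string;
  health: Health;
  // Performs the request against this provider; rejects on failure.
  call: (prompt: string) => Promise<string>;
}

interface Routed {
  provider: string;
  content: string;
}

// Try candidates in the given (already ranked) order, skip those excluded
// from routing, and fall through to the next one when a call fails.
async function routeWithFallthrough(
  candidates: Candidate[],
  prompt: string,
): Promise<Routed> {
  const errors: string[] = [];
  for (const c of candidates) {
    if (c.health === 'unhealthy' || c.health === 'recovering') continue;
    try {
      return { provider: c.name, content: await c.call(prompt) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${c.name}: ${reason}`);
    }
  }
  throw new Error(`no provider could serve the request (${errors.join('; ')})`);
}
```

With a local provider that refuses the connection listed ahead of a working cloud provider, the call resolves with the cloud provider's answer and the caller never sees the first failure.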
Quick Start
First-time setup
# Install globally
npm install -g @iistools/provider
# Interactive setup — discovers local providers, offers Ollama bootstrap if none found
iis-provider init
# Bootstrap Ollama (free, local, pulls qwen3.5:4b + qwen3-embedding:0.6b, ~4GB)
iis-provider init --setup-ollama
# Configure with a cloud key (no local inference required)
iis-provider init --openai-key=sk-...
# Multiple keys at once
iis-provider init --openai-key=sk-... --anthropic-key=sk-ant-... --gemini-key=AI...
# Point to an existing Ollama server on the LAN
iis-provider init --ollama-url=http://your-server:11434
# No AI needed (for non-AI iistools features)
iis-provider init --no-ai
Start the daemon
iis-provider daemon
# iis-provider daemon listening on 127.0.0.1:9665
Route a request
# Completion
curl -X POST http://localhost:9665/v1/route \
  -H 'Content-Type: application/json' \
  -d '{
    "task": "completion",
    "input": {
      "messages": [{"role": "user", "content": "What is 2+2?"}]
    }
  }'
# Via OpenAI-compatible endpoint (works with any OpenAI SDK)
curl -X POST http://localhost:9665/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Usage
As a library (Node.js)
import {
  Router,
  ProviderRegistry,
  HealthChecker,
  CacheManager,
  UsageMeter,
  AccountClient,
  createLogger,
  registerBuiltinAdapters,
  getAdapter,
  loadProviderConfig,
  resolveProviderConfig,
} from '@iistools/provider';
// Load and resolve config (env var overrides applied)
const config = resolveProviderConfig(loadProviderConfig());
// Register built-in adapters (openai, anthropic, ollama, etc.)
registerBuiltinAdapters();
// Build the registry and router
const registry = new ProviderRegistry(config);
const log = createLogger(config.log_level ?? 'info', 'my-service');
const router = new Router(registry, getAdapter, log, config.routing ?? {});
// Route a request
const result = await router.route({
  task: 'completion',
  input: {
    messages: [{ role: 'user', content: 'Summarize this document...' }],
  },
});
console.log(result.result.content); // response text
console.log(result.provider); // which provider handled it
console.log(result.routing.latency_ms); // round-trip latency
Streaming completions
for await (const chunk of router.routeStream({
  task: 'completion',
  input: { messages: [{ role: 'user', content: 'Write a poem about...' }] },
})) {
  process.stdout.write(chunk.delta ?? '');
  if (chunk.done) break;
}
Embeddings and TTS
// Embeddings (cached automatically)
const embed = await router.route({
  task: 'embedding',
  input: { input: 'Text to embed' },
});
const vector = embed.result.embedding; // number[]
// Text-to-speech (cached automatically; identical text+voice hits the cache)
const tts = await router.route({
  task: 'tts',
  input: { text: 'Hello world', voice: 'alloy' },
});
const audioBase64 = tts.result.audio; // string (base64 MP3)
Provider preferences
// Pin to a specific provider
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { provider: 'openai' },
});
// Force local-only (fail if no local provider available)
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { prefer_local: true },
});
// Cost ceiling (skip pooled providers if credits exceed limit)
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { max_cost_credits: 5 },
});
// Quality hint (prefer providers strong in code tasks)
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { quality: 'code' },
});
As a daemon (HTTP API)
iis-provider daemon # Start on 127.0.0.1:9665
iis-provider daemon --config=~/.config/iis/provider/providers.yaml
iis-provider status # Daemon health check
iis-provider providers # List providers with health
iis-provider providers --output json # Machine-readable
iis-provider route completion # Show routing priority order
iis-provider route embedding # Which providers handle embeddings?
iis-provider usage # All-time usage summary
iis-provider usage --since=2026-01-01 # Filter by date
iis-provider discover # Scan LAN for AI providers
As an MCP server (Claude Code)
Add to your Claude Code MCP config (~/.claude/mcp.json or project .claude/settings.json):
{
  "mcpServers": {
    "provider": {
      "command": "iis-provider",
      "args": ["mcp"]
    }
  }
}
This exposes 8 tools directly to Claude:
| Tool | Description |
|------|-------------|
| provider_complete | Send a completion request through the router |
| provider_embed | Generate text embeddings |
| provider_transcribe | Transcribe an audio file (wav, mp3, m4a, flac, ogg, webm) |
| provider_speak | Text-to-speech — saves audio to a file |
| provider_imagine | Image generation — saves image to a file |
| provider_route | Dry-run: show routing order for a task type |
| provider_providers | List configured providers with health and capabilities |
| provider_status | Daemon status (uptime, provider health, cache stats) |
Configuration
Config file path (first found wins):
~/.config/iis/provider/providers.yaml ← provider-specific
~/.config/iis/nexus/providers.yaml ← shared with nexus daemon
Full configuration reference:
# Provider definitions
providers:
  # Local Ollama (auto-discovered)
  ollama:
    type: ollama
    endpoint: http://localhost:11434
    capabilities: [completion, embedding]
    priority: 10 # Lower = preferred
    free: true # No credits charged
  # LAN inference server (OpenAI-compatible endpoint)
  lan-gpu:
    type: openai-compatible
    endpoint: http://your-gpu-server:8080
    capabilities: [completion]
    priority: 15
    free: true
  # OpenAI with your own API key
  openai:
    type: openai
    endpoint: https://api.openai.com
    key: secret:openai # References secrets.yaml
    capabilities: [completion, embedding, tts, transcription, image_generation]
    priority: 50
    models:
      completion: [gpt-4o, gpt-4o-mini]
    rate_limit:
      requests_per_minute: 60
  # Anthropic Claude
  anthropic:
    type: anthropic
    endpoint: https://api.anthropic.com
    key: secret:anthropic
    capabilities: [completion]
    priority: 55
  # Google Gemini (via OpenAI-compatible endpoint)
  gemini:
    type: openai-compatible
    endpoint: https://generativelanguage.googleapis.com/v1beta/openai
    key: secret:gemini
    capabilities: [completion, embedding]
    priority: 60
  # Free TTS fallback (requires edge-tts binary on PATH)
  edge-tts:
    type: edge-tts
    capabilities: [tts]
    priority: 90
    free: true
  # Local Whisper transcription
  whisper-local:
    type: whisper-local
    capabilities: [transcription]
    priority: 20
    free: true
  # ElevenLabs TTS
  elevenlabs:
    type: elevenlabs
    key: secret:elevenlabs
    capabilities: [tts]
    priority: 70
  # IIS managed/pooled provider
  iis-pool:
    type: openai
    endpoint: https://api.iis.tools/v1
    key: managed # Uses IIS account credits
    capabilities: [completion, embedding]
    priority: 80
# Routing behavior
routing:
  default_preference: local # local | cloud | cost
  fallback_retries: 2 # Retry on provider failure
  load_balance: round-robin # none | round-robin | weighted
# Cache (TTS and embeddings only)
cache:
  enabled: true
  max_size_mb: 1000
  path: ~/.cache/iis/provider # Optional override
  ttl:
    tts: 30d
    embedding: 7d
# IIS account service (for managed/pooled providers)
account:
  url: https://account.iis.tools # Default
  service_key: iis_... # Or set IIS_SERVICE_KEY env var
# Daemon bind/port (standalone mode only)
bind: 127.0.0.1
port: 9665
log_level: info # error | warn | info | debug
require_auth: false
Secrets file (~/.config/iis/provider/secrets.yaml):
openai: sk-...
anthropic: sk-ant-...
gemini: AI...
elevenlabs: el_...
Secrets can also be referenced via environment variables in providers.yaml:
key: env:OPENAI_API_KEY # reads process.env.OPENAI_API_KEY
key: secret:openai # reads from secrets.yaml
key: managed # uses IIS account credits
Provider Types
| Type | Adapter | Examples | Capabilities |
|------|---------|----------|--------------|
| openai | OpenAI API | OpenAI, Azure OpenAI | completion, embedding, tts, transcription, image_generation |
| anthropic | Anthropic API | Claude 3.x | completion |
| ollama | Ollama REST | Local Ollama | completion, embedding |
| openai-compatible | OpenAI-compat shim | Gemini, vLLM, LM Studio, llama.cpp | completion, embedding |
| vllm | vLLM native | vLLM inference server | completion |
| llama-server | llama.cpp server | llama.cpp HTTP server | completion |
| edge-tts | Edge TTS binary | Microsoft TTS (free, offline) | tts |
| elevenlabs | ElevenLabs API | ElevenLabs | tts |
| whisper-local | Whisper binary | Local Whisper | transcription |
Routing Tiers
Providers are ranked within each request. Tier is inferred from configuration:
| Tier | Priority | How it's detected |
|------|----------|-------------------|
| 1 — Local | Highest | Endpoint is localhost or LAN IP, or free: true |
| 2 — Cloud Own Key | Medium | Cloud endpoint + key: secret:* or key: env:* |
| 3 — Pooled | Lower | key: managed |
| 4 — Free Fallback | Lowest | free: true + cloud-independent (edge-tts, whisper-local) |
Within a tier, providers are ordered by priority value (lower = preferred), then by health state, then latency.
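A rough sketch of how the tier table and the ordering rule above might combine into a single ranking. Everything here is illustrative: `ProviderInfo`, `inferTier`, and `rankProviders` are hypothetical names, not exports of @iistools/provider. The table lists `free: true` under both Tier 1 and Tier 4, so this sketch assumes the `edge-tts` and `whisper-local` types are checked first, and it only recognizes `localhost` and private IPv4 ranges as local endpoints.

```typescript
type Health = 'healthy' | 'suspect' | 'unhealthy' | 'recovering';

// Hypothetical provider record; field names mirror providers.yaml where possible.
interface ProviderInfo {
  name: string;
  type: string;
  endpoint?: string;
  key?: string;
  free?: boolean;
  priority: number;
  health: Health;
  latencyMs: number;
}

// True for localhost and private IPv4 ranges (an assumption about "LAN IP").
function isLocalEndpoint(endpoint: string | undefined): boolean {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/:]+)/i.exec(endpoint ?? '');
  if (!match) return false;
  const host = match[1];
  return host === 'localhost' || host === '127.0.0.1' ||
    /^10\./.test(host) || /^192\.168\./.test(host) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(host);
}

// Tier per the table above; the check order is an assumption.
function inferTier(p: ProviderInfo): 1 | 2 | 3 | 4 {
  if (p.type === 'edge-tts' || p.type === 'whisper-local') return 4;
  if (p.key === 'managed') return 3;
  if (p.free === true || isLocalEndpoint(p.endpoint)) return 1;
  return 2;
}

const HEALTH_RANK: Record<Health, number> = {
  healthy: 0, suspect: 1, recovering: 2, unhealthy: 3,
};

// Drop providers excluded from routing, then order by tier, priority
// (lower first), health state, and latency. The input array is not mutated.
function rankProviders(providers: ProviderInfo[]): ProviderInfo[] {
  return providers
    .filter((p) => p.health === 'healthy' || p.health === 'suspect')
    .sort((a, b) =>
      inferTier(a) - inferTier(b) ||
      a.priority - b.priority ||
      HEALTH_RANK[a.health] - HEALTH_RANK[b.health] ||
      a.latencyMs - b.latencyMs);
}
```

Applied to the example configuration, a healthy local Ollama would sort ahead of an own-key OpenAI entry, followed by the managed pool and finally Edge TTS.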
Health State Machine
Each provider runs through a state machine based on consecutive successes and failures:
healthy ──1 fail──> suspect ──2 more fails──> unhealthy
   ^                   |
   |                   | 3 consecutive passes
   +───────────────────+
unhealthy ──3 passes──> recovering ──3 more passes──> healthy
- healthy: Normal routing target
- suspect: Still used, but deprioritized; being watched
- unhealthy: Excluded from routing; health-checked on a longer interval
- recovering: Excluded from routing; re-entering after passing health checks
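The transitions in the diagram can be expressed as a small pure function. This is a sketch under stated assumptions rather than the package's HealthChecker: `HealthRecord` and `step` are hypothetical names, counters reset on every state change, "2 more fails" is read as two consecutive failures, and a failure while recovering is assumed to send the provider back to unhealthy (the diagram does not say).

```typescript
type Health = 'healthy' | 'suspect' | 'unhealthy' | 'recovering';

interface HealthRecord {
  state: Health;
  passes: number; // consecutive passes in the current state
  fails: number;  // consecutive failures in the current state
}

const enter = (state: Health): HealthRecord => ({ state, passes: 0, fails: 0 });

// Apply one health-check result and return the next record.
function step(rec: HealthRecord, ok: boolean): HealthRecord {
  const passes = ok ? rec.passes + 1 : 0;
  const fails = ok ? 0 : rec.fails + 1;
  switch (rec.state) {
    case 'healthy':
      // A single failure is enough to become suspect.
      return ok ? { state: 'healthy', passes, fails } : enter('suspect');
    case 'suspect':
      if (fails >= 2) return enter('unhealthy');
      if (passes >= 3) return enter('healthy');
      return { state: 'suspect', passes, fails };
    case 'unhealthy':
      if (passes >= 3) return enter('recovering');
      return { state: 'unhealthy', passes, fails };
    case 'recovering':
      if (!ok) return enter('unhealthy'); // assumption, not in the diagram
      if (passes >= 3) return enter('healthy');
      return { state: 'recovering', passes, fails };
  }
}

// Convenience: fold a sequence of check results starting from healthy.
function replay(results: boolean[]): Health {
  return results.reduce(step, enter('healthy')).state;
}
```

Under these assumptions, `replay([false, false, false])` ends in unhealthy, and six further passes are needed to return to healthy by way of recovering.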
HTTP API
The daemon exposes these endpoints on 127.0.0.1:9665:
Routing
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/route | Route request to best provider |
| POST | /v1/route?stream=true | Stream response via SSE (Server-Sent Events) |
| GET | /v1/route/:task | Dry-run: show routing order for task type |
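For callers that prefer plain HTTP over the library, a thin typed wrapper around POST /v1/route might look like the sketch below. `postRoute` and `FetchLike` are hypothetical helpers, not part of @iistools/provider. The field names follow the "Route request format" and "Route response format" sections of this README, with unused fields omitted. The error body format is not documented here, so the sketch reports only the HTTP status.

```typescript
type Task = 'completion' | 'embedding' | 'tts' | 'transcription' | 'image_generation';

interface RouteRequest {
  task: Task;
  input: Record<string, unknown>;
  preferences?: {
    provider?: string;
    prefer_local?: boolean;
    max_cost_credits?: number;
    no_cache?: boolean;
    quality?: string;
  };
}

interface RouteResponse {
  provider: string;
  model: string;
  result: { content?: string };
  routing: { latency_ms: number; credits_charged: number };
  cached: boolean;
}

// Minimal structural type so a stub can stand in for the global fetch.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: send one request to the daemon's /v1/route endpoint.
async function postRoute(
  fetchFn: FetchLike,
  baseUrl: string,
  request: RouteRequest,
): Promise<RouteResponse> {
  const res = await fetchFn(`${baseUrl}/v1/route`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request),
  });
  if (!res.ok) {
    throw new Error(`route request failed with HTTP ${res.status}`);
  }
  // The response is trusted to match the documented shape; no validation here.
  return (await res.json()) as RouteResponse;
}
```

On Node 18 or later the global `fetch` can be passed as `fetchFn`, for example `postRoute(fetch, 'http://localhost:9665', { task: 'completion', input: { messages } })`.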
OpenAI-compatible (drop-in replacement for OpenAI SDK)
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/chat/completions | Chat completions (streaming supported) |
| POST | /v1/embeddings | Text embeddings |
| POST | /v1/audio/speech | Text-to-speech (returns audio/mpeg) |
| POST | /v1/audio/transcriptions | Speech-to-text (multipart/form-data or JSON) |
| POST | /v1/images/generations | Image generation |
| GET | /v1/models | List available models across all providers |
Provider management
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/providers | List all providers with health, latency, and capabilities |
| GET | /v1/providers/:name/health | Run health check on specific provider |
| POST | /v1/providers/discover | Scan subnet for AI providers and add to config |
Cache
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/cache/stats | Cache statistics (entries, size, hit rate) |
| DELETE | /v1/cache | Clear all cached entries |
Health
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Lightweight health check (always public, no auth) |
| GET | /v1/status | Detailed status (providers, cache, uptime) |
Route request format
{
  "task": "completion",
  "input": {
    "messages": [
      { "role": "system", "content": "You are helpful." },
      { "role": "user", "content": "Hello" }
    ],
    "model": "gpt-4o-mini"
  },
  "preferences": {
    "provider": "openai",
    "prefer_local": false,
    "max_cost_credits": 10,
    "no_cache": false,
    "quality": "code"
  }
}
Route response format
{
  "provider": "ollama",
  "provider_type": "ollama",
  "model": "qwen3.5:4b",
  "endpoint": "http://localhost:11434",
  "method": "chat",
  "result": {
    "content": "Hello! How can I help?",
    "usage": {
      "prompt_tokens": 12,
      "completion_tokens": 8
    }
  },
  "routing": {
    "candidates_evaluated": 3,
    "selected_reason": "healthy_local_preferred",
    "latency_ms": 342,
    "credits_charged": 0
  },
  "cached": false
}
Using the OpenAI SDK against the daemon
The daemon's /v1/chat/completions is fully OpenAI-compatible. Point any OpenAI SDK client at it:
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'http://localhost:9665/v1',
  apiKey: 'not-needed', // daemon handles auth internally
});
const completion = await client.chat.completions.create({
  model: 'auto', // "auto" = let the router decide
  messages: [{ role: 'user', content: 'Hello' }],
});
Provider pinning via request headers:
const completion = await client.chat.completions.create(
  { model: 'auto', messages: [...] },
  { headers: { 'X-IIS-Provider': 'openai', 'X-IIS-No-Cache': 'true' } },
);
Provider Discovery
The daemon can scan your local subnet for running AI inference servers:
# CLI scan
iis-provider discover
# API scan (auto-adds discovered providers to config)
curl -X POST http://localhost:9665/v1/providers/discover
# Dry-run (preview only, don't modify config)
curl -X POST http://localhost:9665/v1/providers/discover \
  -H 'Content-Type: application/json' \
  -d '{"dryRun": true}'
Scans for: Ollama (11434), vLLM/llama.cpp/LM Studio (8000, 8080, 1234, and others).
Relationship to Nexus
@iistools/provider is the AI routing core. @iistools/nexus imports it and adds knowledge store, dispatch, dashboard, mail, and webhook capabilities.
┌─────────────────────────────────────────────────────┐
│ @iistools/nexus (iis-nexus, port 9666) │
│ Knowledge store, dispatch, dashboard, webhooks │
│ ┌───────────────────────────────────────────────┐ │
│ │ @iistools/provider │ │
│ │ Routing, health, adapters, cache, metering │ │
│ └───────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
@iistools/provider (iis-provider, port 9665) ← standalone mode
Consumer projects can use either:
- Provider daemon (iis-provider daemon, port 9665): routing + metering, no extra dependencies
- Nexus daemon (iis-nexus, port 9666): routing + knowledge + dispatch + dashboard
The @iistools/nexus-client SDK auto-discovers whichever is running.
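How @iistools/nexus-client performs this discovery is not described here. One plausible approach, sketched below, is to probe each daemon's default port and take the first that answers. The ports come from this README; the probe order, the use of GET /health on the Nexus daemon (documented above only for the provider daemon), and the names `DaemonTarget`, `Probe`, and `findDaemon` are all assumptions.

```typescript
interface DaemonTarget {
  name: 'provider' | 'nexus';
  baseUrl: string;
}

// Default ports from this README; trying Nexus first is an assumption.
const DEFAULT_TARGETS: DaemonTarget[] = [
  { name: 'nexus', baseUrl: 'http://127.0.0.1:9666' },
  { name: 'provider', baseUrl: 'http://127.0.0.1:9665' },
];

// Resolves to true when the URL answers successfully; may reject on
// connection errors.
type Probe = (url: string) => Promise<boolean>;

// Return the first target whose /health endpoint responds, or null if none do.
async function findDaemon(
  probe: Probe,
  targets: DaemonTarget[] = DEFAULT_TARGETS,
): Promise<DaemonTarget | null> {
  for (const target of targets) {
    try {
      if (await probe(`${target.baseUrl}/health`)) return target;
    } catch {
      // Connection refused or timed out: treat as "not running" and continue.
    }
  }
  return null;
}
```

On Node 18 or later a probe could be as simple as `(url) => fetch(url).then((r) => r.ok)`; a production client would likely also honor IIS_PROVIDER_URL and apply a short timeout.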
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| IIS_PROVIDER_URL | — | Override daemon URL (e.g. http://127.0.0.1:9665) |
| IIS_PROVIDER_PORT | 9665 | Override daemon port |
| IIS_PROVIDER_LOG_LEVEL | info | Log level: error, warn, info, debug |
| IIS_PROVIDER_TOKEN | — | Auth token (when require_auth: true in config) |
| IIS_SERVICE_KEY | — | IIS account service key (for managed/pooled providers) |
| IIS_NEXUS_URL | — | Fallback URL (compatibility with nexus env vars) |
| IIS_NEXUS_PORT | — | Fallback port |
| IIS_NEXUS_LOG_LEVEL | — | Fallback log level |
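The fallback relationship in the table can be illustrated with a small resolver. This is a sketch, not the package's `resolveProviderConfig`: `resolveDaemonSettings` and `firstSet` are hypothetical helpers, IIS_PROVIDER_* is assumed to win over IIS_NEXUS_*, empty values are assumed to count as unset, and the default URL is assumed to be built from the resolved port.

```typescript
type Env = Record<string, string | undefined>;

interface DaemonSettings {
  url: string;
  port: number;
  logLevel: string;
}

// Return the first variable in `names` that is set to a non-empty value.
function firstSet(env: Env, ...names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name];
    if (value !== undefined && value !== '') return value;
  }
  return undefined;
}

// Resolve daemon settings, preferring IIS_PROVIDER_* and falling back to
// the IIS_NEXUS_* compatibility variables, then to the documented defaults.
function resolveDaemonSettings(env: Env): DaemonSettings {
  const rawPort = firstSet(env, 'IIS_PROVIDER_PORT', 'IIS_NEXUS_PORT') ?? '9665';
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${rawPort}`);
  }
  const url =
    firstSet(env, 'IIS_PROVIDER_URL', 'IIS_NEXUS_URL') ?? `http://127.0.0.1:${port}`;
  const logLevel =
    firstSet(env, 'IIS_PROVIDER_LOG_LEVEL', 'IIS_NEXUS_LOG_LEVEL') ?? 'info';
  return { url, port, logLevel };
}
```

In a Node.js process this would typically be called as `resolveDaemonSettings(process.env)`.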
Init Onboarding Paths
iis-provider init adapts to what's already available:
| Situation | What happens |
|-----------|-------------|
| Ollama or LM Studio running locally | Auto-discovered and configured — no prompts needed |
| No local inference, no keys provided | Offers Ollama bootstrap (--setup-ollama): installs Ollama, pulls qwen3.5:4b + qwen3-embedding:0.6b |
| Cloud keys provided via flags | Writes providers.yaml + secrets.yaml for each key |
| --no-ai flag | Writes empty config and exits cleanly (for non-AI iistools features) |
In non-interactive environments (no TTY), all setup is flag-driven. Interactive mode offers prompts for each option.
CLI Reference
iis-provider <command> [options]
COMMANDS
  daemon, start          Start the provider daemon (default port 9665)
  mcp                    Start MCP server (stdio transport, for Claude Code)
  status                 Show daemon health
  providers              List configured providers with health and capabilities
  route <task>           Show routing priority order for a task type
  usage [--since=DATE]   Usage summary (requests, tokens, credits)
  discover               Scan local subnet for AI providers
  init                   First-run setup
GLOBAL OPTIONS
  --help                 Show help
  --version              Show version
  --describe             One-line description (for iis dispatcher)
  --output json          Machine-readable JSON output on all commands
  --config=PATH          Override config file path
TASK TYPES (for route command)
  completion, embedding, tts, transcription, image_generation
INIT FLAGS
  --openai-key=KEY       OpenAI API key
  --anthropic-key=KEY    Anthropic API key
  --gemini-key=KEY       Google Gemini API key
  --elevenlabs-key=KEY   ElevenLabs API key
  --ollama-url=URL       Ollama server URL
  --lmstudio-url=URL     LM Studio server URL
  --account-key=KEY      IIS account API key (for managed pools)
  --setup-ollama         Install Ollama and pull starter models
  --no-setup-ollama      Skip Ollama bootstrap offer
  --no-ai                Skip all AI provider setup
  --no-discover          Skip network scan
  --config-dir=DIR       Custom config directory
  --yes, -y              Auto-confirm overwrites
EXIT CODES
  0  success
  1  general error
  2  usage error (bad args, missing required flag)
  3  auth error
  4  rate limited
  5  network error (daemon not running)
Testing
pnpm --filter @iistools/provider run test:run
Tests live in __tests__/. Always use test:run; plain vitest starts in watch mode and does not exit on its own.
