@iistools/provider

v0.2.0

Published

3 months ago

AI provider router — smart routing across local and cloud LLM/TTS/embedding providers with account-aware metering

iistools-admin

hschmied

What it does

Routes AI requests to the best available provider based on health, latency, cost, and capability. Supports:

Local inference: Ollama, vLLM, llama.cpp server, LM Studio
Cloud APIs: OpenAI, Anthropic, Google Gemini, ElevenLabs
Free fallbacks: Edge TTS (Microsoft, local binary), Whisper local transcription
IIS managed pools: Shared API keys via IIS account service

Requests fall through the tier hierarchy automatically. If Ollama is running locally, it gets picked first. If it goes down mid-session, the next healthy provider takes over without any intervention.
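
The fall-through behaviour can be pictured as a loop over candidates in preference order. The following is a minimal sketch of that idea, not the package's actual router; `Candidate`, `Outcome`, and `firstSuccessful` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape of a routing candidate; not the package's real type.
interface Candidate<T> {
  name: string;
  healthy: boolean;
  call: () => T; // throws when the provider fails
}

interface Outcome<T> {
  provider: string;
  value: T;
  skipped: string[]; // providers that were unhealthy or failed before the success
}

// Try candidates in order and return the first successful result.
function firstSuccessful<T>(candidates: Candidate<T>[]): Outcome<T> {
  const skipped: string[] = [];
  for (const candidate of candidates) {
    if (!candidate.healthy) {
      skipped.push(candidate.name);
      continue;
    }
    try {
      return { provider: candidate.name, value: candidate.call(), skipped };
    } catch {
      // The provider failed mid-request: record it and fall through to the next one.
      skipped.push(candidate.name);
    }
  }
  throw new Error(`no provider available (tried: ${skipped.join(", ")})`);
}
```

Called with a failing local provider followed by a working cloud provider, the sketch returns the cloud result and lists the local provider under `skipped`.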

Quick Start

First-time setup

# Install globally
npm install -g @iistools/provider

# Interactive setup — discovers local providers, offers Ollama bootstrap if none found
iis-provider init

# Bootstrap Ollama (free, local, pulls qwen3.5:4b + qwen3-embedding:0.6b, ~4GB)
iis-provider init --setup-ollama

# Configure with a cloud key (no local inference required)
iis-provider init --openai-key=sk-...

# Multiple keys at once
iis-provider init --openai-key=sk-... --anthropic-key=sk-ant-... --gemini-key=AI...

# Point to an existing Ollama server on the LAN
iis-provider init --ollama-url=http://your-server:11434

# No AI needed (for non-AI iistools features)
iis-provider init --no-ai

Start the daemon

iis-provider daemon
# iis-provider daemon listening on 127.0.0.1:9665

Route a request

# Completion
curl -X POST http://localhost:9665/v1/route \
  -H 'Content-Type: application/json' \
  -d '{
    "task": "completion",
    "input": {
      "messages": [{"role": "user", "content": "What is 2+2?"}]
    }
  }'

# Via OpenAI-compatible endpoint (works with any OpenAI SDK)
curl -X POST http://localhost:9665/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Usage

As a library (Node.js)

import {
  Router,
  ProviderRegistry,
  HealthChecker,
  CacheManager,
  UsageMeter,
  AccountClient,
  createLogger,
  registerBuiltinAdapters,
  getAdapter,
  loadProviderConfig,
  resolveProviderConfig,
} from '@iistools/provider';

// Load and resolve config (env var overrides applied)
const config = resolveProviderConfig(loadProviderConfig());

// Register built-in adapters (openai, anthropic, ollama, etc.)
registerBuiltinAdapters();

// Build the registry and router
const registry = new ProviderRegistry(config);
const log = createLogger(config.log_level ?? 'info', 'my-service');
const router = new Router(registry, getAdapter, log, config.routing ?? {});

// Route a request
const result = await router.route({
  task: 'completion',
  input: {
    messages: [{ role: 'user', content: 'Summarize this document...' }],
  },
});

console.log(result.result.content);      // response text
console.log(result.provider);            // which provider handled it
console.log(result.routing.latency_ms);  // round-trip latency

Streaming completions

for await (const chunk of router.routeStream({
  task: 'completion',
  input: { messages: [{ role: 'user', content: 'Write a poem about...' }] },
})) {
  process.stdout.write(chunk.delta ?? '');
  if (chunk.done) break;
}

Embeddings and TTS

// Embeddings (cached automatically)
const embed = await router.route({
  task: 'embedding',
  input: { input: 'Text to embed' },
});
const vector = embed.result.embedding; // number[]

// Text-to-speech (cached automatically — identical text+voice hits cache)
const tts = await router.route({
  task: 'tts',
  input: { text: 'Hello world', voice: 'alloy' },
});
const audioBase64 = tts.result.audio; // string (base64 MP3)

Provider preferences

// Pin to a specific provider
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { provider: 'openai' },
});

// Force local-only (fail if no local provider available)
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { prefer_local: true },
});

// Cost ceiling (skip pooled providers if credits exceed limit)
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { max_cost_credits: 5 },
});

// Quality hint (prefer providers strong in code tasks)
await router.route({
  task: 'completion',
  input: { messages: [...] },
  preferences: { quality: 'code' },
});

As a daemon (HTTP API)

iis-provider daemon                    # Start on 127.0.0.1:9665
iis-provider daemon --config=~/.config/iis/provider/providers.yaml

iis-provider status                    # Daemon health check
iis-provider providers                 # List providers with health
iis-provider providers --output json   # Machine-readable

iis-provider route completion          # Show routing priority order
iis-provider route embedding           # Which providers handle embeddings?

iis-provider usage                     # All-time usage summary
iis-provider usage --since=2026-01-01  # Filter by date

iis-provider discover                  # Scan LAN for AI providers

As an MCP server (Claude Code)

Add to your Claude Code MCP config (~/.claude/mcp.json or project .claude/settings.json):

{
  "mcpServers": {
    "provider": {
      "command": "iis-provider",
      "args": ["mcp"]
    }
  }
}

This exposes 8 tools directly to Claude:

| Tool | Description |
|------|-------------|
| provider_complete | Send a completion request through the router |
| provider_embed | Generate text embeddings |
| provider_transcribe | Transcribe an audio file (wav, mp3, m4a, flac, ogg, webm) |
| provider_speak | Text-to-speech; saves audio to a file |
| provider_imagine | Image generation; saves image to a file |
| provider_route | Dry-run: show routing order for a task type |
| provider_providers | List configured providers with health and capabilities |
| provider_status | Daemon status (uptime, provider health, cache stats) |

Configuration

Config file path (first found wins):

~/.config/iis/provider/providers.yaml   ← provider-specific
~/.config/iis/nexus/providers.yaml      ← shared with nexus daemon
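
The "first found wins" lookup can be sketched as follows. This is an illustration rather than the package's loader; `findConfigPath` and the injected `exists` predicate are hypothetical names, and `~` is left unexpanded.

```typescript
// Candidate paths in lookup order, as listed above.
const CONFIG_CANDIDATES: readonly string[] = [
  "~/.config/iis/provider/providers.yaml",
  "~/.config/iis/nexus/providers.yaml",
];

// Return the first candidate for which `exists` is true, or undefined if none is.
// `exists` is injected so the sketch stays free of filesystem calls; in Node it
// could be backed by fs.existsSync after expanding "~".
function findConfigPath(
  exists: (path: string) => boolean,
  candidates: readonly string[] = CONFIG_CANDIDATES,
): string | undefined {
  for (const path of candidates) {
    if (exists(path)) return path;
  }
  return undefined;
}
```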

Full configuration reference:

# Provider definitions
providers:
  # Local Ollama (auto-discovered)
  ollama:
    type: ollama
    endpoint: http://localhost:11434
    capabilities: [completion, embedding]
    priority: 10            # Lower = preferred
    free: true              # No credits charged

  # LAN inference server (OpenAI-compatible endpoint)
  lan-gpu:
    type: openai-compatible
    endpoint: http://your-gpu-server:8080
    capabilities: [completion]
    priority: 15
    free: true

  # OpenAI with your own API key
  openai:
    type: openai
    endpoint: https://api.openai.com
    key: secret:openai      # References secrets.yaml
    capabilities: [completion, embedding, tts, transcription, image_generation]
    priority: 50
    models:
      completion: [gpt-4o, gpt-4o-mini]
    rate_limit:
      requests_per_minute: 60

  # Anthropic Claude
  anthropic:
    type: anthropic
    endpoint: https://api.anthropic.com
    key: secret:anthropic
    capabilities: [completion]
    priority: 55

  # Google Gemini (via OpenAI-compatible endpoint)
  gemini:
    type: openai-compatible
    endpoint: https://generativelanguage.googleapis.com/v1beta/openai
    key: secret:gemini
    capabilities: [completion, embedding]
    priority: 60

  # Free TTS fallback (requires edge-tts binary on PATH)
  edge-tts:
    type: edge-tts
    capabilities: [tts]
    priority: 90
    free: true

  # Local Whisper transcription
  whisper-local:
    type: whisper-local
    capabilities: [transcription]
    priority: 20
    free: true

  # ElevenLabs TTS
  elevenlabs:
    type: elevenlabs
    key: secret:elevenlabs
    capabilities: [tts]
    priority: 70

  # IIS managed/pooled provider
  iis-pool:
    type: openai
    endpoint: https://api.iis.tools/v1
    key: managed            # Uses IIS account credits
    capabilities: [completion, embedding]
    priority: 80

# Routing behavior
routing:
  default_preference: local    # local | cloud | cost
  fallback_retries: 2          # Retry on provider failure
  load_balance: round-robin    # none | round-robin | weighted

# Cache (TTS and embeddings only)
cache:
  enabled: true
  max_size_mb: 1000
  path: ~/.cache/iis/provider   # Optional override
  ttl:
    tts: 30d
    embedding: 7d

# IIS account service (for managed/pooled providers)
account:
  url: https://account.iis.tools   # Default
  service_key: iis_...             # Or set IIS_SERVICE_KEY env var

# Daemon bind/port (standalone mode only)
bind: 127.0.0.1
port: 9665
log_level: info    # error | warn | info | debug
require_auth: false
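
The `cache.ttl` values above (`30d`, `7d`) are duration strings. A minimal sketch of how such a value could be turned into an expiry check is shown here. Only the `d` suffix appears in this README; the `s`, `m`, and `h` suffixes, and the function names `parseTtl` and `isExpired`, are assumptions made for illustration.

```typescript
// Milliseconds per unit. Only "d" is confirmed by the config reference; the rest are assumed.
const UNIT_MS: Record<string, number> = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Convert a string such as "30d" into milliseconds.
function parseTtl(ttl: string): number {
  const match = /^(\d+)([smhd])$/.exec(ttl.trim());
  if (match === null) throw new Error(`invalid TTL "${ttl}"`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// An entry is expired once its age reaches the TTL.
function isExpired(storedAtMs: number, ttl: string, nowMs: number): boolean {
  return nowMs - storedAtMs >= parseTtl(ttl);
}
```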

Secrets file (~/.config/iis/provider/secrets.yaml):

openai: sk-...
anthropic: sk-ant-...
gemini: AI...
elevenlabs: el_...

Secrets can also be referenced via environment variables in providers.yaml:

key: env:OPENAI_API_KEY    # reads process.env.OPENAI_API_KEY
key: secret:openai         # reads from secrets.yaml
key: managed               # uses IIS account credits
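
The three `key:` forms can be read as a small reference scheme. The sketch below shows one way to resolve them, assuming the environment and the parsed secrets.yaml are passed in as plain objects; `resolveKey`, `KeySource`, and `requireValue` are hypothetical names, not part of the package's API.

```typescript
type KeySource =
  | { kind: "env"; value: string }
  | { kind: "secret"; value: string }
  | { kind: "managed" };

// Fail loudly when a reference points at nothing.
function requireValue(ref: string, value: string | undefined): string {
  if (value === undefined || value === "") {
    throw new Error(`key reference "${ref}" did not resolve to a value`);
  }
  return value;
}

// Resolve a `key:` value from providers.yaml.
// `env` would typically be process.env; `secrets` the parsed secrets.yaml.
function resolveKey(
  ref: string,
  env: Record<string, string | undefined>,
  secrets: Record<string, string | undefined>,
): KeySource {
  if (ref === "managed") return { kind: "managed" };
  const sep = ref.indexOf(":");
  const scheme = sep === -1 ? "" : ref.slice(0, sep);
  const name = ref.slice(sep + 1);
  if (scheme === "env") return { kind: "env", value: requireValue(ref, env[name]) };
  if (scheme === "secret") return { kind: "secret", value: requireValue(ref, secrets[name]) };
  throw new Error(`unsupported key reference "${ref}"`);
}
```

Rejecting unresolved references up front is a design choice of this sketch: a missing key surfaces at config load rather than on the first request.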

Provider Types

| Type | Adapter | Examples | Capabilities |
|------|---------|----------|--------------|
| openai | OpenAI API | OpenAI, Azure OpenAI | completion, embedding, tts, transcription, image_generation |
| anthropic | Anthropic API | Claude 3.x | completion |
| ollama | Ollama REST | Local Ollama | completion, embedding |
| openai-compatible | OpenAI-compat shim | Gemini, vLLM, LM Studio, llama.cpp | completion, embedding |
| vllm | vLLM native | vLLM inference server | completion |
| llama-server | llama.cpp server | llama.cpp HTTP server | completion |
| edge-tts | Edge TTS binary | Microsoft TTS (free, offline) | tts |
| elevenlabs | ElevenLabs API | ElevenLabs | tts |
| whisper-local | Whisper binary | Local Whisper | transcription |

Routing Tiers

For each request, the candidate providers are ranked by tier. The tier is inferred from configuration:

| Tier | Priority | How it's detected |
|------|----------|-------------------|
| 1: Local | Highest | Endpoint is localhost or LAN IP, or free: true |
| 2: Cloud Own Key | Medium | Cloud endpoint + key: secret:* or key: env:* |
| 3: Pooled | Lower | key: managed |
| 4: Free Fallback | Lowest | free: true + cloud-independent (edge-tts, whisper-local) |

Within a tier, providers are ordered by priority value (lower = preferred), then by health state, then latency.
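
The ordering rule can be written as a single comparator. This is a sketch under stated assumptions, not the router's code: the tier is taken as already assigned, unhealthy and recovering providers are dropped before sorting (they are excluded from routing), and `RankedProvider` and `rankProviders` are hypothetical names.

```typescript
type Health = "healthy" | "suspect" | "unhealthy" | "recovering";

interface RankedProvider {
  name: string;
  tier: 1 | 2 | 3 | 4;
  priority: number; // lower = preferred
  health: Health;
  latencyMs: number;
}

// Order by tier, then priority, then health state, then latency.
function rankProviders(providers: RankedProvider[]): RankedProvider[] {
  const healthRank = (h: Health): number => (h === "healthy" ? 0 : 1);
  return providers
    .filter((p) => p.health === "healthy" || p.health === "suspect")
    .sort(
      (a, b) =>
        a.tier - b.tier ||
        a.priority - b.priority ||
        healthRank(a.health) - healthRank(b.health) ||
        a.latencyMs - b.latencyMs,
    );
}
```

Because priority is compared before health, a suspect provider with priority 10 still ranks ahead of a healthy one with priority 15 in the same tier.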

Health State Machine

Each provider runs through a state machine based on consecutive successes and failures:

healthy ──1 fail──> suspect ──2 more fails──> unhealthy
   ^                    |
   |               3 consecutive passes
   +────────────────────+

unhealthy ──3 passes──> recovering ──3 more passes──> healthy

healthy: Normal routing target
suspect: Still used, but deprioritized; being watched
unhealthy: Excluded from routing; health-checked on a longer interval
recovering: Excluded from routing; re-entering after passing health checks
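
The transitions above can be expressed as a pure function over a state and a check result. This is a minimal sketch, not the package's `HealthChecker`. Two points are assumptions because the diagram does not specify them: a pass resets the failure count (and vice versa), and a failure while recovering sends the provider back to unhealthy. `HealthStatus`, `enter`, and `nextHealth` are hypothetical names.

```typescript
type HealthState = "healthy" | "suspect" | "unhealthy" | "recovering";

interface HealthStatus {
  state: HealthState;
  passes: number; // consecutive passes since entering the state or the last failure
  fails: number; // consecutive failures since entering the state or the last pass
}

// Enter a state with fresh counters.
const enter = (state: HealthState): HealthStatus => ({ state, passes: 0, fails: 0 });

function nextHealth(current: HealthStatus, passed: boolean): HealthStatus {
  const passes = passed ? current.passes + 1 : 0;
  const fails = passed ? 0 : current.fails + 1;

  if (current.state === "healthy") {
    return passed ? enter("healthy") : enter("suspect"); // 1 fail -> suspect
  }
  if (current.state === "suspect") {
    if (fails >= 2) return enter("unhealthy"); // 2 more fails -> unhealthy
    if (passes >= 3) return enter("healthy"); // 3 consecutive passes -> healthy
    return { state: "suspect", passes, fails };
  }
  if (current.state === "unhealthy") {
    if (passes >= 3) return enter("recovering"); // 3 passes -> recovering
    return { state: "unhealthy", passes, fails };
  }
  // recovering
  if (!passed) return enter("unhealthy"); // assumption: not stated in the diagram
  if (passes >= 3) return enter("healthy"); // 3 more passes -> healthy
  return { state: "recovering", passes, fails };
}
```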

HTTP API

The daemon exposes these endpoints on 127.0.0.1:9665:

Routing

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/route | Route request to best provider |
| POST | /v1/route?stream=true | Stream response via SSE (Server-Sent Events) |
| GET | /v1/route/:task | Dry-run: show routing order for task type |
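
For clients that consume the SSE stream without an SDK, the event framing can be parsed by hand. The sketch below follows the standard SSE rules (events separated by blank lines, `data:` fields, comment lines starting with `:`). The JSON payload shape, with `delta` and `done` fields, is an assumption borrowed from the library's streaming example and is not confirmed for the HTTP endpoint; `parseSseData` and `collectText` are hypothetical names.

```typescript
// Chunk shape assumed from the routeStream example (delta, done).
interface StreamChunk {
  delta?: string;
  done?: boolean;
}

// Split a buffered SSE body into the `data` payload of each event.
function parseSseData(body: string): string[] {
  const events: string[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.indexOf("data:") === 0) {
        // SSE strips one leading space after the colon, if present.
        const value = line.slice(5);
        dataLines.push(value.charAt(0) === " " ? value.slice(1) : value);
      }
    }
    // Blocks without data lines (comments, trailing blank) produce no event.
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return events;
}

// Concatenate deltas until a chunk reports done.
function collectText(body: string): string {
  let text = "";
  for (const data of parseSseData(body)) {
    const chunk = JSON.parse(data) as StreamChunk;
    text += chunk.delta ?? "";
    if (chunk.done) break;
  }
  return text;
}
```

A real client would feed network chunks into a buffer and parse incrementally; the buffered form here keeps the framing logic easy to follow.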

OpenAI-compatible (drop-in replacement for OpenAI SDK)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/chat/completions | Chat completions (streaming supported) |
| POST | /v1/embeddings | Text embeddings |
| POST | /v1/audio/speech | Text-to-speech (returns audio/mpeg) |
| POST | /v1/audio/transcriptions | Speech-to-text (multipart/form-data or JSON) |
| POST | /v1/images/generations | Image generation |
| GET | /v1/models | List available models across all providers |

Provider management

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/providers | List all providers with health, latency, and capabilities |
| GET | /v1/providers/:name/health | Run health check on specific provider |
| POST | /v1/providers/discover | Scan subnet for AI providers and add to config |

Cache

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/cache/stats | Cache statistics (entries, size, hit rate) |
| DELETE | /v1/cache | Clear all cached entries |

Health

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Lightweight health check (always public, no auth) |
| GET | /v1/status | Detailed status (providers, cache, uptime) |

Route request format

{
  "task": "completion",
  "input": {
    "messages": [
      { "role": "system", "content": "You are helpful." },
      { "role": "user", "content": "Hello" }
    ],
    "model": "gpt-4o-mini"
  },
  "preferences": {
    "provider": "openai",
    "prefer_local": false,
    "max_cost_credits": 10,
    "no_cache": false,
    "quality": "code"
  }
}

Route response format

{
  "provider": "ollama",
  "provider_type": "ollama",
  "model": "qwen3.5:4b",
  "endpoint": "http://localhost:11434",
  "method": "chat",
  "result": {
    "content": "Hello! How can I help?",
    "usage": {
      "prompt_tokens": 12,
      "completion_tokens": 8
    }
  },
  "routing": {
    "candidates_evaluated": 3,
    "selected_reason": "healthy_local_preferred",
    "latency_ms": 342,
    "credits_charged": 0
  },
  "cached": false
}
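
For TypeScript consumers of the HTTP API, the response above can be given a type. The interface below is inferred from this single sample, so which fields are optional is an assumption, and `RouteResponse` and `summarize` are hypothetical names rather than exports of the package.

```typescript
// Inferred from the sample response; field optionality is an assumption.
interface RouteResponse {
  provider: string;
  provider_type: string;
  model: string;
  endpoint: string;
  method: string;
  result: {
    content?: string;
    usage?: { prompt_tokens: number; completion_tokens: number };
  };
  routing: {
    candidates_evaluated: number;
    selected_reason: string;
    latency_ms: number;
    credits_charged: number;
  };
  cached: boolean;
}

// One-line summary suitable for logs.
function summarize(res: RouteResponse): string {
  const usage = res.result.usage;
  const tokens = usage ? usage.prompt_tokens + usage.completion_tokens : 0;
  const source = res.cached ? "cache" : res.provider;
  return `${source} (${res.model}) ${res.routing.latency_ms}ms, ${tokens} tokens, ${res.routing.credits_charged} credits`;
}
```

Applied to the sample response, `summarize` yields `ollama (qwen3.5:4b) 342ms, 20 tokens, 0 credits`.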

Using the OpenAI SDK against the daemon

The daemon's /v1/chat/completions endpoint accepts and returns the OpenAI chat completions format, so any OpenAI SDK client can be pointed at it:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:9665/v1',
  apiKey: 'not-needed',   // daemon handles auth internally
});

const completion = await client.chat.completions.create({
  model: 'auto',          // "auto" = let the router decide
  messages: [{ role: 'user', content: 'Hello' }],
});

Provider pinning via request headers:

const completion = await client.chat.completions.create(
  { model: 'auto', messages: [...] },
  { headers: { 'X-IIS-Provider': 'openai', 'X-IIS-No-Cache': 'true' } },
);

Provider Discovery

The daemon can scan your local subnet for running AI inference servers:

# CLI scan
iis-provider discover

# API scan (auto-adds discovered providers to config)
curl -X POST http://localhost:9665/v1/providers/discover

# Dry-run (preview only, don't modify config)
curl -X POST http://localhost:9665/v1/providers/discover \
  -H 'Content-Type: application/json' \
  -d '{"dryRun": true}'

Scans for: Ollama (11434), vLLM/llama.cpp/LM Studio (8000, 8080, 1234, and others).

Relationship to Nexus

@iistools/provider is the AI routing core. @iistools/nexus imports it and adds knowledge store, dispatch, dashboard, mail, and webhook capabilities.

┌─────────────────────────────────────────────────────┐
│  @iistools/nexus  (iis-nexus, port 9666)            │
│  Knowledge store, dispatch, dashboard, webhooks      │
│  ┌───────────────────────────────────────────────┐  │
│  │  @iistools/provider                           │  │
│  │  Routing, health, adapters, cache, metering   │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

@iistools/provider  (iis-provider, port 9665)   ← standalone mode

Consumer projects can use either:

Provider daemon (iis-provider daemon, port 9665) — routing + metering, no extra dependencies
Nexus daemon (iis-nexus, port 9666) — routing + knowledge + dispatch + dashboard

The @iistools/nexus-client SDK auto-discovers whichever is running.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| IIS_PROVIDER_URL | (none) | Override daemon URL (e.g. http://127.0.0.1:9665) |
| IIS_PROVIDER_PORT | 9665 | Override daemon port |
| IIS_PROVIDER_LOG_LEVEL | info | Log level: error, warn, info, debug |
| IIS_PROVIDER_TOKEN | (none) | Auth token (when require_auth: true in config) |
| IIS_SERVICE_KEY | (none) | IIS account service key (for managed/pooled providers) |
| IIS_NEXUS_URL | (none) | Fallback URL (compatibility with nexus env vars) |
| IIS_NEXUS_PORT | (none) | Fallback port |
| IIS_NEXUS_LOG_LEVEL | (none) | Fallback log level |
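
The fallback relationship between the IIS_PROVIDER_* and IIS_NEXUS_* variables could be resolved along these lines. This is a sketch only: it assumes that provider variables win over nexus ones and that an explicit URL wins over a port, neither of which is spelled out above. `resolveDaemonSettings` and `firstSet` are hypothetical names, and the environment is passed in rather than read from `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// First non-empty value among the given variable names, in order.
function firstSet(env: Env, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

interface DaemonSettings {
  url: string;
  logLevel: string;
}

function resolveDaemonSettings(env: Env): DaemonSettings {
  // Assumed precedence: provider variable, then nexus fallback, then documented default.
  const port = firstSet(env, ["IIS_PROVIDER_PORT", "IIS_NEXUS_PORT"]) ?? "9665";
  const url =
    firstSet(env, ["IIS_PROVIDER_URL", "IIS_NEXUS_URL"]) ?? `http://127.0.0.1:${port}`;
  const logLevel =
    firstSet(env, ["IIS_PROVIDER_LOG_LEVEL", "IIS_NEXUS_LOG_LEVEL"]) ?? "info";
  return { url, logLevel };
}
```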

Init Onboarding Paths

iis-provider init adapts to what's already available:

| Situation | What happens |
|-----------|-------------|
| Ollama or LM Studio running locally | Auto-discovered and configured; no prompts needed |
| No local inference, no keys provided | Offers Ollama bootstrap (--setup-ollama): installs Ollama, pulls qwen3.5:4b + qwen3-embedding:0.6b |
| Cloud keys provided via flags | Writes providers.yaml + secrets.yaml for each key |
| --no-ai flag | Writes empty config and exits cleanly (for non-AI iistools features) |

In non-interactive environments (no TTY), all setup is flag-driven. Interactive mode offers prompts for each option.

CLI Reference

iis-provider <command> [options]

COMMANDS
  daemon, start         Start the provider daemon (default port 9665)
  mcp                   Start MCP server (stdio transport — for Claude Code)
  status                Show daemon health
  providers             List configured providers with health and capabilities
  route <task>          Show routing priority order for a task type
  usage [--since=DATE]  Usage summary (requests, tokens, credits)
  discover              Scan local subnet for AI providers
  init                  First-run setup

GLOBAL OPTIONS
  --help           Show help
  --version        Show version
  --describe       One-line description (for iis dispatcher)
  --output json    Machine-readable JSON output on all commands
  --config=PATH    Override config file path

TASK TYPES (for route command)
  completion, embedding, tts, transcription, image_generation

INIT FLAGS
  --openai-key=KEY       OpenAI API key
  --anthropic-key=KEY    Anthropic API key
  --gemini-key=KEY       Google Gemini API key
  --elevenlabs-key=KEY   ElevenLabs API key
  --ollama-url=URL       Ollama server URL
  --lmstudio-url=URL     LM Studio server URL
  --account-key=KEY      IIS account API key (for managed pools)
  --setup-ollama         Install Ollama and pull starter models
  --no-setup-ollama      Skip Ollama bootstrap offer
  --no-ai                Skip all AI provider setup
  --no-discover          Skip network scan
  --config-dir=DIR       Custom config directory
  --yes, -y              Auto-confirm overwrites

EXIT CODES
  0  success
  1  general error
  2  usage error (bad args, missing required flag)
  3  auth error
  4  rate limited
  5  network error (daemon not running)

Testing

pnpm --filter @iistools/provider run test:run

Tests live in __tests__/. Always use test:run; plain vitest starts in watch mode and does not exit on its own.
