inference-relay

v1.6.4

Published

2 months ago

Route AI inference to the user's existing subscription

0High
0Medium
0Low

jdblecher

ai inference claude openai anthropic relay routing streaming fallback cost-optimization

inference-relay

Route AI inference through your users' existing subscriptions. 98.9% average cost reduction.

Patent Pending. (c) 2026 L2B II LLC. Proprietary — see LICENSE.

Quickstart

npm install inference-relay
export IR_LICENSE_KEY=ir_live_xxxx

import 'inference-relay/auto';
import Anthropic from '@anthropic-ai/sdk';

// Your existing code works unchanged. Inference routes through user's subscription.
const msg = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Analyze this contract...' }],
});

The library automatically bridges existing SDK instances to the native gateway.

How It Works

The library performs an asymmetric handshake with the Protocol Authority to synchronize execution parameters. It then bridges your SDK instances to the Native Subscription Gateway, routing heavy inference through the user's own authorized execution environment. Your application pays only for lightweight orchestration calls.

Cost Savings

| Workflow | Traditional | With Relay | Savings | |----------|------------|------------|---------| | Document Analysis (Sonnet) | $0.12/doc | $0.0003/doc | 99.7% | | Code Review (Sonnet) | $0.08/review | $0.0002/review | 99.7% | | Chat Agent (Haiku orchestration + Sonnet execution) | $0.15/session | $0.001/session | 99.3% |

Resource Utilization Efficiency averages 98.9% across production workloads.

Provider Support

| Provider | Type | Cost to Developer | Platform | |----------|------|-------------------|----------| | Native Subscription Gateway (Claude CLI) | Subscription relay | $0.00 | Desktop | | Anthropic API | Direct API | Standard pricing | Any | | OpenAI API | Direct API | Standard pricing | Any | | Ollama | Local inference | $0.00 (local) | Any |

Privacy Guarantee

inference-relay is a dumb pipe. The promptContent field is typed as false -- a compile-time literal, not a runtime flag. Prompt and response content never touches relay infrastructure. Only metadata (model, token count, routing decision) transits the coordination layer.

See Security Documentation for the full audit trail architecture.

Integration Levels

Auto-patch -- import 'inference-relay/auto' -- Zero code changes. Patches SDK prototypes at import.
Explicit -- import { relay } from 'inference-relay' -- Wrap specific clients for granular control.
Environment Variable -- Set IR_PROVIDER=claude-cli -- Route without any code changes.

Architecture (v1.3+)

v1.4.0 adds a built-in MODEL_REGISTRY (17 models across Anthropic, OpenAI, and Ollama), smart provider pre-skip, and cross-family Atomic Session Continuity. See Model Registry for details.

Starting in v1.3, the Native Subscription Gateway implementation is distributed separately from the npm package. On first use of a subscription-routed call, the library downloads a signed implementation bundle from the Protocol Authority under your license key, verifies its RS256 signature against an embedded public key, and caches it locally at ~/.inference-relay/gateway/<version>/bundle.js. Subsequent calls in the same process use the cached, in-memory module — zero extra latency.

The npm package contains the public SDK surface, the routing engine, the fallback cascade, the API providers (Anthropic, OpenAI, Ollama), and the gateway loader. It does not contain the gateway implementation itself.

Cold start: ~1-3s for the first call (download + verify + load). Warm start: ~50-200ms (cache hit + verify + load). Subsequent calls: zero overhead.

Airgapped environments: run inference-relay gateway pre-fetch from a connected machine to populate ~/.inference-relay/gateway/, then transport the directory.

License termination: When a license is revoked, the next gateway probe fails closed. The relay returns a structured LICENSE_REJECTED error and falls back to API providers if configured.

Pricing

| Plan | Price | Includes | |------|-------|----------| | Solo | $50/mo | 3,000 calls/mo, all providers, email support | | Pro | $100/mo | 15,000 calls/mo, warm process pool, advanced routing DSL, audit trail, priority support | | Enterprise | Custom | Multi-developer provisioning, fleet policy, dedicated support, on-prem relay option |

A valid license key is required to use the relay. View pricing.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

inference-relay

Quickstart

How It Works

Cost Savings

Provider Support

Privacy Guarantee

Integration Levels

Architecture (v1.3+)

Pricing

Links