@halo-sdk/gateway
v2.0.0
Cache-aware AI gateway for Halo SDK — sticky routing + fallback chains that don't throw away a warm prefix cache
Cache-aware AI gateway for Halo SDK. A ModelAdapter that fronts a pool of providers with:
- Sticky routing — requests stay on the last provider that succeeded, so that provider's prefix cache stays warm. The gateway only moves off it on failure (not round-robin, which would cold-start a different cache every call).
- Fallback chains — on error it tries the remaining providers in order.
Every routing decision emits a route event onto the S5 observability spine.
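The routing behaviour described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: Route, RouteEvent, StickyRouter, attemptOrder and send are hypothetical names, the event shape is assumed, and the call is synchronous only to keep the sketch short (a real adapter is presumably asynchronous). It also assumes that "the remaining providers in order" means the configured order with the sticky provider moved to the front.

```typescript
// Hypothetical types: none of these names come from @halo-sdk/gateway.
type Route<Req, Res> = { name: string; call: (req: Req) => Res };
type RouteEvent = { type: "route"; provider: string; outcome: "success" | "failure" };

class StickyRouter<Req, Res> {
  // The last provider that succeeded; starts as the first configured route.
  private sticky: Route<Req, Res>;

  constructor(
    private readonly routes: Route<Req, Res>[],
    private readonly onEvent: (e: RouteEvent) => void = () => {},
  ) {
    if (routes.length === 0) throw new Error("at least one route is required");
    this.sticky = routes[0];
  }

  // Sticky route first, then the remaining routes in configured order.
  attemptOrder(): Route<Req, Res>[] {
    return [this.sticky, ...this.routes.filter((r) => r !== this.sticky)];
  }

  send(req: Req): Res {
    let lastError: unknown;
    for (const route of this.attemptOrder()) {
      try {
        const res = route.call(req);
        this.sticky = route; // stay on this provider until it fails
        this.onEvent({ type: "route", provider: route.name, outcome: "success" });
        return res;
      } catch (error) {
        lastError = error;
        this.onEvent({ type: "route", provider: route.name, outcome: "failure" });
      }
    }
    throw lastError; // every provider failed
  }
}
```

Note that in this sketch the router does not move back to the first route once it recovers. It stays on whichever provider last succeeded, which is what keeps that provider's prefix cache warm.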
Usage
import { GatewayAdapter } from "@halo-sdk/gateway";
import { AnthropicAdapter, OpenAIAdapter } from "@halo-sdk/adapters";
const gateway = new GatewayAdapter({
  routes: [
    { name: "anthropic", adapter: new AnthropicAdapter({ apiKey: A }) },
    { name: "openai", adapter: new OpenAIAdapter({ apiKey: O }) }, // fallback
  ],
  onEvent: (e) => console.log(e), // wire to an agent's event bus
});
const agent = halo.agent({ adapter: gateway /* ... */ });

It composes with the @halo-sdk/otel decorators: wrap each route's adapter with withRetry / withTelemetry as needed.
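To illustrate what wrapping a route's adapter means, here is a sketch of the decorator pattern using local stand-ins. The Adapter interface, retrying and counting are hypothetical and are not the @halo-sdk/otel API; the real withRetry / withTelemetry signatures may differ. Each wrapper takes an adapter and returns an adapter, so a wrapped adapter can be used anywhere a plain one is expected.

```typescript
// Hypothetical stand-ins: the real decorator signatures in @halo-sdk/otel may differ.
interface Adapter {
  complete(prompt: string): string;
}

// Retries the inner adapter up to `attempts` times before giving up.
function retrying(inner: Adapter, attempts: number): Adapter {
  if (attempts < 1) throw new RangeError("attempts must be at least 1");
  return {
    complete(prompt) {
      let lastError: unknown;
      for (let n = 0; n < attempts; n++) {
        try {
          return inner.complete(prompt);
        } catch (error) {
          lastError = error;
        }
      }
      throw lastError;
    },
  };
}

// Records call and failure counts, standing in for a telemetry wrapper.
function counting(inner: Adapter, stats: { calls: number; failures: number }): Adapter {
  return {
    complete(prompt) {
      stats.calls++;
      try {
        return inner.complete(prompt);
      } catch (error) {
        stats.failures++;
        throw error;
      }
    },
  };
}
```

With this layering, a retry wrapper around a route's adapter exhausts its attempts before the error reaches the gateway, so fallback to the next provider only happens once retries on the current one have failed.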
Reserved extension point
The middleware option (GatewayMiddleware) is the designated integration seam for a forthcoming maintainer-defined capability. Implementations must emit their activity onto the S5 event stream. This interface will change when that capability lands.
