@halo-sdk/adapters
v3.0.0
Model adapters for Halo AI SDK — OpenAI, Anthropic, Gemini, DeepSeek & OpenAI-compatible, with cache-aware prefix/breakpoint support
Model adapters for Halo AI SDK — provider-specific implementations of the ModelAdapter interface.
Ships adapters for OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, and any OpenAI-compatible endpoint. Every adapter is cache-aware: it reports prefix-cache usage in a unified hit/miss model, and the Anthropic adapter places explicit cache_control breakpoints automatically.
Installation
npm install @halo-sdk/adapters
Requires @halo-sdk/core as a peer dependency.
Provider matrix
| Adapter | Caching | Vision | Structured output | Default model |
| ------------------------- | ---------------------------------------------- | ------------ | ----------------- | ------------------- |
| OpenAIAdapter | implicit | ✅ | ✅ | gpt-4o-mini |
| AnthropicAdapter | explicit (cache_control, ≤4 breakpoints) | ✅ | — | claude-opus-4-8 |
| GeminiAdapter | implicit | ✅ | ✅ | gemini-2.0-flash |
| DeepSeekAdapter | implicit | — | — | deepseek-v4-flash |
| OpenAICompatibleAdapter | implicit (configurable) | configurable | configurable | — |
All adapters take { apiKey, model?, baseUrl?, pricing? } and are interchangeable:
import { OpenAIAdapter, AnthropicAdapter, GeminiAdapter } from "@halo-sdk/adapters";
import { Halo } from "@halo-sdk/core";
const halo = new Halo({ adapter: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }) });
// swap freely — the prefix/history cache contract is identical across providers:
// new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY! })
// new GeminiAdapter({ apiKey: process.env.GEMINI_API_KEY! })
Anthropic: explicit cache breakpoints
Anthropic requires callers to declare up to 4 cache_control breakpoints; the provider then caches cumulatively to each one. The adapter places them automatically, most-stable-first: tools → system → in-history anchors (RAG / memory / milestone segments above the min-token threshold), driven by the shared allocateBreakpoints allocator. Anthropic's split counters (cache_read_input_tokens / cache_creation_input_tokens) are mapped onto Halo's unified hit/miss usage model — a cache write counts as a miss that also pays the write premium.
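The placement order and the usage mapping described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `Segment` shape, `allocateBreakpointsSketch`, `toUnifiedUsage`, and the unified field names are all hypothetical, and the real `allocateBreakpoints` signature may differ. Only the three Anthropic usage counters are taken from the provider's response format.

```typescript
// Hypothetical segment model: each cacheable chunk of the request has a kind and a size.
type SegmentKind = "tools" | "system" | "history";

interface Segment {
  id: string;
  kind: SegmentKind;
  tokens: number;
}

const MAX_BREAKPOINTS = 4; // Anthropic accepts at most 4 cache_control breakpoints
const STABILITY_ORDER: SegmentKind[] = ["tools", "system", "history"];

// Pick up to 4 segments to mark, most stable kind first.
// In this sketch only in-history anchors are filtered by the min-token threshold.
function allocateBreakpointsSketch(segments: Segment[], minTokens: number): string[] {
  const chosen: string[] = [];
  for (const kind of STABILITY_ORDER) {
    for (const seg of segments) {
      if (chosen.length === MAX_BREAKPOINTS) return chosen;
      if (seg.kind !== kind) continue;
      if (kind === "history" && seg.tokens < minTokens) continue;
      chosen.push(seg.id);
    }
  }
  return chosen;
}

// Usage counters as returned by Anthropic's Messages API.
interface AnthropicUsage {
  input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Fold the split counters into a hit/miss view: reads are hits,
// while uncached input and cache writes are both misses (writes are also tracked
// separately so the write premium can be priced).
function toUnifiedUsage(u: AnthropicUsage) {
  return {
    cacheHitTokens: u.cache_read_input_tokens,
    cacheMissTokens: u.input_tokens + u.cache_creation_input_tokens,
    cacheWriteTokens: u.cache_creation_input_tokens,
  };
}
```

With one tools segment, one system segment, and several history anchors, the sketch marks tools and system first and spends the remaining two breakpoints on the earliest history anchors that clear the threshold.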
import { AnthropicAdapter } from "@halo-sdk/adapters";
const adapter = new AnthropicAdapter({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: "claude-opus-4-8", // optional
  defaultMaxTokens: 4096, // Anthropic requires max_tokens; overridable per call
});
OpenAI-compatible (gateways, self-hosted, vLLM, Ollama, …)
OpenAICompatibleAdapter targets any OpenAI Chat Completions endpoint. Configure the base URL, model, capabilities, pricing, and extra headers:
import { OpenAICompatibleAdapter } from "@halo-sdk/adapters";
const adapter = new OpenAICompatibleAdapter({
  apiKey: process.env.GATEWAY_KEY!,
  model: "llama-3.3-70b",
  baseUrl: "https://my-gateway.local/v1",
  capabilities: { vision: true, structuredOutput: true },
  headers: { "X-Tenant": "acme" },
});
DeepSeek Adapter
import { DeepSeekAdapter } from "@halo-sdk/adapters";
import { Halo } from "@halo-sdk/core";
const halo = new Halo({
  adapter: new DeepSeekAdapter({
    apiKey: process.env.DEEPSEEK_API_KEY!,
    model: "deepseek-v4-pro", // optional, defaults to deepseek-v4-flash
    baseUrl: "https://api.deepseek.com", // optional, for proxies or self-hosted
  }),
});
Prefix Caching
DeepSeek automatically caches the stable prefix (system prompt + tools + few-shots). The adapter tracks cache hits and reports them via agent.stats.caching:
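As a rough illustration of how per-turn cache counters could roll up into the stats object, here is a minimal sketch. The input fields `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` are the counters DeepSeek returns in its usage payload; `summarizeCaching`, the `InputPricing` shape, and the savings formula are assumptions made for this example and may not match how the SDK computes `estimatedSavingsUsd`.

```typescript
// Per-turn usage counters as reported by DeepSeek.
interface TurnUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

// Hypothetical pricing shape: USD per million input tokens.
interface InputPricing {
  cacheHitPerMTok: number;
  cacheMissPerMTok: number;
}

interface CachingStats {
  totalCacheHitTokens: number;
  totalCacheMissTokens: number;
  cacheHitRate: number;
  estimatedSavingsUsd: number;
}

function summarizeCaching(turns: TurnUsage[], pricing: InputPricing): CachingStats {
  let hit = 0;
  let miss = 0;
  for (const t of turns) {
    hit += t.prompt_cache_hit_tokens;
    miss += t.prompt_cache_miss_tokens;
  }
  const total = hit + miss;
  return {
    totalCacheHitTokens: hit,
    totalCacheMissTokens: miss,
    // Share of input tokens served from cache; 0 when nothing has been sent yet.
    cacheHitRate: total === 0 ? 0 : hit / total,
    // Assumed formula: each hit token saves the gap between the miss and hit price.
    estimatedSavingsUsd: (hit * (pricing.cacheMissPerMTok - pricing.cacheHitPerMTok)) / 1_000_000,
  };
}
```

The first turn of a session is typically all misses, so the hit rate only becomes meaningful once the same prefix has been sent at least twice.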
console.log(agent.stats.caching);
// → { totalCacheHitTokens, totalCacheMissTokens, cacheHitRate, estimatedSavingsUsd }
Keep-Alive
Server-side KV cache expires after ~5 minutes. Use agent.keepAlive() to maintain it:
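For intuition, a keep-alive of this kind can be built from a repeating timer that sends a cheap request before the cache expires. The sketch below is an assumption about the general pattern, not the SDK's implementation: `startKeepAlive`, the `ping` callback, and the injectable `TimerApi` are hypothetical names introduced here.

```typescript
// Minimal timer interface so the sketch can be driven by fake timers in tests.
interface TimerApi {
  setInterval(fn: () => void, ms: number): unknown;
  clearInterval(handle: unknown): void;
}

const realTimers: TimerApi = {
  setInterval: (fn, ms) => setInterval(fn, ms),
  clearInterval: (handle) => clearInterval(handle as ReturnType<typeof setInterval>),
};

// Calls `ping` every `intervalMs` until stop() is invoked.
function startKeepAlive(
  ping: () => Promise<void>,
  intervalMs: number,
  timers: TimerApi = realTimers,
): { stop: () => void } {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  const handle = timers.setInterval(() => {
    // A failed ping only risks a cache miss on the next request, so ignore it.
    ping().catch(() => {});
  }, intervalMs);
  let stopped = false;
  return {
    stop() {
      if (stopped) return; // stop() is safe to call more than once
      stopped = true;
      timers.clearInterval(handle);
    },
  };
}
```

Choosing an interval well under the expiry window (for example 2 minutes against a ~5 minute expiry) leaves room for a delayed or failed ping without losing the cached prefix.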
const { stop } = agent.keepAlive(120_000); // ping every 2 min
// ... long task ...
stop();
Custom Adapter
Implement ModelAdapter to support any LLM provider:
// TurnChunk is assumed to be exported from @halo-sdk/core alongside the other types.
import type { ModelAdapter, ModelCapabilities, ChatParams, TurnChunk } from "@halo-sdk/core";
class MyAdapter implements ModelAdapter {
  readonly modelId = "my-model";
  readonly contextWindow = 128_000;
  readonly capabilities: ModelCapabilities = {
    toolUse: true,
    streaming: true,
    caching: { mode: "implicit" }, // required: "none" | "implicit" | "explicit"
  };
  async chat(params: ChatParams) {
    // Stable prefix first, then the conversation history.
    const messages = [...params.prefix, ...params.history];
    // Call your provider's API with `messages`. callProvider is a placeholder
    // for your own client code; it is not part of the SDK.
    const { content, toolCalls, usage } = await callProvider(messages);
    return { content, toolCalls, usage };
  }
  async *stream(params: ChatParams): AsyncGenerator<TurnChunk> {
    // Yield { type: "text-delta", delta } | { type: "tool-call-ready", ... } | { type: "done", usage }
  }
}
Documentation
See the Halo SDK docs for full API reference.
