@halo-sdk/adapters

v3.0.0

Published

3 days ago

Model adapters for Halo AI SDK — OpenAI, Anthropic, Gemini, DeepSeek & OpenAI-compatible, with cache-aware prefix/breakpoint support

Downloads

512

Vulnerabilities: 0 high, 0 medium, 0 low

maplecity0512

adapter ai anthropic claude deepseek gemini llm openai prefix-cache prompt-caching

@halo-sdk/adapters

Model adapters for Halo AI SDK — provider-specific implementations of the ModelAdapter interface.

Ships adapters for OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, and any OpenAI-compatible endpoint. Every adapter is cache-aware: it reports prefix-cache usage in a unified hit/miss model, and the Anthropic adapter places explicit cache_control breakpoints automatically.

Installation

npm install @halo-sdk/adapters

Requires @halo-sdk/core as a peer dependency.

Provider matrix

| Adapter | Caching | Vision | Structured output | Default model |
| ------------------------- | ---------------------------------------------- | ------------ | ----------------- | ------------------- |
| OpenAIAdapter | implicit | ✅ | ✅ | gpt-4o-mini |
| AnthropicAdapter | explicit (cache_control, ≤4 breakpoints) | ✅ | — | claude-opus-4-8 |
| GeminiAdapter | implicit | ✅ | ✅ | gemini-2.0-flash |
| DeepSeekAdapter | implicit | — | — | deepseek-v4-flash |
| OpenAICompatibleAdapter | implicit (configurable) | configurable | configurable | — |

All adapters take { apiKey, model?, baseUrl?, pricing? } and are interchangeable:

import { OpenAIAdapter, AnthropicAdapter, GeminiAdapter } from "@halo-sdk/adapters";
import { Halo } from "@halo-sdk/core";

const halo = new Halo({ adapter: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }) });
// swap freely — the prefix/history cache contract is identical across providers:
// new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY! })
// new GeminiAdapter({ apiKey: process.env.GEMINI_API_KEY! })

Anthropic — explicit cache breakpoints

Anthropic's prompt caching is explicit: callers mark up to 4 cache_control breakpoints, and the provider caches the cumulative prefix up to each one. The adapter places them automatically, most-stable-first: tools → system → in-history anchors (RAG / memory / milestone segments above the min-token threshold), driven by the shared allocateBreakpoints allocator. Anthropic's split counters (cache_read_input_tokens / cache_creation_input_tokens) are mapped onto Halo's unified hit/miss usage model: a cache read counts as a hit, and a cache write counts as a miss that also pays the write premium.
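The counter mapping described above can be illustrated with a small sketch. The `AnthropicUsage` fields are the ones Anthropic's Messages API returns; the `UnifiedUsage` shape and the `toUnifiedUsage` function are hypothetical names chosen for illustration, and Halo's actual field names may differ.

```typescript
// Usage fields as returned by Anthropic's Messages API.
// The cache counters can be absent or null when caching is not in play.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Hypothetical unified shape (assumption: not the real Halo type).
interface UnifiedUsage {
  cacheHitTokens: number;
  cacheMissTokens: number;
  cacheWriteTokens: number;
  outputTokens: number;
}

function toUnifiedUsage(u: AnthropicUsage): UnifiedUsage {
  const read = u.cache_read_input_tokens ?? 0;
  const written = u.cache_creation_input_tokens ?? 0;
  return {
    // Tokens served from the cache are hits.
    cacheHitTokens: read,
    // Uncached input and freshly written tokens are both misses.
    cacheMissTokens: u.input_tokens + written,
    // Written tokens are tracked separately because they pay the write premium.
    cacheWriteTokens: written,
    outputTokens: u.output_tokens,
  };
}
```

Tracking written tokens separately lets cost estimates charge the write premium only on the part of the miss that was actually cached.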

import { AnthropicAdapter } from "@halo-sdk/adapters";

const adapter = new AnthropicAdapter({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: "claude-opus-4-8", // optional
  defaultMaxTokens: 4096, // Anthropic requires max_tokens; overridable per call
});
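As a rough illustration of most-stable-first placement, the sketch below picks up to 4 breakpoints in the order tools, system, then in-history anchors that meet a minimum token count. It is not the package's allocateBreakpoints implementation: the `Segment` type, the `allocateBreakpointsSketch` name, and the threshold parameter are assumptions made for this example.

```typescript
type SegmentKind = "tools" | "system" | "anchor";

// Hypothetical description of a cacheable segment.
interface Segment {
  id: string;
  kind: SegmentKind;
  tokens: number;
}

const MAX_BREAKPOINTS = 4; // Anthropic's limit on cache_control breakpoints
const STABILITY_ORDER: SegmentKind[] = ["tools", "system", "anchor"];

function allocateBreakpointsSketch(segments: Segment[], minAnchorTokens: number): string[] {
  // Anchors below the threshold are skipped; tools and system always qualify.
  const eligible = segments.filter(
    (s) => s.kind !== "anchor" || s.tokens >= minAnchorTokens,
  );
  // Array.prototype.sort is stable, so anchors keep their history order.
  const ordered = [...eligible].sort(
    (a, b) => STABILITY_ORDER.indexOf(a.kind) - STABILITY_ORDER.indexOf(b.kind),
  );
  return ordered.slice(0, MAX_BREAKPOINTS).map((s) => s.id);
}
```

With the tools and system segments taking the first two slots, at most two in-history anchors receive a breakpoint in this sketch.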

OpenAI-compatible (gateways, self-hosted, vLLM, Ollama, …)

OpenAICompatibleAdapter targets any OpenAI Chat Completions endpoint. Configure the base URL, model, capabilities, pricing, and extra headers:

import { OpenAICompatibleAdapter } from "@halo-sdk/adapters";

const adapter = new OpenAICompatibleAdapter({
  apiKey: process.env.GATEWAY_KEY!,
  model: "llama-3.3-70b",
  baseUrl: "https://my-gateway.local/v1",
  capabilities: { vision: true, structuredOutput: true },
  headers: { "X-Tenant": "acme" },
});
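For orientation, the sketch below shows the kind of HTTP request an OpenAI Chat Completions endpoint expects: a POST to `/chat/completions` under the base URL, with a bearer token and a JSON body. It is not the adapter's implementation; `CompatConfig`, `ChatMessage`, and `buildChatRequest` are hypothetical names, and how the adapter merges extra headers is an assumption.

```typescript
// Hypothetical config mirroring the options shown above.
interface CompatConfig {
  apiKey: string;
  model: string;
  baseUrl: string;
  headers?: Record<string, string>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(cfg: CompatConfig, messages: ChatMessage[]) {
  // Strip trailing slashes so the path is joined exactly once.
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${cfg.apiKey}`,
    // Assumption: extra headers are merged last, so they can override defaults.
    ...cfg.headers,
  };
  const body = JSON.stringify({ model: cfg.model, messages });
  return { url, init: { method: "POST", headers, body } };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)` to send the request.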

DeepSeek Adapter

import { DeepSeekAdapter } from "@halo-sdk/adapters";
import { Halo } from "@halo-sdk/core";

const halo = new Halo({
  adapter: new DeepSeekAdapter({
    apiKey: process.env.DEEPSEEK_API_KEY!,
    model: "deepseek-v4-pro", // optional, defaults to deepseek-v4-flash
    baseUrl: "https://api.deepseek.com", // optional, for proxies or self-hosted deployments
  }),
});

Prefix Caching

DeepSeek automatically caches the stable prefix (system prompt + tools + few-shot examples). The adapter tracks cache hits and misses and reports them via agent.stats.caching:

console.log(agent.stats.caching);
// → { totalCacheHitTokens, totalCacheMissTokens, cacheHitRate, estimatedSavingsUsd }
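One plausible way to derive these four fields from per-turn usage is sketched below. The result field names come from the output above; the per-turn `TurnUsage` shape, the `Pricing` shape, and the formulas are assumptions, so the SDK's actual definitions may differ.

```typescript
// Hypothetical per-turn usage in the unified hit/miss model.
interface TurnUsage {
  cacheHitTokens: number;
  cacheMissTokens: number;
}

// Hypothetical pricing, in USD per million input tokens.
interface Pricing {
  inputPerMTok: number;
  cachedInputPerMTok: number;
}

function summarizeCaching(turns: TurnUsage[], pricing: Pricing) {
  let hit = 0;
  let miss = 0;
  for (const t of turns) {
    hit += t.cacheHitTokens;
    miss += t.cacheMissTokens;
  }
  const total = hit + miss;
  return {
    totalCacheHitTokens: hit,
    totalCacheMissTokens: miss,
    // Share of input tokens served from the cache; 0 when nothing was sent.
    cacheHitRate: total === 0 ? 0 : hit / total,
    // Each hit token is billed at the cached rate instead of the full rate.
    estimatedSavingsUsd: (hit * (pricing.inputPerMTok - pricing.cachedInputPerMTok)) / 1_000_000,
  };
}
```

The first turn of a session is normally all misses, so the hit rate only becomes meaningful once the same prefix has been sent more than once.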

Keep-Alive

The server-side KV cache expires after ~5 minutes. Call agent.keepAlive() with a ping interval shorter than that to keep it warm:

const { stop } = agent.keepAlive(120_000); // ping every 2 min
// ... long task ...
stop();
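The keep-alive pattern itself is a timer that fires a cheap request more often than the cache expires. The sketch below shows that pattern in isolation; `startKeepAlive` and its `ping` callback are hypothetical, and what agent.keepAlive() actually sends on each ping is not specified here.

```typescript
// Runs `ping` every `intervalMs` milliseconds until stop() is called.
function startKeepAlive(
  ping: () => Promise<void> | void,
  intervalMs: number,
): { stop: () => void } {
  const timer = setInterval(() => {
    // Swallow sync and async failures so one bad ping does not end the loop.
    Promise.resolve()
      .then(ping)
      .catch(() => {});
  }, intervalMs);
  // clearInterval is safe to call more than once, so stop() is idempotent.
  return { stop: () => clearInterval(timer) };
}
```

An interval of 2 minutes against a roughly 5 minute expiry leaves room for one missed ping before the cache goes cold.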

Custom Adapter

Implement ModelAdapter to support any LLM provider:

import type { ModelAdapter, ModelCapabilities, ChatParams, TurnChunk } from "@halo-sdk/core";
// Assumption: TurnChunk is exported from @halo-sdk/core alongside the other types.

class MyAdapter implements ModelAdapter {
  readonly modelId = "my-model";
  readonly contextWindow = 128_000;
  readonly capabilities: ModelCapabilities = {
    toolUse: true,
    streaming: true,
    caching: { mode: "implicit" }, // required: "none" | "implicit" | "explicit"
  };

  async chat(params: ChatParams) {
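    // Send the stable prefix first, then the conversation history.
    const messages = [...params.prefix, ...params.history];
    // callProvider is a hypothetical helper: call your provider's API here
    // and map its response into content, toolCalls, and usage.
    const { content, toolCalls, usage } = await callProvider(messages);
    return { content, toolCalls, usage };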
  }

  async *stream(params: ChatParams): AsyncGenerator<TurnChunk> {
    // Yield { type: "text-delta", delta } | { type: "tool-call-ready", ... } | { type: "done", usage }
  }
}

Documentation

See the Halo SDK docs for full API reference.
