@howells/ai

v0.1.18

Published

3 days ago

Unified AI client for all projects — Gateway, OpenRouter, direct providers, tiered language models, and retrieval models.


danielhowells

@howells/ai

Unified AI client for all projects. One package, Vercel AI Gateway by default, direct provider escape hatches, provider-aware model tiers, and normalized generation settings.

Quick Start

import { createAI, visionPrompt } from "@howells/ai";
import { generateText, Output, streamText, embed } from "ai";

const ai = createAI({
  app: { name: "MyApp", url: "https://myapp.com" },
});

// Pick a model by tier
const { text } = await generateText({
  model: ai.model("fast"),
  prompt: "Classify this ingredient",
});

// Add capabilities per tier
const { text: analysis } = await generateText({
  model: ai.model("powerful", {
    agent: "taste-analysis",
    tools: true,
    vision: true,
  }),
  prompt: visionPrompt("Analyze this design", [
    "https://example.com/design.png",
  ]),
});

// Structured output
const { output } = await generateText({
  model: ai.model("standard", { agent: "search" }),
  output: Output.object({ schema: myZodSchema }),
  prompt: "Extract entities from this text",
});

Vision Input

Use visionPrompt() to build AI SDK-native text + image prompts from URLs, data URLs, or binary image data:

const { text } = await generateText({
  model: ai.model("standard", { vision: true }),
  prompt: visionPrompt("What changed in this screenshot?", [
    "https://example.com/screenshot.png",
    { data: screenshotBytes, mediaType: "image/png" },
  ]),
});

imagePart() is also exported when you need to compose the AI SDK message parts yourself, and visionMessage() wraps the same content as a user message for APIs that expect messages.
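As a rough illustration of the message shape these helpers produce, the sketch below builds a user message from text plus images using local types that mirror the AI SDK's text and image content parts. The names sketchImagePart and sketchVisionMessage are hypothetical stand-ins, not the package's implementation, and the byte values are placeholder data.

```typescript
// Local types mirroring the AI SDK's text and image content parts.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string | Uint8Array; mediaType?: string };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Accepted inputs: a URL or data URL string, or raw bytes with a media type.
type ImageInput = string | { data: Uint8Array; mediaType: string };

// Hypothetical stand-in for imagePart(): normalize one input into an image part.
function sketchImagePart(input: ImageInput): ImagePart {
  return typeof input === "string"
    ? { type: "image", image: input }
    : { type: "image", image: input.data, mediaType: input.mediaType };
}

// Hypothetical stand-in for visionMessage(): text first, then every image.
function sketchVisionMessage(text: string, images: ImageInput[]): UserMessage {
  return {
    role: "user",
    content: [{ type: "text", text }, ...images.map(sketchImagePart)],
  };
}

const sketchMessage = sketchVisionMessage("What changed in this screenshot?", [
  "https://example.com/screenshot.png",
  { data: new Uint8Array([137, 80, 78, 71]), mediaType: "image/png" }, // placeholder bytes
]);
```

Putting the text part first and the images after it keeps the instruction and its attachments in one user turn.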

Generation Options

Use ai.generationOptions(...) for the settings that vary across providers: reasoning budget, verbosity, structured-output provider behavior, tool policy, response length, sampling, prompt cache, user attribution, and service tier.

const provider = "openai";

const { text } = await generateText({
  model: ai.model("powerful", { provider, tools: true }),
  prompt: "Plan the migration",
  tools: migrationTools,
  ...ai.generationOptions({
    provider,
    reasoning: "high",
    verbosity: "medium",
    structured: "strict",
    tools: "auto",
    maxToolSteps: 5,
    outputLength: "long",
    creativity: "focused",
    user: "migration-agent",
  }),
});

For Gateway calls, pass the canonical model ID when you want provider-specific options inferred as well as Gateway attribution:

const modelId = "openai/gpt-5.4";

await streamText({
  model: ai.modelById(modelId),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId,
    reasoning: "medium",
    verbosity: "high",
  }),
});

| Normalized Option | AI SDK / Provider Mapping |
|-------------------|---------------------------|
| reasoning | OpenAI reasoningEffort, Anthropic thinking, Google thinkingConfig, OpenRouter reasoning. Accepts a preset ("high") or { effort, maxTokens }. |
| verbosity | OpenAI textVerbosity |
| structured | OpenAI strict JSON schema, Anthropic structured output mode, Google structured outputs |
| tools | AI SDK toolChoice |
| maxToolSteps | AI SDK stopWhen: stepCountIs(n) |
| parallelTools | OpenAI/OpenRouter parallel tool calls, Anthropic inverse disable flag |
| outputLength | AI SDK maxOutputTokens preset |
| creativity | AI SDK temperature preset |
| cache | Anthropic cacheControl, OpenRouter cache_control. Pass "ephemeral" or { ttl: "5m" \| "1h" }. |
| serviceTier | OpenAI/Google service tier where supported |
| routing | Normalized route intent. Gateway sort/only/order/zeroDataRetention/..., OpenRouter provider.{sort, only, ignore, order, allow_fallbacks, max_price, quantizations, zdr, data_collection} |
| fallbackModels | Gateway models, OpenRouter models (model fallback chain) |
| openRouterVariant | OpenRouter model suffixes :nitro, :exacto, :floor on ai.model() / ai.modelById() |
| tags | Gateway tags (spend reporting). Ignored elsewhere. |
| webSearch | OpenRouter plugins: [{ id: "web", ... }]. For Gateway, wire gateway.tools.parallelSearch() / perplexitySearch() via AI SDK tools. |
| responseHealing | OpenRouter plugins: [{ id: "response-healing" }] (auto-repair JSON for generateObject). |
| includeCost | OpenRouter usage: { include: true }. Gateway returns cost automatically. |
| logprobs / logitBias | OpenRouter only (logprobs + top_logprobs, logit_bias). |
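To make the reasoning row concrete, here is a minimal sketch of how one normalized preset could fan out into per-provider providerOptions. The function name, the "low" preset, and the token budgets are assumptions for illustration; the package's actual presets and budgets are not documented in this README.

```typescript
type ReasoningPreset = "low" | "medium" | "high";
type SketchProvider = "openai" | "anthropic" | "google";

// Assumed token budgets per preset; the real values may differ.
const ASSUMED_BUDGETS: Record<ReasoningPreset, number> = {
  low: 1024,
  medium: 4096,
  high: 16384,
};

// Hypothetical mapper: one normalized preset in, AI SDK providerOptions out.
function sketchReasoningOptions(provider: SketchProvider, preset: ReasoningPreset) {
  const budget = ASSUMED_BUDGETS[preset];
  if (provider === "openai") {
    // OpenAI takes an effort level directly.
    return { openai: { reasoningEffort: preset } };
  }
  if (provider === "anthropic") {
    // Anthropic takes an explicit thinking budget in tokens.
    return { anthropic: { thinking: { type: "enabled" as const, budgetTokens: budget } } };
  }
  // Google takes a thinking budget inside thinkingConfig.
  return { google: { thinkingConfig: { thinkingBudget: budget } } };
}

const sketchOpenAIReasoning = sketchReasoningOptions("openai", "high");
const sketchAnthropicReasoning = sketchReasoningOptions("anthropic", "high");
```

The point of the normalized option is that the call site states intent once ("high") and the provider-specific key is chosen from the provider argument.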

Routing & cost

// Cheapest provider, no-retention and no-training only, limited to Anthropic and Bedrock, with a fallback model
await generateText({
  model: ai.modelById("anthropic/claude-sonnet-4.6", { provider: "gateway" }),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId: "anthropic/claude-sonnet-4.6",
    routing: {
      prefer: "cheapest",
      privacy: ["no-retention", "no-training"],
      allow: ["anthropic", "amazon-bedrock"],
    },
    fallbackModels: ["anthropic/claude-haiku-4.5"],
    tags: ["feature:checkout"],
  }),
});

routing.prefer accepts "auto", "cheapest", "fastest", "highest-throughput", or "highest-quality". For OpenRouter, highest-throughput maps to provider.sort: "throughput", highest-quality maps to OpenRouter's Exacto-style quality/tool-calling routing, and cheapest maps to price-sorted routing. You can also use openRouterVariant: "nitro", "exacto", or "floor" to send the official OpenRouter model suffixes explicitly.

routing.privacy accepts any combination of "no-retention", "no-training", and "hipaa".

routing.maxCost (OpenRouter only) takes price ceilings in USD: { promptPerMillion, completionPerMillion, requestUsd }. The first two are per million tokens; requestUsd is per request.
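The routing.prefer mapping for OpenRouter can be pictured as a lookup table. This is a sketch under assumptions: the "price" and "latency" sort values are OpenRouter's documented sort keys, but mapping "fastest" to "latency" and representing Exacto routing as a variant field are guesses, not confirmed package behavior.

```typescript
type RoutingPrefer =
  | "auto"
  | "cheapest"
  | "fastest"
  | "highest-throughput"
  | "highest-quality";

// Hypothetical output shape: an OpenRouter sort key and/or a model suffix.
type OpenRouterRouting = {
  sort?: "price" | "throughput" | "latency";
  variant?: "exacto";
};

const SKETCH_OPENROUTER_ROUTING: Record<RoutingPrefer, OpenRouterRouting> = {
  auto: {}, // leave routing to OpenRouter's defaults
  cheapest: { sort: "price" }, // price-sorted routing
  fastest: { sort: "latency" }, // assumption: not stated in this README
  "highest-throughput": { sort: "throughput" },
  "highest-quality": { variant: "exacto" }, // assumption: expressed as the :exacto suffix
};

function sketchOpenRouterRouting(prefer: RoutingPrefer): OpenRouterRouting {
  return SKETCH_OPENROUTER_ROUTING[prefer];
}
```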

Gateway introspection

When the Gateway provider is configured, ai.gateway exposes the control-plane APIs:

const ai = createAI();
if (ai.gateway) {
  const { balance } = await ai.gateway.credits();
  const { models } = await ai.gateway.listModels();
  const spend = await ai.gateway.spend({
    startDate: "2026-04-01",
    endDate: "2026-04-30",
    groupBy: "model",
  });
  const info = await ai.gateway.generationInfo("gen_01H...");
}

Testing

Normal tests are deterministic and do not call providers:

pnpm test
pnpm check-types
pnpm build

Live tests are opt-in because they use real API keys and spend provider quota. They load keys from .env, .env.local, or apps/benchmark/.env.local, then verify every configured provider/model route plus the normalized config option matrix:

pnpm test:live

CLI

The package ships a small CLI as both ai and howells-ai:

ai models
ai providers
ai doctor
ai doctor --live
ai test --provider openai
ai models --task coding
ai bench --provider gateway --task coding --tier fast --prompt "Reply in one sentence."

Use --json on models, providers, doctor, test, and bench for scriptable output. The CLI loads local keys from .env, .env.local, and apps/benchmark/.env.local, and never prints secret values.

Model Matrix

Language Models (via Vercel AI Gateway by default)

Language models are selected by tier, then capability flags. Structured input/output is a baseline requirement for every default language model.

| Tier | Text Default | Tools Default | Vision / Vision Tools Default | Use When |
|------|--------------|---------------|-------------------------------|----------|
| nano | openai/gpt-5.4-nano | openai/gpt-5.4-nano | google/gemini-3.1-flash-lite-preview | Premium low-cost text plus lightweight Gemini vision |
| fast | google/gemini-3.1-flash-lite-preview | google/gemini-3.1-flash-lite-preview | google/gemini-3.1-flash-lite-preview | Fast premium Gemini calls across text, tools, and vision |
| standard | google/gemini-3.5-flash | google/gemini-3.5-flash | google/gemini-3.5-flash | Everyday tasks, chat, coding, vision, 1M context |
| powerful | google/gemini-3.1-pro-preview | google/gemini-3.1-pro-preview | google/gemini-3.1-pro-preview | High-quality premium Gemini reasoning and multimodal work |
| reasoning | anthropic/claude-opus-4.7 | anthropic/claude-opus-4.7 | anthropic/claude-opus-4.7 | Frontier quality and deep multi-step reasoning |

ai.model("fast"); // fast text
ai.model("fast", { tools: true }); // fast tool calling
ai.model("fast", { vision: true }); // fast image understanding
ai.model("fast", { tools: true, vision: true }); // fast image + tools
ai.model("standard", { free: true }); // OpenRouter free-model router
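The tier-then-capability lookup can be sketched as a small resolver over the default matrix. The model IDs are copied from the table above; the resolver itself (sketchResolveModel) is a hypothetical illustration and ignores tasks, provider pinning, and the free option.

```typescript
type Tier = "nano" | "fast" | "standard" | "powerful" | "reasoning";
type Variants = { text: string; tools: string; vision: string };

// Defaults copied from the language model table.
const SKETCH_MATRIX: Record<Tier, Variants> = {
  nano: {
    text: "openai/gpt-5.4-nano",
    tools: "openai/gpt-5.4-nano",
    vision: "google/gemini-3.1-flash-lite-preview",
  },
  fast: {
    text: "google/gemini-3.1-flash-lite-preview",
    tools: "google/gemini-3.1-flash-lite-preview",
    vision: "google/gemini-3.1-flash-lite-preview",
  },
  standard: {
    text: "google/gemini-3.5-flash",
    tools: "google/gemini-3.5-flash",
    vision: "google/gemini-3.5-flash",
  },
  powerful: {
    text: "google/gemini-3.1-pro-preview",
    tools: "google/gemini-3.1-pro-preview",
    vision: "google/gemini-3.1-pro-preview",
  },
  reasoning: {
    text: "anthropic/claude-opus-4.7",
    tools: "anthropic/claude-opus-4.7",
    vision: "anthropic/claude-opus-4.7",
  },
};

// Hypothetical resolver: pick the row by tier, then the column by capability.
function sketchResolveModel(
  tier: Tier,
  flags: { tools?: boolean; vision?: boolean } = {},
): string {
  const row = SKETCH_MATRIX[tier];
  if (flags.vision) return row.vision; // vision and vision + tools share one column
  if (flags.tools) return row.tools;
  return row.text;
}
```

With these defaults, only the nano tier changes model when vision is requested; every other tier uses one model across all three columns.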

Workload Tasks

Pass task when the best model depends on the job more than the generic tier. general preserves the base matrix; other tasks layer RouterBase-informed picks over the same tier/capability shape.

ai.model("fast", { task: "coding", tools: true }); // GLM 4.7
ai.model("standard", { task: "coding" }); // GPT-5.3 Codex
ai.model("fast", { task: "agentic", tools: true }); // GLM 4.7
ai.model("standard", { task: "vision", vision: true }); // Gemini 3.5 Flash
ai.model("standard", { task: "longContext" }); // Gemini 3.5 Flash

Available tasks: general, coding, agentic, chat, bulk, vision, reasoning, longContext, and creative.

When you pin a provider, task selection stays inside that provider wherever the provider has coverage. For example, provider: "openai", task: "coding" routes to OpenAI's Codex line, while provider: "zai", task: "vision" routes to GLM's vision model instead of falling back to the global winner from another provider. If a requested capability is incompatible with the resolved model, selection throws before any provider call. For example, provider: "deepseek", vision: true fails locally because DeepSeek's selected models are not vision-capable.
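The fail-before-calling behavior described above amounts to a local capability check. The sketch below assumes a small capability table (the DeepSeek entry matches the ai.modelCapabilities() output shown later in this README; the Gemini entry is inferred from the default matrix) and a hypothetical sketchAssertCapable helper; the package's real error type and messages are not documented here.

```typescript
type Capabilities = { structured: boolean; tools: boolean; vision: boolean };

// Illustrative capability table, not the package's full catalog.
const SKETCH_CAPABILITIES: Record<string, Capabilities> = {
  "deepseek/deepseek-v3.2": { structured: true, tools: true, vision: false },
  "google/gemini-3.5-flash": { structured: true, tools: true, vision: true },
};

// Hypothetical check: throw locally if a requested capability is unsupported.
function sketchAssertCapable(modelId: string, required: Partial<Capabilities>): void {
  const caps = SKETCH_CAPABILITIES[modelId];
  if (!caps) throw new Error(`Unknown model: ${modelId}`);
  for (const key of Object.keys(required) as Array<keyof Capabilities>) {
    if (required[key] && !caps[key]) {
      throw new Error(`${modelId} does not support ${key}`);
    }
  }
}

// Passes: Gemini 3.5 Flash is listed as vision-capable.
sketchAssertCapable("google/gemini-3.5-flash", { vision: true, tools: true });
```

Because the check runs before any network request, an incompatible combination costs nothing and surfaces at the call site.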

Retrieval Models

| Slot | Voyage Default | Gemini Default | Use When |
|------|----------------|----------------|----------|
| embed | voyage-3 | gemini-embedding-2-preview | Text embeddings |
| multimodalEmbed | voyage-multimodal-3.5 | gemini-embedding-2-preview | Text + image embeddings |
| rerank | rerank-2.5 | n/a | Search result reranking |

Overriding Models

Override any tier variant or retrieval model per project:

import {
  ANTHROPIC_MODELS,
  createAI,
  GOOGLE_EMBED_MODELS,
  VOYAGE_MODELS,
} from "@howells/ai";

const ai = createAI({
  app: { name: "Sorrel", url: "https://sorrel.app" },
  models: {
    standard: {
      text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
      tools: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
    },
    tasks: {
      coding: {
        standard: {
          text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
        },
      },
    },
    embed: { voyage: VOYAGE_MODELS.VOYAGE_3_LITE },
    rerank: VOYAGE_MODELS.RERANK_2_5_LITE,
  },
});

Embedding slots are provider-aware. Configure embed and multimodalEmbed once, then select the provider at the call site:

const ai = createAI({
  models: {
    embed: {
      voyage: VOYAGE_MODELS.VOYAGE_3,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
    multimodalEmbed: {
      voyage: VOYAGE_MODELS.MULTIMODAL_3_5,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
  },
});

Embeddings

import { embed, embedMany } from "ai";

// Provider-neutral text embeddings
const { embedding } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "voyage" }),
  value: "some text",
});

// Provider-neutral image or image+text embeddings.
// Switch to { provider: "gemini" } without changing the call site shape.
const imageModel = ai.embeddingModel({ input: "image", provider: "voyage" });

// Google Gemini text embeddings (for benchmarking)
const { embedding: g } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "gemini" }),
  value: "some text",
});

// Google Gemini image+text embeddings
const { embedding: imageEmbedding } = await embed({
  model: ai.embeddingModel({ input: "image", provider: "gemini" }),
  value: "green woven upholstery",
  providerOptions: {
    google: {
      content: [
        [{ inlineData: { mimeType: "image/png", data: "<base64>" } }],
      ],
    },
  },
});

// Batch
const { embeddings } = await embedMany({
  model: ai.embeddingModel({ provider: "voyage" }),
  values: ["text one", "text two", "text three"],
});

Reranking

const reranker = ai.rerankModel();

Non-AI-SDK Runtimes

Some frameworks accept config objects instead of AI SDK models:

const model = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "materials-agent",
});
// { provider, id, service, capabilities, apiKey, serviceApiKey, baseURL, headers, user }

The capabilities field describes which config fields the selected provider can consume, so callers can pass through the useful fields without branching on one provider-specific helper.

| Provider | API Key | Base URL | Headers | App Attribution | Agent Attribution |
|----------|---------|----------|---------|-----------------|-------------------|
| gateway | yes | no | no | no | no |
| openrouter | yes | yes | yes | yes | yes |
| anthropic | yes | no | no | no | no |
| openai | yes | no | no | no | no |
| google | yes | no | no | no | no |
| deepseek | yes | yes | no | no | no |
| xai | yes | yes | no | no | no |
| qwen | yes | yes | no | no | no |
| zai | yes | yes | no | no | no |
| moonshotai | yes | yes | no | no | no |
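One way to use the capabilities field is to copy only the supported fields into a client's options, so the caller never branches on provider name. The field names inside capabilities (apiKey, baseURL, headers) and the helper sketchClientOptions are assumptions for illustration; the README does not spell out the exact shape.

```typescript
// Assumed shape of the capabilities field; the real keys may differ.
type ConfigCapabilities = { apiKey: boolean; baseURL: boolean; headers: boolean };

type SketchModelConfig = {
  apiKey?: string;
  baseURL?: string;
  headers?: Record<string, string>;
  capabilities: ConfigCapabilities;
};

type SketchClientOptions = {
  apiKey?: string;
  baseURL?: string;
  headers?: Record<string, string>;
};

// Hypothetical helper: pass through only what the provider can consume.
function sketchClientOptions(config: SketchModelConfig): SketchClientOptions {
  const options: SketchClientOptions = {};
  if (config.capabilities.apiKey && config.apiKey) options.apiKey = config.apiKey;
  if (config.capabilities.baseURL && config.baseURL) options.baseURL = config.baseURL;
  if (config.capabilities.headers && config.headers) options.headers = config.headers;
  return options;
}

// A gateway-style config: per the table, only the API key is consumable.
const sketchGatewayOptions = sketchClientOptions({
  apiKey: "placeholder-key",
  baseURL: "https://example.com/v1",
  capabilities: { apiKey: true, baseURL: false, headers: false },
});
```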

For OpenRouter direct HTTP clients, request an OpenRouter model config and pass user in the request body:

const config = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "nl-search",
});
await fetch(`${config.baseURL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${config.apiKey}`,
    ...config.headers,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "deepseek/deepseek-v3.2",
    messages,
    user: config.user,
  }),
});

Escape Hatch

For models that don't fit any tier:

const { text } = await generateText({
  model: ai.modelById("openai/gpt-5-nano"),
  prompt: "...",
});

Route through OpenRouter or direct providers when needed:

ai.model("standard", { provider: "openrouter" });
ai.model("standard", { provider: "openrouter", openRouterVariant: "exacto" });
ai.modelById("openai/gpt-5.4", { provider: "openrouter", openRouterVariant: "nitro" });
ai.model("standard", { free: true }); // always provider: "openrouter"
ai.modelById("claude-sonnet-4-6", { provider: "anthropic" });
ai.modelById("x-ai/grok-4.3", { provider: "xai" });
ai.modelById("moonshotai/kimi-k2.6", { provider: "moonshotai" });

Constants use normalized package IDs. createAI() translates known provider mismatches at runtime, such as Anthropic's direct 4-6 IDs, direct Google Gemini IDs without the google/ prefix, legacy Gemini 3 Flash aliases, and Alibaba-hosted Qwen IDs. DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi are direct OpenAI-compatible routes when their keys are configured. Other catalog services such as MiniMax, StepFun, Xiaomi, Inception, and Nex AGI route through Gateway or OpenRouter. Free selections use OpenRouter's openrouter/free router so the backing model can rotate with OpenRouter's current free inventory and requested capabilities.
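Two of the translations named above (Anthropic's dashed version IDs and prefix-free Google IDs) can be sketched as a small string transform. The function sketchToDirectId is hypothetical; the package's actual translation may be table-driven and covers more cases, such as the legacy Gemini aliases and Alibaba-hosted Qwen IDs, which this sketch does not attempt.

```typescript
// Hypothetical translation from a normalized package ID to a direct-provider ID.
function sketchToDirectId(provider: "anthropic" | "google", normalizedId: string): string {
  const prefix = `${provider}/`;
  // Drop the "provider/" prefix that normalized IDs carry.
  const bare = normalizedId.startsWith(prefix)
    ? normalizedId.slice(prefix.length)
    : normalizedId;
  // Anthropic's direct IDs write versions with dashes ("4-6"), not dots ("4.6").
  return provider === "anthropic" ? bare.replace(/(\d)\.(\d)/g, "$1-$2") : bare;
}

const sketchAnthropicId = sketchToDirectId("anthropic", "anthropic/claude-sonnet-4.6");
const sketchGoogleId = sketchToDirectId("google", "google/gemini-3.5-flash");
```

Here sketchAnthropicId is "claude-sonnet-4-6" and sketchGoogleId is "gemini-3.5-flash", matching the direct IDs used in the examples above.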

Agent Attribution

Tag OpenRouter requests for per-agent cost tracking:

ai.model("fast", { agent: "search", provider: "openrouter" });
// Sends user tag when provider is "openrouter"

Model Constants

import {
  ANTHROPIC_MODELS,
  DEEPSEEK_MODELS,
  GLM_MODELS,
  GOOGLE_EMBED_MODELS,
  GOOGLE_MODELS,
  INCEPTION_MODELS,
  KIMI_MODELS,
  MINIMAX_MODELS,
  NEX_AGI_MODELS,
  OPENAI_MODELS,
  OPENROUTER_MODELS,
  PROVIDER_TASK_DEFAULT_MODELS,
  QWEN_MODELS,
  STEPFUN_MODELS,
  VOYAGE_MODELS,
  XAI_MODELS,
  XIAOMI_MODELS,
} from "@howells/ai";

// Anthropic
ANTHROPIC_MODELS.CLAUDE_OPUS_4_7        // "anthropic/claude-opus-4.7"
ANTHROPIC_MODELS.CLAUDE_OPUS_4_6        // "anthropic/claude-opus-4.6"
ANTHROPIC_MODELS.CLAUDE_SONNET_4_6      // "anthropic/claude-sonnet-4.6"

// DeepSeek
DEEPSEEK_MODELS.DEEPSEEK_V3_2           // "deepseek/deepseek-v3.2"
DEEPSEEK_MODELS.DEEPSEEK_V4_FLASH       // "deepseek/deepseek-v4-flash"

// GLM / Z.ai
GLM_MODELS.GLM_5                        // "z-ai/glm-5"
GLM_MODELS.GLM_5V_TURBO                 // "z-ai/glm-5v-turbo"
GLM_MODELS.GLM_4_7                      // "z-ai/glm-4.7"
GLM_MODELS.GLM_4_7_FLASH                // "z-ai/glm-4.7-flash"
GLM_MODELS.GLM_4_6V                     // "z-ai/glm-4.6v"

// Kimi / Moonshot
KIMI_MODELS.KIMI_K2_6                   // "moonshotai/kimi-k2.6"
KIMI_MODELS.KIMI_K2_5                   // "moonshotai/kimi-k2.5"
KIMI_MODELS.KIMI_K2_THINKING            // "moonshotai/kimi-k2-thinking"

// Google language models
GOOGLE_MODELS.GEMINI_3_5_FLASH          // "google/gemini-3.5-flash"
GOOGLE_MODELS.GEMINI_3_1_PRO_PREVIEW    // "google/gemini-3.1-pro-preview"
GOOGLE_MODELS.GEMINI_3_1_FLASH_LITE_PREVIEW
// "google/gemini-3.1-flash-lite-preview"

// OpenAI
OPENAI_MODELS.GPT_5_4_NANO              // "openai/gpt-5.4-nano"
OPENAI_MODELS.GPT_5_4_MINI              // "openai/gpt-5.4-mini"
OPENAI_MODELS.GPT_5_4                   // "openai/gpt-5.4"
OPENAI_MODELS.GPT_5_3_CODEX             // "openai/gpt-5.3-codex"

// OpenRouter-managed
OPENROUTER_MODELS.FREE                  // "openrouter/free"

// Qwen
QWEN_MODELS.QWEN_3_235B_A22B_2507       // "qwen/qwen3-235b-a22b-2507"
QWEN_MODELS.QWEN_3_NEXT_80B_A3B_INSTRUCT_FREE
QWEN_MODELS.QWEN_3_6_PLUS               // "qwen/qwen3.6-plus"

// xAI
XAI_MODELS.GROK_4_3                     // "x-ai/grok-4.3"

// Gateway/OpenRouter-only services
MINIMAX_MODELS.MINIMAX_M2_7             // "minimax/minimax-m2.7"
MINIMAX_MODELS.MINIMAX_M2_5             // "minimax/minimax-m2.5"
STEPFUN_MODELS.STEP_3_5_FLASH           // "stepfun/step-3.5-flash"
XIAOMI_MODELS.MIMO_V2_FLASH             // "xiaomi/mimo-v2-flash"
INCEPTION_MODELS.MERCURY_2              // "inception/mercury-2"
NEX_AGI_MODELS.DEEPSEEK_V3_1_NEX_N1     // "nex-agi/deepseek-v3.1-nex-n1"

// Provider-pinned task matrix
PROVIDER_TASK_DEFAULT_MODELS.openai?.coding?.standard?.text
// "openai/gpt-5.3-codex"

ai.modelCapabilities({ modelId: "deepseek/deepseek-v3.2" })
// { structured: true, tools: true, vision: false }

// Voyage
VOYAGE_MODELS.VOYAGE_3            // "voyage-3"
VOYAGE_MODELS.VOYAGE_3_LITE       // "voyage-3-lite"
VOYAGE_MODELS.VOYAGE_3_5          // "voyage-3.5"
VOYAGE_MODELS.VOYAGE_3_5_LITE     // "voyage-3.5-lite"
VOYAGE_MODELS.MULTIMODAL_3        // "voyage-multimodal-3"
VOYAGE_MODELS.MULTIMODAL_3_5      // "voyage-multimodal-3.5"
VOYAGE_MODELS.RERANK_2_5          // "rerank-2.5"
VOYAGE_MODELS.RERANK_2_5_LITE     // "rerank-2.5-lite"

// Google
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2  // "gemini-embedding-2-preview"
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_1  // "gemini-embedding-001"

Environment Variables

| Variable | Required | Used By |
|----------|----------|---------|
| AI_GATEWAY_API_KEY | Yes locally for default language models | Vercel AI Gateway |
| OPENROUTER_API_KEY | Only if using provider: "openrouter" | OpenRouter provider |
| ANTHROPIC_API_KEY | Only if using provider: "anthropic" | Anthropic provider |
| OPENAI_API_KEY | Only if using provider: "openai" | OpenAI provider |
| VOYAGE_API_KEY | Yes (for embed/rerank) | Voyage provider |
| GOOGLE_GEMINI_API_KEY | Only if using Gemini embeddings or provider: "google" | Google provider |
| DEEPSEEK_API_KEY | Only if using provider: "deepseek" | DeepSeek direct provider |
| XAI_API_KEY | Only if using provider: "xai" | xAI direct provider |
| QWEN_API_KEY | Only if using provider: "qwen" | Qwen direct provider |
| ZAI_API_KEY | Only if using provider: "zai" | Z.ai / GLM direct provider |
| MOONSHOT_API_KEY | Only if using provider: "moonshotai" | Moonshot / Kimi direct provider |
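A startup check against the table above might look like the sketch below. The variable names come from the table; the helper sketchMissingKeys and the idea of passing the environment in as a plain record (rather than reading process.env directly) are illustrative choices, not part of the package.

```typescript
// Provider-to-variable mapping copied from the environment variable table.
const SKETCH_ENV_KEYS = {
  gateway: "AI_GATEWAY_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  voyage: "VOYAGE_API_KEY",
  google: "GOOGLE_GEMINI_API_KEY",
  deepseek: "DEEPSEEK_API_KEY",
  xai: "XAI_API_KEY",
  qwen: "QWEN_API_KEY",
  zai: "ZAI_API_KEY",
  moonshotai: "MOONSHOT_API_KEY",
} as const;

type KeyedProvider = keyof typeof SKETCH_ENV_KEYS;

// Hypothetical helper: list the variables that are unset for the providers in use.
function sketchMissingKeys(
  env: Record<string, string | undefined>,
  providers: KeyedProvider[],
): string[] {
  return providers
    .map((provider) => SKETCH_ENV_KEYS[provider])
    .filter((name) => !env[name]);
}

// Example with a fake environment that only has the Gateway key set.
const sketchMissing = sketchMissingKeys(
  { AI_GATEWAY_API_KEY: "placeholder" },
  ["gateway", "voyage", "openrouter"],
);
```

In an application you would pass process.env as the first argument and fail fast if the returned list is non-empty.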

Keys can also be passed directly to createAI():

const ai = createAI({
  gatewayKey: "vck_...",
  openRouterKey: "sk-or-...",
  voyageKey: "pa-...",
  googleKey: "...",
  xaiKey: "...",
  moonshotKey: "...",
  serviceKeys: {
    zai: "...",
    qwen: "...",
  },
});

Service keys are exposed through ai.availableServices and ai.modelConfig() for runtimes that can use provider-specific credentials. The same keys also enable direct OpenAI-compatible AI SDK routes for DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi.

Architecture

- Each createAI() returns an independent client (no shared module state)
- Providers are lazy-initialized on first use
- Safe for tests and multi-config scenarios
- Language models route through Vercel AI Gateway by default
- OpenRouter and direct provider routes are available per call
- Embeddings/reranking through Voyage AI or Google
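The first two properties (independent clients, lazy provider initialization) follow a common closure pattern, sketched below. This is a generic illustration with hypothetical names (sketchCreateClient, getProvider), not the package's source.

```typescript
type SketchProviderInstance = { id: number };

// Each call returns a client whose provider lives in its own closure,
// so no state is shared at module level between clients.
function sketchCreateClient(makeProvider: () => SketchProviderInstance) {
  let provider: SketchProviderInstance | undefined;
  return {
    // The provider is constructed on first use and reused afterwards.
    getProvider(): SketchProviderInstance {
      if (!provider) provider = makeProvider();
      return provider;
    },
  };
}

// Count constructions to make the laziness observable.
let sketchInitCount = 0;
const makeSketchProvider = (): SketchProviderInstance => ({ id: ++sketchInitCount });

const sketchClientA = sketchCreateClient(makeSketchProvider);
const sketchClientB = sketchCreateClient(makeSketchProvider);
```

Creating the two clients constructs nothing; each provider is built only when its client first calls getProvider(), which is what makes per-test or per-config clients cheap.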
