@classytic/ai-io
v0.0.4
AI provider primitives — thin SDK wrappers for OpenAI, Gemini, ElevenLabs, Fal, Kling, Pexels, Pixabay, Giphy, Tenor, Wikimedia, Openverse. Pure I/O (no DB, no runtime), ESM-first, tree-shakable, subpath-imported. The leaf-level toolbox that Prism (and an
AI provider primitives — thin SDK wrappers for image / video / TTS / STT / SFX / music / stock / web-search APIs. Pure I/O (no DB, no runtime, no encryption, no workflows, no domain shapes). The leaf-level toolbox that arc-ai composes into agents, that Prism composes into workflows, and that any host composes into pipelines.
Scope
In scope (provider primitives):
- Image gen — OpenAI gpt-image-2, Gemini Nano Banana Pro / 2
- Video gen — Gemini Veo 3.1 / Lite; (Phase 2b: Fal, Kling)
- TTS — Gemini TTS; (Phase 2b: ElevenLabs)
- STT / transcription — OpenAI Whisper word-timestamps
- SFX / music — (Phase 2b: ElevenLabs)
- Stock — Pexels, Pixabay
- Public / CC images — Wikimedia Commons, Openverse (keyless, license-attributed real-world imagery)
- Web search tool — (Phase 2b: OpenAI built-in tool wrapper)
- Voices — provider-neutral catalog (Gemini + ElevenLabs), effect modifiers
- Util — pcmToWav, snapVeoDuration, pollUntil, format parsers
Out of scope (deliberately):
- ❌ Text / LLM clients (generateText, generateObject, agents). That's arc-ai's surface. ai-io doesn't wrap arc-ai; arc-ai consumes ai-io for the tool side and owns text generation itself.
- ❌ Domain Zod schemas (script / scene / caption shapes). Those belong with the consumer (arc-ai for agent outputs, Prism for resource shapes).
- ❌ Workflow orchestration (retries-as-policy, durable state, cost accounting). Host responsibility — @classytic/streamline does this for arc-based hosts.
- ❌ Mongo / DB / encryption / multi-tenancy. Pure I/O. Host owns its key source via config arg.
Install
pnpm add @classytic/ai-io
# plus whichever vendor SDKs you actually use:
pnpm add openai @google/genai
# (Phase 2b leaves will add: @elevenlabs/elevenlabs-js, @fal-ai/client)
ESM-only ("type": "module"). There is no CommonJS build — require('@classytic/ai-io') will not work; use import (Node ≥ 22). This is deliberate: the package is a thin, tree-shakable leaf with dynamically-imported vendor SDKs.
Vendor SDKs are peer deps and explicitly external in the bundler — host pins versions, @classytic/ai-io/dist stays at ~118 kB.
ai-io uses each vendor's official SDK directly — @google/genai for Gemini (image / video / tts), openai for OpenAI (image / transcribe). No ai / @ai-sdk/* abstraction in the path. The AI-SDK abstraction is arc-ai's surface for provider-agnostic agents; ai-io is the lower-level primitive arc-ai composes when it needs a tool, not the abstraction itself.
Subpath imports (tree-shakable)
// Phase 1
import { createPexelsClient } from '@classytic/ai-io/providers/pexels';
import { createPixabayClient } from '@classytic/ai-io/providers/pixabay';
import { createTranscribeClient } from '@classytic/ai-io/providers/openai/transcribe';
// Phase 2
import { createOpenAIImageClient } from '@classytic/ai-io/providers/openai/image';
import { createGeminiImageClient } from '@classytic/ai-io/providers/gemini/image';
import { createGeminiVideoClient } from '@classytic/ai-io/providers/gemini/video';
import { createGeminiTtsClient } from '@classytic/ai-io/providers/gemini/tts';
// Catalog + helpers
import { findVoice, ALL_VOICES, VOICE_EFFECTS } from '@classytic/ai-io/voices';
import { pcmToWav, snapVeoDuration } from '@classytic/ai-io/util';
import { loadPresets, renderPreset } from '@classytic/ai-io/util';
import { pollUntil, AiIoRateLimitError } from '@classytic/ai-io/core';
Engine (one-stop config)
import { createAiIo } from '@classytic/ai-io';
const io = createAiIo({
  openai: { apiKey: process.env.OPENAI_API_KEY! },
  gemini: { apiKey: process.env.GEMINI_API_KEY! },
  pexels: { apiKey: process.env.PEXELS_API_KEY! },
  pixabay: { apiKey: process.env.PIXABAY_API_KEY! },
  // Cross-provider observability — bridge onto your event bus / logger here.
  hooks: {
    onCost: (info) => providerRepo.recordUsage(info.provider, info.costUsd),
    onError: (ctx, info) => log.warn({ ...ctx, ...info }, 'ai-io error'),
  },
});
await io.openai.transcribe.transcribe({ filePath });
await io.openai.image.generate({ prompt, aspectRatio: '16:9', imageSize: '2K' });
await io.gemini.video.generate({ prompt, firstFrameUrl, duration: 8 });
await io.gemini.tts.synthesize({ text, voice: 'Puck' });
Sections are built lazily — access an unconfigured one and you get AiIoMissingKeyError immediately.
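The lazy-section behavior can be pictured with a small getter-based sketch. This is an assumption about one way to get that behavior, not the package's actual code: createLazyEngineSketch, MissingKeyErrorSketch, and the SectionSketch shape are hypothetical names invented for illustration.

```typescript
// Hypothetical stand-in for AiIoMissingKeyError.
class MissingKeyErrorSketch extends Error {
  readonly section: string;
  constructor(section: string) {
    super(`No API key configured for "${section}"`);
    this.name = 'MissingKeyErrorSketch';
    this.section = section;
  }
}

const SECTION_NAMES = ['openai', 'gemini', 'pexels', 'pixabay'] as const;
type SectionName = (typeof SECTION_NAMES)[number];

interface SectionSketch { readonly name: SectionName; readonly apiKey: string }
type EngineConfigSketch = Partial<Record<SectionName, { apiKey: string }>>;

function createLazyEngineSketch(config: EngineConfigSketch): Record<SectionName, SectionSketch> {
  const cache = new Map<SectionName, SectionSketch>();
  const engine = {} as Record<SectionName, SectionSketch>;
  for (const name of SECTION_NAMES) {
    Object.defineProperty(engine, name, {
      enumerable: true,
      // Nothing is built until the property is read for the first time.
      get(): SectionSketch {
        const cached = cache.get(name);
        if (cached) return cached;
        const apiKey = config[name]?.apiKey;
        if (!apiKey) throw new MissingKeyErrorSketch(name);
        const built: SectionSketch = { name, apiKey };
        cache.set(name, built);
        return built;
      },
    });
  }
  return engine;
}
```

With this shape, constructing the engine with only a Pexels key succeeds, and the error surfaces at the first read of an unconfigured section rather than at construction time.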
Design rules
- No globals, no DB, no env-var lookup. Every function takes its config (including API key) as an argument.
- One file per leaf. Each provider/feature is a discrete subpath module.
- Peer-dep the vendor SDK. Bundler external: [/^@ai-sdk\//, /^@google\//, /^ai(\/|$)/, ...] — verified at build time (108 kB total).
- Buffer in, buffer out. Binary outputs (audio/video/image) return Buffer — host decides where it lands.
- Cost as data. Every output that has a cost includes costUsd. Telemetry hooks (onCost) let hosts record it on a separate sink from HTTP telemetry.
- Telemetry hooks, not loggers. Structured RequestContext / CostInfo payloads → hook → host's bus/logger/OTel. Hook errors are swallowed.
- No retries by default. Each leaf takes optional retries: N on transient 429/5xx via shared transport; hosts that wrap leaves in durable workflows (streamline) already own retry semantics.
- Track the latest models. Defaults pin the current production tier (e.g. gpt-image-2, gemini-3-pro-image-preview, veo-3.1-generate-preview). Each release of this package bumps to whatever is current.
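Three of the rules above (config as an argument, cost as data, swallowed hook errors) can be shown together in a few lines. The sketch below is illustrative only: fakeLeafSketch, safeEmitCostSketch, and the 0.25 USD rate are made-up stand-ins, not the package's transport or pricing.

```typescript
interface CostInfoSketch { provider: string; operation: string; costUsd: number }
interface HooksSketch { onCost?: (info: CostInfoSketch) => void }

// Invoke a host hook without letting a faulty callback break the provider call.
function safeEmitCostSketch(hooks: HooksSketch | undefined, info: CostInfoSketch): void {
  try {
    hooks?.onCost?.(info);
  } catch {
    // Swallowed on purpose: telemetry must never fail the request.
  }
}

// A stand-in leaf. It reads no globals or env vars: everything arrives via `config`.
function fakeLeafSketch(
  config: { apiKey: string; hooks?: HooksSketch },
  units: number,
): { bytes: Uint8Array; costUsd: number } {
  const costUsd = units * 0.25; // placeholder flat rate, not a real price
  const result = { bytes: new Uint8Array(units), costUsd };
  safeEmitCostSketch(config.hooks, { provider: 'example', operation: 'fake', costUsd });
  return result;
}
```

The cost travels twice: once on the return value for the caller, and once through the hook for whatever sink the host uses for accounting.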
Module map
| Subpath | What it contains | Peer deps used |
|---|---|---|
| core | Errors, transport (httpJson/httpRaw), telemetry hooks, pollUntil | — |
| util | pcmToWav, pcmDurationSec, parsePcmMimeType, snapVeoDuration, loadPresets, renderPreset | — |
| voices | Unified voice catalog (Gemini + ElevenLabs), effect modifiers, findVoice | — |
| providers/pexels | Pexels videos + photos search and detail fetch | — |
| providers/pixabay | Pixabay videos + photos search and detail fetch | — |
| providers/wikimedia | Wikimedia Commons image search — keyless, real/factual subjects, CC/PD license attribution | — |
| providers/openverse | Openverse CC-image search — keyless (anon) or Bearer token, commercial / modification-filtered, license attribution | — |
| providers/openai/transcribe | Whisper whisper-1 word-level transcription | — |
| providers/openai/image | gpt-image-2 / gpt-image-1 / dall-e-3 | openai |
| providers/gemini/image | Nano Banana Pro / 2 | @google/genai |
| providers/gemini/video | Veo 3.1 / Lite (i2v, t2v, multi-ref) | @google/genai |
| providers/gemini/tts | Gemini TTS, single & multi-speaker | @google/genai |
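For readers unfamiliar with what a helper like pcmToWav has to do, here is a minimal sketch of wrapping raw PCM in a 44-byte RIFF/WAVE header. The function names, parameter order, and the 24 kHz / mono / 16-bit defaults are assumptions for illustration; the real util signatures may differ, and the sketch uses Uint8Array rather than Buffer so it runs anywhere.

```typescript
// Prepend a canonical 44-byte PCM WAV header to raw little-endian PCM samples.
function pcmToWavSketch(
  pcm: Uint8Array,
  sampleRate = 24000, // assumed default
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  ascii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  ascii(8, 'WAVE');
  ascii(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size for plain PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  ascii(36, 'data');
  view.setUint32(40, pcm.length, true);     // payload size in bytes
  out.set(pcm, 44);
  return out;
}

// Duration follows directly from the byte count and the frame size.
function pcmDurationSecSketch(byteLength: number, sampleRate = 24000, channels = 1, bitsPerSample = 16): number {
  return byteLength / (sampleRate * channels * (bitsPerSample / 8));
}
```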
Polling primitive — note
@classytic/ai-io/core exports pollUntil because no polling primitive exists in arc or @classytic/primitives today (verified May 2026: arc has polling code internal to redis-stream + streamline.engine.waitFor, but nothing exported for general use). The algorithm is domain-agnostic and should migrate to @classytic/primitives/polling later; this package will then re-export it from there with the same shape.
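As a rough picture of what a domain-agnostic polling primitive looks like, the sketch below repeats a check until it yields a value or a deadline passes. It is not the exported pollUntil: the name pollUntilSketch, the option names, and the injectable sleep are all assumptions made for this example.

```typescript
interface PollOptionsSketch {
  intervalMs: number;
  timeoutMs: number;
  // Injectable for tests; defaults to a real timer.
  sleep?: (ms: number) => Promise<void>;
}

// Call `check` until it returns something other than undefined, or time runs out.
async function pollUntilSketch<T>(
  check: () => Promise<T | undefined>,
  opts: PollOptionsSketch,
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const value = await check();
    if (value !== undefined) return value;
    // Give up if the next wait would carry us past the deadline.
    if (Date.now() + opts.intervalMs > deadline) {
      throw new Error(`poll timed out after ${opts.timeoutMs} ms`);
    }
    await sleep(opts.intervalMs);
  }
}
```

A long-running job such as video generation fits this shape naturally: `check` fetches the operation status and returns the finished result, or undefined while the job is still pending.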
Live smoke (pre-publish verify)
Mocked tests prove request/response shape; they don't catch SDK-version drift, auth-header mismatches, or endpoint URL bugs. Before publishing — and again post-publish against the installed dist — run the live smoke:
# Cheap leaves only (~$0.13, ~30s wall-clock)
PEXELS_API_KEY=... PIXABAY_API_KEY=... OPENAI_API_KEY=... \
GEMINI_API_KEY=... ELEVENLABS_API_KEY=... \
npm run live-smoke
# Include Veo final (~$2.00 for one 4s video) — slowest, costliest leaf
npm run live-smoke:full
# Cheaper Veo draft tier (~$0.80) instead of full
npm run live-smoke -- --with-veo-draft
# Run against the published / installed dist instead of src
npm run live-smoke:dist
Each leaf is gated on its env var; missing keys skip silently. Final report: leaf × outcome × wall-clock × est-cost. Exits non-zero on any failure.
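The gating and exit-code rules can be sketched as two small pure functions. The leaf-to-variable mapping below is a guess assembled from the env vars in the command above and the subpaths in the module map; planSmokeSketch and exitCodeSketch are hypothetical names, not the smoke script's real internals.

```typescript
// Assumed mapping from leaf to the env var that gates it.
const LEAF_ENV_SKETCH: Record<string, string> = {
  'providers/pexels': 'PEXELS_API_KEY',
  'providers/pixabay': 'PIXABAY_API_KEY',
  'providers/openai/transcribe': 'OPENAI_API_KEY',
  'providers/gemini/tts': 'GEMINI_API_KEY',
};

interface LeafPlanSketch { leaf: string; action: 'run' | 'skip' }

// A leaf runs only when its key is present; otherwise it is skipped, not failed.
function planSmokeSketch(env: Record<string, string | undefined>): LeafPlanSketch[] {
  return Object.entries(LEAF_ENV_SKETCH).map(([leaf, envVar]): LeafPlanSketch => ({
    leaf,
    action: env[envVar] ? 'run' : 'skip',
  }));
}

// Skipped leaves never appear in `outcomes`, so only real failures flip the exit code.
function exitCodeSketch(outcomes: Array<{ leaf: string; ok: boolean }>): number {
  return outcomes.some((o) => !o.ok) ? 1 : 0;
}
```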
Cost reporting — what costUsd actually is
costUsd from each leaf is an estimate, not an invoice number:
- Rate cards are hardcoded in the package at published Creator-tier rates per provider.
- Unit counts come from the request (text.length, duration, numberOfVideos) or the response (Whisper .duration).
- Where a provider returns usage data (Whisper duration, OpenAI gpt-image-1 token counts, Gemini usageMetadata), we surface it on output.usage for diagnostics — but costUsd always uses the flat rate × unit count, because no provider returns dollar cost on the wire.
- Veo, Pexels, Pixabay don't return usage; cost is the per-tier flat rate × numberOfVideos.
Accurate to ~5–15% for budgeting / monthly-cap enforcement. For exact billing, reconcile against the provider's dashboard at month-end. If you need per-tier rate-card overrides (Pro / Enterprise consumers paying different rates), open an issue — that lands in 0.1.x as a rateCard config slot.
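The flat rate × unit count arithmetic, plus a cap check that pads for the stated error band, might look like the sketch below. Every number in the rate card is an invented placeholder rather than a real provider price, and estimateCostUsdSketch, fitsMonthlyCapSketch, and the optional rateCard argument are hypothetical.

```typescript
// Placeholder rate card: invented numbers for illustration only.
const RATE_CARD_SKETCH: Record<string, number> = {
  'video/final': 0.5,  // USD per second (placeholder)
  'video/draft': 0.25, // USD per second (placeholder)
  tts: 0.000125,       // USD per character (placeholder)
};

// Estimate = flat rate for the tier × number of billable units.
function estimateCostUsdSketch(
  tier: string,
  units: number,
  rateCard: Record<string, number> = RATE_CARD_SKETCH,
): number {
  const rate = rateCard[tier];
  if (rate === undefined) throw new Error(`no rate for tier "${tier}"`);
  return rate * units;
}

// Pad the running total by the worst-case estimate error before comparing to the cap.
function fitsMonthlyCapSketch(
  spentUsd: number,
  nextEstimateUsd: number,
  capUsd: number,
  errorMargin = 0.15,
): boolean {
  return (spentUsd + nextEstimateUsd) * (1 + errorMargin) <= capUsd;
}
```

Padding by the upper end of the error band keeps a monthly cap conservative: a request is refused slightly early rather than letting the real invoice overshoot.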
Telemetry hook shapes
TelemetryHooks.onError(ctx, info) payload:
interface ErrorInfo {
  error: AiIoError | Error; // full Error instance for instanceof / cause chain
  code: string;             // 'AiIoRateLimitError' | 'AiIoAuthError' | ...
  message: string;          // ≤500 chars, already truncated for HTTP body errors
  attempts: number;
  durationMs: number;
}
code and message are surfaced at the top level so consumers can log without spreading or reaching into info.error.*:
const io = createAiIo({
  // ...
  hooks: {
    onError: (ctx, info) => log.warn({
      requestId: ctx.requestId,
      operation: ctx.operation,
      code: info.code,
      message: info.message,
      attempts: info.attempts,
    }),
    onCost: (info) => providerRepo.recordUsage(info.provider, info.costUsd),
  },
});
Switch on info.code for retry / fallback decisions; use info.error when you need the full instance (e.g. info.error instanceof AiIoRateLimitError to read retryAfterSec).
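A host-side decision function built on info.code could look like the following sketch. The two code strings and retryAfterSec come from the text above; the DecisionSketch type, the three-attempt ceiling, the one-second default delay, and the choice to fall back on unknown codes are assumptions, and the sketch reads retryAfterSec structurally instead of importing the real error class.

```typescript
type DecisionSketch =
  | { action: 'retry'; delayMs: number }
  | { action: 'fallback' }
  | { action: 'fail' };

// Minimal structural stand-in for the ErrorInfo payload.
interface ErrorInfoSketch {
  error: Error & { retryAfterSec?: number };
  code: string;
  attempts: number;
}

function decideSketch(info: ErrorInfoSketch, maxAttempts = 3): DecisionSketch {
  switch (info.code) {
    case 'AiIoRateLimitError': {
      // Out of attempts: stop hammering this provider and try another one.
      if (info.attempts >= maxAttempts) return { action: 'fallback' };
      const retryAfterSec = info.error.retryAfterSec ?? 1; // assumed default
      return { action: 'retry', delayMs: retryAfterSec * 1000 };
    }
    case 'AiIoAuthError':
      // A bad key will not fix itself; retrying or falling back hides the problem.
      return { action: 'fail' };
    default:
      return { action: 'fallback' };
  }
}
```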
Conventions
Follows the classytic ecosystem patterns:
- Testing tiers per testing-infrastructure.md — tests/{unit,integration,e2e} with vitest.config.ts projects: [...]. pnpm test = unit + integration; pnpm test:e2e opt-in.
- Error hierarchy modeled on fin-io — AiIoError → subclasses for status / cause / retryability classification.
- Subpath exports only — no root barrel for data (engine factory is the only thing at .).
- ESM-only, engines.node >= 22, tsdown bundling with vendor externals.
Status
- ✅ Phase 1: Pexels, Pixabay, Whisper, voices, util
- ✅ Phase 1.5: TelemetryHooks, tiered tests, error hierarchy, pollUntil
- ✅ Phase 2: OpenAI image, Gemini image / video / tts
- ⏳ Phase 2b: ElevenLabs (tts / sfx / music / stt), Fal, Kling, OpenAI web-search tool
- ⏳ Phase 3 (deferred until demand): standalone MCP factory for non-arc hosts
License
MIT © Classytic
