@classytic/ai-io

v0.0.4

Published

8 days ago

AI provider primitives — thin SDK wrappers for OpenAI, Gemini, ElevenLabs, Fal, Kling, Pexels, Pixabay, Giphy, Tenor, Wikimedia, Openverse. Pure I/O (no DB, no runtime), ESM-first, tree-shakable, subpath-imported. The leaf-level toolbox that Prism (and an


siam923

classytic-bot

ai openai gemini elevenlabs fal kling whisper pexels pixabay giphy tenor gif stock tts transcribe video image-generation prism typescript esm

@classytic/ai-io

AI provider primitives — thin SDK wrappers for image / video / TTS / STT / SFX / music / stock / web-search APIs. Pure I/O (no DB, no runtime, no encryption, no workflows, no domain shapes). The leaf-level toolbox that arc-ai composes into agents, that Prism composes into workflows, and that any host composes into pipelines.

Scope

In scope (provider primitives):

Image gen — OpenAI gpt-image-2, Gemini Nano Banana Pro / 2
Video gen — Gemini Veo 3.1 / Lite; (Phase 2b: Fal, Kling)
TTS — Gemini TTS; (Phase 2b: ElevenLabs)
STT / transcription — OpenAI Whisper word-timestamps
SFX / music — (Phase 2b: ElevenLabs)
Stock — Pexels, Pixabay
Public / CC images — Wikimedia Commons, Openverse (keyless, license-attributed real-world imagery)
Web search tool — (Phase 2b: OpenAI built-in tool wrapper)
Voices — provider-neutral catalog (Gemini + ElevenLabs), effect modifiers
Util — pcmToWav, snapVeoDuration, pollUntil, format parsers

Out of scope (deliberately):

❌ Text / LLM clients (generateText, generateObject, agents). That's arc-ai's surface. ai-io doesn't wrap arc-ai; arc-ai consumes ai-io for the tool side and owns text generation itself.
❌ Domain Zod schemas (script / scene / caption shapes). Those belong with the consumer (arc-ai for agent outputs, Prism for resource shapes).
❌ Workflow orchestration (retries-as-policy, durable state, cost accounting). Host responsibility — @classytic/streamline does this for arc-based hosts.
❌ Mongo / DB / encryption / multi-tenancy. Pure I/O. Host owns its key source via config arg.

Install

pnpm add @classytic/ai-io
# plus whichever vendor SDKs you actually use:
pnpm add openai @google/genai
# (Phase 2b leaves will add: @elevenlabs/elevenlabs-js, @fal-ai/client)

ESM-only ("type": "module"). There is no CommonJS build — require('@classytic/ai-io') will not work; use import (Node ≥ 22). This is deliberate: the package is a thin, tree-shakable leaf with dynamically-imported vendor SDKs.

Vendor SDKs are peer deps and explicitly external in the bundler — host pins versions, @classytic/ai-io/dist stays at ~118 kB.

ai-io uses each vendor's official SDK directly — @google/genai for Gemini (image / video / tts), openai for OpenAI (image / transcribe). No ai / @ai-sdk/* abstraction in the path. The AI-SDK abstraction is arc-ai's surface for provider-agnostic agents; ai-io is the lower-level primitive arc-ai composes when it needs a tool, not the abstraction itself.

Subpath imports (tree-shakable)

// Phase 1
import { createPexelsClient }      from '@classytic/ai-io/providers/pexels';
import { createPixabayClient }     from '@classytic/ai-io/providers/pixabay';
import { createTranscribeClient }  from '@classytic/ai-io/providers/openai/transcribe';

// Phase 2
import { createOpenAIImageClient } from '@classytic/ai-io/providers/openai/image';
import { createGeminiImageClient } from '@classytic/ai-io/providers/gemini/image';
import { createGeminiVideoClient } from '@classytic/ai-io/providers/gemini/video';
import { createGeminiTtsClient }   from '@classytic/ai-io/providers/gemini/tts';

// Catalog + helpers
import { findVoice, ALL_VOICES, VOICE_EFFECTS } from '@classytic/ai-io/voices';
import { pcmToWav, snapVeoDuration }            from '@classytic/ai-io/util';
import { loadPresets, renderPreset }            from '@classytic/ai-io/util';
import { pollUntil, AiIoRateLimitError }        from '@classytic/ai-io/core';

Engine (one-stop config)

import { createAiIo } from '@classytic/ai-io';

const io = createAiIo({
  openai:  { apiKey: process.env.OPENAI_API_KEY! },
  gemini:  { apiKey: process.env.GEMINI_API_KEY! },
  pexels:  { apiKey: process.env.PEXELS_API_KEY! },
  pixabay: { apiKey: process.env.PIXABAY_API_KEY! },
  // Cross-provider observability — bridge onto your event bus / logger here.
  hooks: {
    onCost: (info) => providerRepo.recordUsage(info.provider, info.costUsd),
    onError: (ctx, info) => log.warn({ ...ctx, ...info }, 'ai-io error'),
  },
});

await io.openai.transcribe.transcribe({ filePath });
await io.openai.image.generate({ prompt, aspectRatio: '16:9', imageSize: '2K' });
await io.gemini.video.generate({ prompt, firstFrameUrl, duration: 8 });
await io.gemini.tts.synthesize({ text, voice: 'Puck' });

Sections are built lazily — access an unconfigured one and you get AiIoMissingKeyError immediately.
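The lazy behaviour can be pictured with a small sketch. This is not the package's implementation: `createLazyEngine`, `MissingKeyError`, and the `Section` shape are hypothetical names used only for illustration, and the sketch assumes the error is thrown at property access. Nothing is built until a section is first read, and reading a section without a key fails right there rather than at the first network call.

```typescript
// Hypothetical stand-in for AiIoMissingKeyError (the real class ships with the package).
class MissingKeyError extends Error {
  readonly provider: string;
  constructor(provider: string) {
    super(`No API key configured for "${provider}"`);
    this.name = 'MissingKeyError';
    this.provider = provider;
  }
}

interface ProviderConfig { apiKey: string }
interface Section { provider: string; apiKey: string }

// Defines one getter per provider; a section is built on first read and cached afterwards.
function createLazyEngine(
  config: Record<string, ProviderConfig | undefined>,
  providers: string[],
): Record<string, Section> {
  const cache = new Map<string, Section>();
  const engine: Record<string, Section> = {};
  for (const name of providers) {
    Object.defineProperty(engine, name, {
      enumerable: true,
      get(): Section {
        const cached = cache.get(name);
        if (cached) return cached;
        const apiKey = config[name]?.apiKey;
        if (!apiKey) throw new MissingKeyError(name); // fails at access time, not at first request
        const section: Section = { provider: name, apiKey };
        cache.set(name, section);
        return section;
      },
    });
  }
  return engine;
}

// Only pexels is configured; pixabay is declared but has no key.
// Creating the engine does not throw; reading engine['pixabay'] would.
const engine = createLazyEngine({ pexels: { apiKey: 'demo-key' } }, ['pexels', 'pixabay']);
```

The practical consequence is that a host can configure only the providers it uses and still find a missing key at the point of use.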

Design rules

No globals, no DB, no env-var lookup. Every function takes its config (including API key) as an argument.
One file per leaf. Each provider/feature is a discrete subpath module.
Peer-dep the vendor SDK. Bundler external: [/^@ai-sdk\//, /^@google\//, /^ai(\/|$)/, ...] — verified at build time (108 kB total).
Buffer in, buffer out. Binary outputs (audio/video/image) return Buffer — host decides where it lands.
Cost as data. Every output that has a cost includes costUsd. Telemetry hooks (onCost) let hosts record it on a separate sink from HTTP telemetry.
Telemetry hooks, not loggers. Structured RequestContext / CostInfo payloads → hook → host's bus/logger/OTel. Hook errors are swallowed.
No retries by default. Each leaf accepts an optional retries: N, applied to transient 429/5xx responses through the shared transport; hosts that wrap leaves in durable workflows (streamline) already own retry semantics and can leave it unset.
Track the latest models. Defaults pin the current production tier (e.g. gpt-image-2, gemini-3-pro-image-preview, veo-3.1-generate-preview). Each release of this package bumps to whatever is current.
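As an illustration of the "Hook errors are swallowed" rule, here is a minimal sketch of how a hook could be invoked safely. `emitCost` and the two-field `CostInfo` shape are assumptions made for this example (the README only shows `provider` and `costUsd` being read), not the package's internals.

```typescript
// Assumed minimal payload; only `provider` and `costUsd` appear in the README examples.
interface CostInfo { provider: string; costUsd: number }
type OnCost = (info: CostInfo) => unknown;

// Calls the hook if present and never lets a hook failure reach the caller.
function emitCost(hook: OnCost | undefined, info: CostInfo): void {
  if (!hook) return;
  try {
    const result = hook(info);
    // Async hooks can reject later, so attach a no-op handler as well.
    if (result instanceof Promise) result.catch(() => undefined);
  } catch {
    // Swallowed on purpose: a broken telemetry sink must not fail the provider call.
  }
}

const recorded: CostInfo[] = [];
emitCost((info) => { recorded.push(info); }, { provider: 'openai', costUsd: 0.04 });
emitCost(() => { throw new Error('sink unavailable'); }, { provider: 'gemini', costUsd: 2 });
emitCost(async () => { throw new Error('async sink unavailable'); }, { provider: 'gemini', costUsd: 2 });
```

Only the first call records anything; the two failing hooks are ignored and execution continues.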

Module map

| Subpath | What it contains | Peer deps used |
|---|---|---|
| core | Errors, transport (httpJson/httpRaw), telemetry hooks, pollUntil | none |
| util | pcmToWav, pcmDurationSec, parsePcmMimeType, snapVeoDuration, loadPresets, renderPreset | none |
| voices | Unified voice catalog (Gemini + ElevenLabs), effect modifiers, findVoice | none |
| providers/pexels | Pexels videos + photos search and detail fetch | none |
| providers/pixabay | Pixabay videos + photos search and detail fetch | none |
| providers/wikimedia | Wikimedia Commons image search: keyless, real/factual subjects, CC/PD license attribution | none |
| providers/openverse | Openverse CC-image search: keyless (anon) or Bearer token, commercial- and modification-filtered, license attribution | none |
| providers/openai/transcribe | Whisper whisper-1 word-level transcription | openai |
| providers/openai/image | gpt-image-2 / gpt-image-1 / dall-e-3 | openai |
| providers/gemini/image | Nano Banana Pro / 2 | @google/genai |
| providers/gemini/video | Veo 3.1 / Lite (i2v, t2v, multi-ref) | @google/genai |
| providers/gemini/tts | Gemini TTS, single & multi-speaker | @google/genai |
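To make the PCM helpers in util concrete, here is a sketch of what wrapping raw PCM in a WAV container involves. `wrapPcmAsWav` and `pcmSeconds` are hypothetical names, not the package's `pcmToWav` / `pcmDurationSec` signatures, and the defaults (24 kHz, mono, 16-bit little-endian) are assumptions chosen for the example. It uses `Uint8Array` so it also accepts a Node `Buffer`.

```typescript
// Wraps raw little-endian PCM samples in a canonical 44-byte RIFF/WAVE header.
function wrapPcmAsWav(
  pcm: Uint8Array,
  sampleRate = 24000, // assumed default
  channels = 1,       // assumed default
  bitsPerSample = 16, // assumed default
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size for linear PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcm.length, true);     // payload size in bytes
  out.set(pcm, 44);
  return out;
}

// Duration follows directly from the byte count and the sample format.
function pcmSeconds(byteLength: number, sampleRate = 24000, channels = 1, bitsPerSample = 16): number {
  return byteLength / (sampleRate * channels * (bitsPerSample / 8));
}

// One second of silence at the assumed defaults: 24000 frames × 2 bytes.
const silence = new Uint8Array(48000);
const wav = wrapPcmAsWav(silence);
```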

Polling primitive — note

@classytic/ai-io/core exports pollUntil because neither arc nor @classytic/primitives exports a polling primitive today (verified May 2026: arc has polling code internal to redis-stream and streamline.engine.waitFor, but nothing exported for general use). The algorithm is domain-agnostic and should eventually migrate to @classytic/primitives/polling; this package will then re-export it from there with the same shape.
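For readers unfamiliar with the pattern, a generic polling loop looks roughly like the sketch below. `pollUntilSketch` and its options are hypothetical and do not describe the real `pollUntil` signature; the sketch only shows the domain-agnostic idea of checking, sleeping, and giving up at a deadline.

```typescript
interface PollOptions { intervalMs: number; timeoutMs: number }

// Repeats `check` until it yields a value, sleeping between attempts.
// Rejects once another sleep would run past the deadline.
async function pollUntilSketch<T>(
  check: () => Promise<T | undefined>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Polling timed out after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated long-running job that reports completion on the third status check.
let statusChecks = 0;
const finished = pollUntilSketch<string>(
  async () => (++statusChecks >= 3 ? `ready after ${statusChecks} checks` : undefined),
  { intervalMs: 5, timeoutMs: 1000 },
);
```

Long-running jobs such as video generation are the typical fit: submit once, then poll the operation status until it resolves or the deadline passes.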

Live smoke (pre-publish verify)

Mocked tests prove request/response shape; they don't catch SDK-version drift, auth-header mismatches, or endpoint URL bugs. Before publishing — and again post-publish against the installed dist — run the live smoke:

# Cheap leaves only (~$0.13, ~30s wall-clock)
PEXELS_API_KEY=... PIXABAY_API_KEY=... OPENAI_API_KEY=... \
GEMINI_API_KEY=... ELEVENLABS_API_KEY=... \
  npm run live-smoke

# Include Veo final (~$2.00 for one 4s video) — slowest, costliest leaf
npm run live-smoke:full

# Cheaper Veo draft tier (~$0.80) instead of full
npm run live-smoke -- --with-veo-draft

# Run against the published / installed dist instead of src
npm run live-smoke:dist

Each leaf is gated on its env var; missing keys skip silently. Final report: leaf × outcome × wall-clock × est-cost. Exits non-zero on any failure.
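The gating and exit-code behaviour described above can be sketched as follows. This is an illustrative model, not the actual live-smoke script: `Leaf`, `LeafReport`, and `runSmoke` are hypothetical, the environment is passed in as a plain object, and recording skipped leaves in the report is an assumption.

```typescript
interface Leaf {
  name: string;
  envVar: string;
  run: (apiKey: string) => Promise<number>; // resolves to the estimated cost in USD
}

interface LeafReport {
  leaf: string;
  outcome: 'ok' | 'failed' | 'skipped';
  wallClockMs: number;
  estCostUsd: number;
}

// Runs each leaf whose key is present, skips the rest, and derives the exit code.
async function runSmoke(
  leaves: Leaf[],
  env: Record<string, string | undefined>,
): Promise<{ reports: LeafReport[]; exitCode: number }> {
  const reports: LeafReport[] = [];
  for (const leaf of leaves) {
    const apiKey = env[leaf.envVar];
    if (!apiKey) {
      reports.push({ leaf: leaf.name, outcome: 'skipped', wallClockMs: 0, estCostUsd: 0 });
      continue;
    }
    const started = Date.now();
    try {
      const estCostUsd = await leaf.run(apiKey);
      reports.push({ leaf: leaf.name, outcome: 'ok', wallClockMs: Date.now() - started, estCostUsd });
    } catch {
      reports.push({ leaf: leaf.name, outcome: 'failed', wallClockMs: Date.now() - started, estCostUsd: 0 });
    }
  }
  // Any failure makes the whole run fail; skipped leaves do not.
  const exitCode = reports.some((r) => r.outcome === 'failed') ? 1 : 0;
  return { reports, exitCode };
}

// Demo with fake leaves: one passes, one has no key, one throws.
const smoke = runSmoke(
  [
    { name: 'stock-search', envVar: 'STOCK_KEY', run: async () => 0 },
    { name: 'tts', envVar: 'TTS_KEY', run: async () => 0.01 },
    { name: 'transcribe', envVar: 'STT_KEY', run: async () => { throw new Error('401'); } },
  ],
  { STOCK_KEY: 'demo', STT_KEY: 'demo' },
);
```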

Cost reporting — what `costUsd` actually is

costUsd from each leaf is an estimate, not an invoice number:

Rate cards are hardcoded in the package at published Creator-tier rates per provider.
Unit counts come from the request (text.length, duration, numberOfVideos) or the response (Whisper.duration).
Where a provider returns usage data (Whisper duration, OpenAI gpt-image-1 token counts, Gemini usageMetadata), we surface it on output.usage for diagnostics, but costUsd always uses the flat rate × unit count, because no provider returns dollar cost on the wire.
Veo, Pexels, and Pixabay don't return usage data; cost is the per-tier flat rate × numberOfVideos.

Accurate to ~5–15% for budgeting / monthly-cap enforcement. For exact billing, reconcile against the provider's dashboard at month-end. If you need per-tier rate-card overrides (Pro / Enterprise consumers paying different rates), open an issue — that lands in 0.1.x as a rateCard config slot.
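The arithmetic behind an estimate is simple enough to show directly. The sketch below is illustrative only: `estimateCostUsd` and `budgetBounds` are hypothetical helpers, and the per-second rates are back-derived from the smoke-test figures quoted in this README (about $2.00 and $0.80 for one 4 s video), not taken from the package's rate card.

```typescript
// Placeholder rates for illustration; derived from "~$2.00 / ~$0.80 for one 4s video".
const EXAMPLE_RATES_USD = {
  veoFinalPerSecond: 0.5,
  veoDraftPerSecond: 0.2,
};

// Flat rate × unit count, rounded to micro-dollars to avoid float noise.
function estimateCostUsd(ratePerUnit: number, units: number): number {
  return Math.round(ratePerUnit * units * 1e6) / 1e6;
}

// Widens an estimate by a tolerance so a monthly cap can be enforced conservatively.
function budgetBounds(estimateUsd: number, tolerance = 0.15): { low: number; high: number } {
  return { low: estimateUsd * (1 - tolerance), high: estimateUsd * (1 + tolerance) };
}

const finalVideoUsd = estimateCostUsd(EXAMPLE_RATES_USD.veoFinalPerSecond, 4); // 0.5 × 4
const draftVideoUsd = estimateCostUsd(EXAMPLE_RATES_USD.veoDraftPerSecond, 4); // 0.2 × 4
const finalVideoBand = budgetBounds(finalVideoUsd);                            // ±15% around the estimate
```

When enforcing a cap, comparing spend against the upper bound rather than the point estimate leaves room for the stated 5–15% error.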

Telemetry hook shapes

TelemetryHooks.onError(ctx, info) payload:

interface ErrorInfo {
  error: AiIoError | Error;   // full Error instance for instanceof / cause chain
  code: string;                // 'AiIoRateLimitError' | 'AiIoAuthError' | ...
  message: string;             // ≤500 chars, already truncated for HTTP body errors
  attempts: number;
  durationMs: number;
}

code and message are surfaced at the top level so consumers can log without spreading or reaching into info.error.*:

const io = createAiIo({
  // ...
  hooks: {
    onError: (ctx, info) => log.warn({
      requestId: ctx.requestId,
      operation: ctx.operation,
      code: info.code,
      message: info.message,
      attempts: info.attempts,
    }),
    onCost: (info) => providerRepo.recordUsage(info.provider, info.costUsd),
  },
});

Switch on info.code for retry / fallback decisions; use info.error when you need the full instance (e.g. info.error instanceof AiIoRateLimitError to read retryAfterSec).
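A host-side decision function built on this payload might look like the sketch below. Everything here is an assumption for illustration: `RateLimitErrorSketch` stands in for the package's AiIoRateLimitError, `ErrorInfoSketch` mirrors the interface shown above, and the retry/fallback policy itself is a host choice rather than something ai-io prescribes.

```typescript
// Local stand-in so the example is self-contained; the real class is exported by the package.
class RateLimitErrorSketch extends Error {
  readonly retryAfterSec: number | undefined;
  constructor(message: string, retryAfterSec?: number) {
    super(message);
    this.name = 'AiIoRateLimitError';
    this.retryAfterSec = retryAfterSec;
  }
}

interface ErrorInfoSketch {
  error: Error;
  code: string;
  message: string;
  attempts: number;
  durationMs: number;
}

type Decision =
  | { action: 'retry'; delayMs: number }
  | { action: 'fallback' }
  | { action: 'fail' };

// Cheap branch on the string code; reach into info.error only for extra fields.
function decide(info: ErrorInfoSketch, maxAttempts = 3): Decision {
  switch (info.code) {
    case 'AiIoRateLimitError': {
      if (info.attempts >= maxAttempts) return { action: 'fallback' };
      const retryAfterSec =
        info.error instanceof RateLimitErrorSketch ? info.error.retryAfterSec : undefined;
      return { action: 'retry', delayMs: (retryAfterSec ?? 1) * 1000 };
    }
    case 'AiIoAuthError':
      return { action: 'fail' }; // a bad key will not fix itself
    default:
      return { action: 'fallback' };
  }
}

const rateLimited: ErrorInfoSketch = {
  error: new RateLimitErrorSketch('Too many requests', 7),
  code: 'AiIoRateLimitError',
  message: 'Too many requests',
  attempts: 1,
  durationMs: 120,
};
const decision = decide(rateLimited);
```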

Conventions

Follows the classytic ecosystem patterns:

Testing tiers per testing-infrastructure.md — tests/{unit,integration,e2e} with vitest.config.ts projects: [...]. pnpm test = unit + integration; pnpm test:e2e opt-in.
Error hierarchy modeled on fin-io — AiIoError → subclasses for status / cause / retryability classification.
Subpath exports only — no root barrel for data (engine factory is the only thing at .).
ESM-only, engines.node >= 22, tsdown bundling with vendor externals.

Status

✅ Phase 1: Pexels, Pixabay, Whisper, voices, util
✅ Phase 1.5: TelemetryHooks, tiered tests, error hierarchy, pollUntil
✅ Phase 2: OpenAI image, Gemini image / video / tts
⏳ Phase 2b: ElevenLabs (tts / sfx / music / stt), Fal, Kling, OpenAI web-search tool
⏳ Phase 3 (deferred until demand): standalone MCP factory for non-arc hosts
