ai-media-kit
v0.2.2
AI Media Generation SDK — unified interface for Kling, Seedance, xAI (Grok), Suno, Gemini and more
Unified TypeScript SDK for AI media generation APIs — video, image, and audio.
Highlights
- Multi-provider — Kling, Seedance, Seedance 2, xAI (Grok), Suno, Gemini, and growing
- Native request types — Full TypeScript interfaces per endpoint with JSDoc; no dumbed-down common denominator
- Unified response — Consistent task.status, task.outputs, and task.raw across all providers
- Async & sync — Async task model with polling for video/audio; direct results for sync APIs (Gemini)
- SSE streaming — Built-in task.toSSEResponse() for browser streaming
- Zero runtime dependencies
Install
npm install ai-media-kit
# or
bun add ai-media-kit
# or
pnpm add ai-media-kit
Providers
| Provider | Factory | Endpoints | Media | Model |
| :--- | :--- | :--- | :--- | :--- |
| Seedance 2 | createSeedance2() | generate | Video | Async (Task) |
| Seedance | createSeedance() | text2video, image2video, draft2video, generate | Video | Async (Task) |
| Kling | createKling() | text2video, image2video, multiImage2video | Video | Async (Task) |
| xAI (Grok) | createXai() | text2video, image2video, referenceVideo, generate, edit, extend | Video | Async (Task) |
| Suno | createSuno() | inspiration, customLyrics, continue, cover, persona, sound, generate + upload/concat/createPersona | Audio | Async (Task) |
| Gemini | createGemini() | generateContent | Image | Sync (direct) |
Seedance 2
Seedance 2.0 (doubao-seedance-2-0) — latest Volcengine video generation model. Supports text-to-video, image-to-video (first frame, last frame, reference image), reference video, reference audio, and audio generation.
import { createSeedance2 } from "ai-media-kit/providers/seedance2";
const sd2 = createSeedance2({
baseUrl: "https://your-gateway.com/sd2",
apiKey: process.env.SD2_API_KEY,
});
SD2: Text to Video
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [{ type: "text", text: "a cat yawning lazily in the sun" }],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
await task.wait();
console.log(task.outputs); // [{ type: "video", url: "https://..." }]
SD2: Image to Video (First Frame)
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/cat.jpg" }, role: "first_frame" },
{ type: "text", text: "the cat starts walking towards the camera" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Image to Video (First + Last Frame)
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/start.jpg" }, role: "first_frame" },
{ type: "image_url", image_url: { url: "https://example.com/end.jpg" }, role: "last_frame" },
{ type: "text", text: "smooth transition between two scenes" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Reference Image
Use 1–4 reference images to influence the visual style without locking the first frame.
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/ref1.jpg" }, role: "reference_image" },
{ type: "image_url", image_url: { url: "https://example.com/ref2.jpg" }, role: "reference_image" },
{ type: "text", text: "a stylish product showcase in the same visual style" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Reference Video
Provide a reference video to guide motion and style.
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "video_url", video_url: { url: "https://example.com/reference.mp4" }, role: "reference_video" },
{ type: "text", text: "recreate this motion with a different character" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Reference Audio
Provide a reference audio clip to sync the generated video.
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "audio_url", audio_url: { url: "https://example.com/narration.mp3" }, role: "reference_audio" },
{ type: "text", text: "a person speaking to the camera in a studio" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 10,
});
SD2: Combining Multiple Content Types
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/scene.jpg" }, role: "first_frame" },
{ type: "audio_url", audio_url: { url: "https://example.com/bgm.mp3" }, role: "reference_audio" },
{ type: "text", text: "the scene slowly zooms in as music plays" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 10,
});
SD2: SSE Streaming
// Next.js / Hono / any Web-standard handler
const task = await sd2.generate.create({ ... });
return task.toSSEResponse();
SD2: Resume from Task ID
const task = sd2.generate.fromId("task_xxxx");
await task.refresh(); // single status check
SD2: Parameters
| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| model | string | — | "doubao-seedance-2-0-260128" or "doubao-seedance-1-5-pro-251215" |
| content | array | — | Multimodal array: text, image_url, video_url, audio_url |
| generate_audio | boolean | true | Generate synchronized audio |
| ratio | string | "16:9" | "16:9" "9:16" "1:1" "4:3" "3:4" "21:9" "adaptive" |
| resolution | string | "720p" | "480p" or "720p" |
| duration | number | 5 | Video duration in seconds (5–15) |
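These constraints can be checked client-side before spending a request. Below is a minimal pre-flight validator mirroring the table above — a sketch only; validateSd2Params is a hypothetical helper, not part of the SDK.

```typescript
// Pre-flight check for SD2 parameters, mirroring the documented
// constraints above. Sketch only — NOT part of ai-media-kit.
const SD2_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"];
const SD2_RESOLUTIONS = ["480p", "720p"];

function validateSd2Params(p: { ratio?: string; resolution?: string; duration?: number }): string[] {
  const errors: string[] = [];
  if (p.ratio !== undefined && !SD2_RATIOS.includes(p.ratio)) {
    errors.push(`unsupported ratio: ${p.ratio}`);
  }
  if (p.resolution !== undefined && !SD2_RESOLUTIONS.includes(p.resolution)) {
    errors.push(`unsupported resolution: ${p.resolution}`);
  }
  if (p.duration !== undefined && (p.duration < 5 || p.duration > 15)) {
    errors.push(`duration must be 5-15s, got ${p.duration}`);
  }
  return errors;
}
```

Running such a check before generate.create() turns an async provider-side rejection into an immediate, descriptive local error.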
Seedance (Official Volcengine API)
For the official Volcengine Seedance API. Supports all Seedance 1.x models and draft mode.
import { createSeedance } from "ai-media-kit/providers/seedance";
const seedance = createSeedance({
baseUrl: "https://ark.cn-beijing.volces.com/api/v3",
apiKey: process.env.SEEDANCE_API_KEY,
});
Seedance: Text to Video
const task = await seedance.text2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "text", text: "a cat yawning lazily in the sun" }],
});
await task.wait();
console.log(task.outputs); // [{ type: "video", url: "https://..." }]
Seedance: Image to Video (First Frame)
const task = await seedance.image2video.create({
model: "doubao-seedance-1-0-pro-250428",
content: [
{ type: "image_url", image_url: { url: "https://example.com/cat.jpg" } },
{ type: "text", text: "the cat starts walking" },
],
});
Seedance: Image to Video (First + Last Frame)
const task = await seedance.image2video.create({
model: "doubao-seedance-1-0-pro-250428",
content: [
{ type: "image_url", image_url: { url: "https://example.com/start.jpg" }, role: "first_frame" },
{ type: "image_url", image_url: { url: "https://example.com/end.jpg" }, role: "last_frame" },
{ type: "text", text: "smooth camera dolly from start to end" },
],
});
Seedance: Reference Image (1–4 images)
const task = await seedance.image2video.create({
model: "doubao-seedance-1-0-lite-i2v-250428",
content: [
{ type: "image_url", image_url: { url: "https://example.com/ref.jpg" }, role: "reference_image" },
{ type: "text", text: "character walks through a park" },
],
});
Seedance: Draft Mode (1.5 Pro)
Generate a fast, low-cost draft preview (480p), then promote to full quality.
// Step 1: Draft
const draft = await seedance.text2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "text", text: "epic drone shot over a mountain" }],
draft: true,
});
await draft.wait();
// Step 2: Promote draft to full quality
const final = await seedance.draft2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "draft_task", draft_task: { id: draft.taskId } }],
});
await final.wait();
Seedance: Advanced Options
const task = await seedance.text2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "text", text: "a cinematic sunset over the ocean" }],
ratio: "21:9",
resolution: "1080p",
duration: 10,
generate_audio: true,
return_last_frame: true, // get last frame for chaining videos
seed: 42,
service_tier: "flex", // 50% cost, higher TPD
});
Seedance: Models
| Model | ID | Capabilities |
| :--- | :--- | :--- |
| 2.0 | doubao-seedance-2-0-260128 | t2v, i2v, reference video/audio, audio gen |
| 2.0 Fast | doubao-seedance-2-0-fast-260128 | Same as 2.0, faster |
| 1.5 Pro | doubao-seedance-1-5-pro-251215 | t2v, i2v, draft mode, audio gen |
| 1.0 Pro | doubao-seedance-1-0-pro-250428 | t2v, i2v (first+last frame) |
| 1.0 Pro Fast | doubao-seedance-1-0-pro-fast-250428 | t2v, i2v (first frame only) |
| 1.0 Lite T2V | doubao-seedance-1-0-lite-t2v-250428 | Text-to-video only |
| 1.0 Lite I2V | doubao-seedance-1-0-lite-i2v-250428 | i2v (first, first+last, reference 1–4) |
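When selecting a model programmatically, the capability matrix above can be encoded as a lookup. The sketch below covers a subset of the table (the fast variants are omitted); SEEDANCE_CAPS and modelsFor are hypothetical helpers, not SDK exports.

```typescript
// Capability lookup for a subset of the Seedance models listed above.
// Sketch only — NOT part of ai-media-kit.
const SEEDANCE_CAPS: Record<string, string[]> = {
  "doubao-seedance-2-0-260128": ["t2v", "i2v", "reference_video", "reference_audio", "audio"],
  "doubao-seedance-1-5-pro-251215": ["t2v", "i2v", "draft", "audio"],
  "doubao-seedance-1-0-pro-250428": ["t2v", "i2v", "last_frame"],
  "doubao-seedance-1-0-lite-t2v-250428": ["t2v"],
  "doubao-seedance-1-0-lite-i2v-250428": ["i2v", "last_frame", "reference_image"],
};

// Return the model IDs that support every requested capability.
function modelsFor(...caps: string[]): string[] {
  return Object.entries(SEEDANCE_CAPS)
    .filter(([, supported]) => caps.every((c) => supported.includes(c)))
    .map(([id]) => id);
}
```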
Kling
import { createKling } from "ai-media-kit/providers/kling";
const kling = createKling({
baseUrl: "https://api.klingai.com",
apiKey: process.env.KLING_API_KEY,
});
Kling: Text to Video
const task = await kling.text2video.create({
model_name: "kling-v2-6",
prompt: "a golden retriever running on a beach at sunset",
mode: "pro",
aspect_ratio: "16:9",
duration: "10",
sound: "on",
});
await task.wait();
Kling: Image to Video
const task = await kling.image2video.create({
model_name: "kling-v2-6",
image: "https://example.com/dog.jpg",
prompt: "the dog starts running towards the camera",
mode: "pro",
duration: "5",
sound: "on",
});
Kling: Image to Video with Camera Control
const task = await kling.image2video.create({
model_name: "kling-v2-6",
image: "https://example.com/landscape.jpg",
prompt: "sweeping landscape view",
camera_control: {
type: "simple",
config: { horizontal: 5, zoom: -3 },
},
});
Kling: Multi-Image to Video
const task = await kling.multiImage2video.create({
model_name: "kling-v1-6",
image_list: [
{ image: "https://example.com/frame1.jpg" },
{ image: "https://example.com/frame2.jpg" },
{ image: "https://example.com/frame3.jpg" },
],
prompt: "smooth transition between the three scenes",
mode: "pro",
duration: "10",
});
Kling: Models
| Model | Text-to-Video | Image-to-Video | Sound |
| :--- | :--- | :--- | :--- |
| kling-v3 | Yes | Yes | — |
| kling-v2-6 | Yes | Yes | Yes |
| kling-v2-5-turbo | Yes | Yes | — |
| kling-v2-1-master | Yes | Yes | — |
| kling-v2-master | Yes | Yes | — |
| kling-v1-6 | Yes | Yes + multi-image | — |
| kling-v1 | Yes | Yes | — |
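Note that Kling takes duration as a string ("5" or "10" in the examples above), while Seedance and xAI take numbers. A provider-agnostic wrapper can coerce at the boundary; toKlingDuration below is a hypothetical helper, not part of the SDK, and assumes only the 5s/10s values shown in this README.

```typescript
// Kling's request types use string durations ("5" | "10" in the examples
// above); other providers in this SDK take numbers. Snap a numeric
// duration to the nearest value Kling accepts.
// Sketch only — NOT part of ai-media-kit.
function toKlingDuration(seconds: number): "5" | "10" {
  return seconds <= 7 ? "5" : "10";
}
```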
xAI (Grok Imagine Video)
import { createXai } from "ai-media-kit/providers/xai";
const xai = createXai({
baseUrl: "https://api.x.ai",
apiKey: process.env.XAI_API_KEY,
});
xAI: Text to Video
const task = await xai.text2video.create({
model: "grok-imagine-video",
prompt: "a cat dancing on the moon",
duration: 10,
aspect_ratio: "16:9",
resolution: "720p",
});
await task.wait();
xAI: Image to Video
const task = await xai.image2video.create({
model: "grok-imagine-video",
prompt: "the cat starts walking towards the camera",
image: { url: "https://example.com/cat.jpg" },
});
xAI: Reference Images
Use <IMAGE_1>, <IMAGE_2> in the prompt to refer to specific images. Ideal for virtual try-on, product placement, or character-consistent storytelling.
const task = await xai.referenceVideo.create({
model: "grok-imagine-video",
prompt: "<IMAGE_1> is walking in a park wearing the outfit from <IMAGE_2>",
reference_images: [
{ url: "https://example.com/person.jpg" },
{ url: "https://example.com/outfit.jpg" },
],
});
xAI: Edit Video
High-fidelity edits with strong scene preservation.
const task = await xai.edit.create({
model: "grok-imagine-video",
prompt: "give the woman a silver necklace",
video: { url: "https://example.com/video.mp4" },
});
xAI: Extend Video
Seamlessly extend an existing video. Output is one continuous video.
const task = await xai.extend.create({
model: "grok-imagine-video",
prompt: "the camera pans to reveal a vast mountain landscape",
video: { url: "https://example.com/video.mp4" },
duration: 5, // extension length (2–10s), added to original
});
Suno (Music Generation)
import { createSuno } from "ai-media-kit/providers/suno";
const suno = createSuno({
baseUrl: "https://your-gateway.com",
apiKey: process.env.SUNO_API_KEY,
pollInterval: 10000,
});
Suno: Inspiration Mode
Describe what you want — AI generates lyrics, melody, everything.
const task = await suno.inspiration.create({
mvVersion: "chirp-v5",
inputType: "10",
gptDescriptionPrompt: "a warm pop song about driving home at night",
});
await task.wait();
// task.outputs → AudioOutput[] with extras (cover art, music video)
Suno: Custom Lyrics Mode
const task = await suno.customLyrics.create({
mvVersion: "chirp-v5",
inputType: "20",
prompt: "[Verse 1]\nCity lights flicker on\nThe highway hums along\n\n[Chorus]\nDriving home tonight...",
tags: "pop,female voice,warm,acoustic",
title: "Night Breeze",
});
Suno: Continue a Track
Extend an existing clip from a specific timestamp.
const task = await suno.continue.create({
mvVersion: "chirp-v5",
inputType: "20",
prompt: "[Verse 2]\nMoonlight on the road...",
continueClipId: "clip_xxx",
continueAt: "27", // seconds
});
Suno: Cover Mode
const task = await suno.cover.create({
mvVersion: "chirp-v5",
inputType: "20",
coverClipId: "clip_xxx",
tags: "jazz,male voice",
});
Suno: Persona Mode
Generate music using a trained voice/style persona.
// Step 1: Create a persona from a reference clip
const persona = await suno.createPersona({
root_clip_id: "clip_xxx",
name: "My Voice",
});
// Step 2: Generate with persona
const task = await suno.persona.create({
mvVersion: "chirp-v5",
inputType: "20",
prompt: "[Verse]\nNew lyrics here...",
tags: "pop,upbeat",
title: "New Song",
task: "artist_consistency",
metadataParams: {
artist_clip_id: persona.artist_clip_id,
persona_id: persona.persona_id,
},
});
Suno: Sound Effects
const task = await suno.sound.create({
mvVersion: "chirp-v5",
inputType: "30",
gptDescriptionPrompt: "ocean waves crashing on rocks with seagulls",
});
Suno: Upload & Concat
// Upload audio by URL — get a clipId for continue/cover
const clip = await suno.uploadByUrl("https://example.com/song.mp3");
console.log(clip.clipId);
// Upload a local file
const localClip = await suno.upload(file);
// Assemble clips into a full-length song
await suno.concat(clip.clipId);
Gemini (Image Generation)
Gemini image generation is synchronous — create() blocks until the image is ready (typically 5–30s), then returns the result directly. No Task, no polling.
import { createGemini } from "ai-media-kit/providers/gemini";
const gemini = createGemini({
baseUrl: "https://generativelanguage.googleapis.com",
apiKey: process.env.GEMINI_API_KEY,
});
Gemini: Text to Image
const result = await gemini.generateContent.create({
model: "gemini-3.1-flash-image-preview",
contents: [{ parts: [{ text: "A cat astronaut on the moon" }] }],
});
if (result.status === "completed") {
console.log(result.outputs); // [{ type: "image", url: "data:image/png;base64,..." }]
} else {
console.error(result.error); // { code, message }
}
Gemini: Image Editing
const result = await gemini.generateContent.create({
model: "gemini-3.1-flash-image-preview",
contents: [{
parts: [
{ text: "Add a wizard hat to this cat" },
{ inlineData: { mimeType: "image/png", data: base64ImageData } },
],
}],
generationConfig: {
responseModalities: ["TEXT", "IMAGE"],
imageConfig: { aspectRatio: "1:1", imageSize: "2K" },
},
});
Gemini: Google Search Grounding
const result = await gemini.generateContent.create({
model: "gemini-3.1-flash-image-preview",
contents: [{ parts: [{ text: "Current weather in Tokyo as an infographic" }] }],
generationConfig: { responseModalities: ["TEXT", "IMAGE"] },
tools: [{ googleSearch: {} }],
});
Gemini: Models
| Model | ID | Resolution | Notes |
| :--- | :--- | :--- | :--- |
| Nano Banana 2 | gemini-3.1-flash-image-preview | 512/1K/2K/4K | Extended ratios, image search, configurable thinking |
| Nano Banana Pro | gemini-3-pro-image-preview | 1K/2K/4K | Highest quality, thinking always on |
| Nano Banana | gemini-2.5-flash-image | 1K only | Fastest, cheapest |
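Gemini returns images as base64 data URLs (see the Text to Image example above). To save one to disk, the URL has to be split into its MIME type and payload first; below is a small parser sketch — parseDataUrl is a hypothetical helper, not part of the SDK.

```typescript
// Split a "data:<mime>;base64,<payload>" URL into its parts so the
// payload can be decoded and written to a file.
// Sketch only — NOT part of ai-media-kit.
function parseDataUrl(url: string): { mimeType: string; base64: string } {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (!match) throw new Error("not a base64 data URL");
  return { mimeType: match[1], base64: match[2] };
}

// Usage (Node): decode and write to disk.
// import { writeFileSync } from "node:fs";
// const { base64 } = parseDataUrl(result.outputs[0].url);
// writeFileSync("out.png", Buffer.from(base64, "base64"));
```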
Task Lifecycle (Async Providers)
For async providers (Kling, Seedance, Seedance 2, xAI, Suno), create() submits a task and returns a Task object:
create() ──→ submitted ──→ queued ──→ processing ──→ completed
└─→ failed / cancelled / expired
Usage Patterns
const task = await provider.endpoint.create({ ... });
// 1. Block until done
await task.wait();
// 2. Single status check (no polling)
await task.refresh();
// 3. Event listener (no polling — call wait() or toSSEResponse() to start polling)
task.on("complete", (t) => console.log(t.outputs));
task.on("progress", (t) => console.log(t.progress));
task.on("error", (err) => console.error(err.message));
// 4. SSE Response for browsers (starts polling)
return task.toSSEResponse();
Unified Result Shape
task.taskId // string — provider-assigned task ID
task.status // "submitted" | "queued" | "processing" | "completed" | "failed" | "cancelled" | "expired"
task.progress // number | null (null when provider doesn't report progress)
task.outputs // MediaOutput[] | null
task.error // { code, message } | null
task.raw // Raw upstream response, as-is
Resuming Tasks
const task = provider.endpoint.fromId("existing-task-id");
await task.refresh(); // one-shot check
await task.wait(); // or resume polling
SSE Event Format
// SSE events sent by toSSEResponse():
// event: progress → { taskId, status, progress }
// event: complete → { taskId, status, outputs }
// event: error → { taskId, status, message, code, raw }
Error Handling
import { APIError, TaskError } from "ai-media-kit";
try {
const task = await provider.endpoint.create({ ... });
await task.wait();
} catch (err) {
if (err instanceof TaskError) {
// Business failure: task failed, cancelled, or expired
console.error(err.code, err.message, err.raw);
}
if (err instanceof APIError) {
// Protocol error: HTTP 401, 429, 500, malformed response
console.error(err.status, err.message, err.raw);
}
}
Provider Config
All providers accept a ProviderConfig:
{
baseUrl: string; // Official API or your proxy/gateway
apiKey: string;
auth?: AuthConfig; // Default: Bearer token
defaultHeaders?: Record<string, string | null>;
pollInterval?: number; // Polling interval in ms (default: 2000)
debug?: boolean; // Log requests & polling (default: false)
}
Imports
// Import from root
import { createKling, createSeedance, createSeedance2, createXai, createSuno, createGemini } from "ai-media-kit";
// Or import individual providers (tree-shakable)
import { createSeedance2 } from "ai-media-kit/providers/seedance2";
import { createKling } from "ai-media-kit/providers/kling";
// Import types
import type { Task, MediaOutput, VideoOutput, ImageOutput, AudioOutput, TaskStatus } from "ai-media-kit";
Development
bun install # Install dependencies
bun run build # TypeScript compile
bun run dev # Dev server (watch mode)
bun run lint # Biome check
bun run format # Biome fix
bun test # Unit tests (mocked)
bun run test:integration # Integration tests (needs .env)
bun run clean # Clean build artifacts
Copy .env.example to .env and fill in your API keys before running integration tests.
License
MIT
