ai-media-kit
v0.2.2
AI Media Generation SDK — unified interface for Kling, Seedance, xAI (Grok), Suno, Gemini and more
Unified TypeScript SDK for AI media generation APIs — video, image, and audio.
Highlights
- Multi-provider — Kling, Seedance, Seedance 2, xAI (Grok), Suno, Gemini, and growing
- Native request types — Full TypeScript interfaces per endpoint with JSDoc; no dumbed-down common denominator
- Unified response — Consistent task.status, task.outputs, and task.raw across all providers
- Async & sync — Async task model with polling for video/audio; direct results for sync APIs (Gemini)
- SSE streaming — Built-in task.toSSEResponse() for browser streaming
- Zero runtime dependencies
Install
npm install ai-media-kit
# or
bun add ai-media-kit
# or
pnpm add ai-media-kit
Providers
| Provider | Factory | Endpoints | Media | Model |
| :--- | :--- | :--- | :--- | :--- |
| Seedance 2 | createSeedance2() | generate | Video | Async (Task) |
| Seedance | createSeedance() | text2video, image2video, draft2video, generate | Video | Async (Task) |
| Kling | createKling() | text2video, image2video, multiImage2video | Video | Async (Task) |
| xAI (Grok) | createXai() | text2video, image2video, referenceVideo, generate, edit, extend | Video | Async (Task) |
| Suno | createSuno() | inspiration, customLyrics, continue, cover, persona, sound, generate + upload/concat/createPersona | Audio | Async (Task) |
| Gemini | createGemini() | generateContent | Image | Sync (direct) |
Seedance 2
Seedance 2.0 (doubao-seedance-2-0) — latest Volcengine video generation model. Supports text-to-video, image-to-video (first frame, last frame, reference image), reference video, reference audio, and audio generation.
import { createSeedance2 } from "ai-media-kit/providers/seedance2";
const sd2 = createSeedance2({
baseUrl: "https://your-gateway.com/sd2",
apiKey: process.env.SD2_API_KEY,
});
SD2: Text to Video
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [{ type: "text", text: "a cat yawning lazily in the sun" }],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
await task.wait();
console.log(task.outputs); // [{ type: "video", url: "https://..." }]
SD2: Image to Video (First Frame)
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/cat.jpg" }, role: "first_frame" },
{ type: "text", text: "the cat starts walking towards the camera" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Image to Video (First + Last Frame)
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/start.jpg" }, role: "first_frame" },
{ type: "image_url", image_url: { url: "https://example.com/end.jpg" }, role: "last_frame" },
{ type: "text", text: "smooth transition between two scenes" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Reference Image
Use 1–4 reference images to influence the visual style without locking the first frame.
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/ref1.jpg" }, role: "reference_image" },
{ type: "image_url", image_url: { url: "https://example.com/ref2.jpg" }, role: "reference_image" },
{ type: "text", text: "a stylish product showcase in the same visual style" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Reference Video
Provide a reference video to guide motion and style.
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "video_url", video_url: { url: "https://example.com/reference.mp4" }, role: "reference_video" },
{ type: "text", text: "recreate this motion with a different character" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 5,
});
SD2: Reference Audio
Provide a reference audio clip to sync the generated video.
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "audio_url", audio_url: { url: "https://example.com/narration.mp3" }, role: "reference_audio" },
{ type: "text", text: "a person speaking to the camera in a studio" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 10,
});
SD2: Combining Multiple Content Types
const task = await sd2.generate.create({
model: "doubao-seedance-2-0-260128",
content: [
{ type: "image_url", image_url: { url: "https://example.com/scene.jpg" }, role: "first_frame" },
{ type: "audio_url", audio_url: { url: "https://example.com/bgm.mp3" }, role: "reference_audio" },
{ type: "text", text: "the scene slowly zooms in as music plays" },
],
generate_audio: true,
ratio: "16:9",
resolution: "720p",
duration: 10,
});
SD2: SSE Streaming
// Next.js / Hono / any Web-standard handler
const task = await sd2.generate.create({ ... });
return task.toSSEResponse();
SD2: Resume from Task ID
const task = sd2.generate.fromId("task_xxxx");
await task.refresh(); // single status check
SD2: Parameters
| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| model | string | — | "doubao-seedance-2-0-260128" or "doubao-seedance-1-5-pro-251215" |
| content | array | — | Multimodal array: text, image_url, video_url, audio_url |
| generate_audio | boolean | true | Generate synchronized audio |
| ratio | string | "16:9" | "16:9" "9:16" "1:1" "4:3" "3:4" "21:9" "adaptive" |
| resolution | string | "720p" | "480p" or "720p" |
| duration | number | 5 | Video duration in seconds (5–15) |
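These constraints can be checked client-side before spending a request. Below is a minimal pre-flight validator mirroring the table above — a sketch only; validateSd2Params is a hypothetical helper, not part of the SDK.

```typescript
// Pre-flight check for SD2 parameters, mirroring the documented
// constraints above. Sketch only — NOT part of ai-media-kit.
const SD2_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"];
const SD2_RESOLUTIONS = ["480p", "720p"];

function validateSd2Params(p: { ratio?: string; resolution?: string; duration?: number }): string[] {
  const errors: string[] = [];
  if (p.ratio !== undefined && !SD2_RATIOS.includes(p.ratio)) {
    errors.push(`unsupported ratio: ${p.ratio}`);
  }
  if (p.resolution !== undefined && !SD2_RESOLUTIONS.includes(p.resolution)) {
    errors.push(`unsupported resolution: ${p.resolution}`);
  }
  if (p.duration !== undefined && (p.duration < 5 || p.duration > 15)) {
    errors.push(`duration must be 5-15s, got ${p.duration}`);
  }
  return errors;
}
```

Running such a check before generate.create() turns an async provider-side rejection into an immediate, descriptive local error.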
Seedance (Official Volcengine API)
For the official Volcengine Seedance API. Supports all Seedance 1.x models and draft mode.
import { createSeedance } from "ai-media-kit/providers/seedance";
const seedance = createSeedance({
baseUrl: "https://ark.cn-beijing.volces.com/api/v3",
apiKey: process.env.SEEDANCE_API_KEY,
});
Seedance: Text to Video
const task = await seedance.text2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "text", text: "a cat yawning lazily in the sun" }],
});
await task.wait();
console.log(task.outputs); // [{ type: "video", url: "https://..." }]
Seedance: Image to Video (First Frame)
const task = await seedance.image2video.create({
model: "doubao-seedance-1-0-pro-250428",
content: [
{ type: "image_url", image_url: { url: "https://example.com/cat.jpg" } },
{ type: "text", text: "the cat starts walking" },
],
});
Seedance: Image to Video (First + Last Frame)
const task = await seedance.image2video.create({
model: "doubao-seedance-1-0-pro-250428",
content: [
{ type: "image_url", image_url: { url: "https://example.com/start.jpg" }, role: "first_frame" },
{ type: "image_url", image_url: { url: "https://example.com/end.jpg" }, role: "last_frame" },
{ type: "text", text: "smooth camera dolly from start to end" },
],
});
Seedance: Reference Image (1–4 images)
const task = await seedance.image2video.create({
model: "doubao-seedance-1-0-lite-i2v-250428",
content: [
{ type: "image_url", image_url: { url: "https://example.com/ref.jpg" }, role: "reference_image" },
{ type: "text", text: "character walks through a park" },
],
});
Seedance: Draft Mode (1.5 Pro)
Generate a fast, low-cost draft preview (480p), then promote to full quality.
// Step 1: Draft
const draft = await seedance.text2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "text", text: "epic drone shot over a mountain" }],
draft: true,
});
await draft.wait();
// Step 2: Promote draft to full quality
const final = await seedance.draft2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "draft_task", draft_task: { id: draft.taskId } }],
});
await final.wait();
Seedance: Advanced Options
const task = await seedance.text2video.create({
model: "doubao-seedance-1-5-pro-251215",
content: [{ type: "text", text: "a cinematic sunset over the ocean" }],
ratio: "21:9",
resolution: "1080p",
duration: 10,
generate_audio: true,
return_last_frame: true, // get last frame for chaining videos
seed: 42,
service_tier: "flex", // 50% cost, higher TPD
});
Seedance: Models
| Model | ID | Capabilities |
| :--- | :--- | :--- |
| 2.0 | doubao-seedance-2-0-260128 | t2v, i2v, reference video/audio, audio gen |
| 2.0 Fast | doubao-seedance-2-0-fast-260128 | Same as 2.0, faster |
| 1.5 Pro | doubao-seedance-1-5-pro-251215 | t2v, i2v, draft mode, audio gen |
| 1.0 Pro | doubao-seedance-1-0-pro-250428 | t2v, i2v (first+last frame) |
| 1.0 Pro Fast | doubao-seedance-1-0-pro-fast-250428 | t2v, i2v (first frame only) |
| 1.0 Lite T2V | doubao-seedance-1-0-lite-t2v-250428 | Text-to-video only |
| 1.0 Lite I2V | doubao-seedance-1-0-lite-i2v-250428 | i2v (first, first+last, reference 1–4) |
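When selecting a model programmatically, the capability matrix above can be encoded as a lookup. The sketch below covers a subset of the table (the fast variants are omitted); SEEDANCE_CAPS and modelsFor are hypothetical helpers, not SDK exports.

```typescript
// Capability lookup for a subset of the Seedance models listed above.
// Sketch only — NOT part of ai-media-kit.
const SEEDANCE_CAPS: Record<string, string[]> = {
  "doubao-seedance-2-0-260128": ["t2v", "i2v", "reference_video", "reference_audio", "audio"],
  "doubao-seedance-1-5-pro-251215": ["t2v", "i2v", "draft", "audio"],
  "doubao-seedance-1-0-pro-250428": ["t2v", "i2v", "last_frame"],
  "doubao-seedance-1-0-lite-t2v-250428": ["t2v"],
  "doubao-seedance-1-0-lite-i2v-250428": ["i2v", "last_frame", "reference_image"],
};

// Return the model IDs that support every requested capability.
function modelsFor(...caps: string[]): string[] {
  return Object.entries(SEEDANCE_CAPS)
    .filter(([, supported]) => caps.every((c) => supported.includes(c)))
    .map(([id]) => id);
}
```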
Kling
import { createKling } from "ai-media-kit/providers/kling";
const kling = createKling({
baseUrl: "https://api.klingai.com",
apiKey: process.env.KLING_API_KEY,
});
Kling: Text to Video
const task = await kling.text2video.create({
model_name: "kling-v2-6",
prompt: "a golden retriever running on a beach at sunset",
mode: "pro",
aspect_ratio: "16:9",
duration: "10",
sound: "on",
});
await task.wait();
Kling: Image to Video
const task = await kling.image2video.create({
model_name: "kling-v2-6",
image: "https://example.com/dog.jpg",
prompt: "the dog starts running towards the camera",
mode: "pro",
duration: "5",
sound: "on",
});
Kling: Image to Video with Camera Control
const task = await kling.image2video.create({
model_name: "kling-v2-6",
image: "https://example.com/landscape.jpg",
prompt: "sweeping landscape view",
camera_control: {
type: "simple",
config: { horizontal: 5, zoom: -3 },
},
});
Kling: Multi-Image to Video
const task = await kling.multiImage2video.create({
model_name: "kling-v1-6",
image_list: [
{ image: "https://example.com/frame1.jpg" },
{ image: "https://example.com/frame2.jpg" },
{ image: "https://example.com/frame3.jpg" },
],
prompt: "smooth transition between the three scenes",
mode: "pro",
duration: "10",
});
Kling: Models
| Model | Text-to-Video | Image-to-Video | Sound |
| :--- | :--- | :--- | :--- |
| kling-v3 | Yes | Yes | — |
| kling-v2-6 | Yes | Yes | Yes |
| kling-v2-5-turbo | Yes | Yes | — |
| kling-v2-1-master | Yes | Yes | — |
| kling-v2-master | Yes | Yes | — |
| kling-v1-6 | Yes | Yes + multi-image | — |
| kling-v1 | Yes | Yes | — |
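Note that Kling takes duration as a string ("5" or "10" in the examples above), while Seedance and xAI take numbers. A provider-agnostic wrapper can coerce at the boundary; toKlingDuration below is a hypothetical helper, not part of the SDK, and assumes only the 5s/10s values shown in this README.

```typescript
// Kling's request types use string durations ("5" | "10" in the examples
// above); other providers in this SDK take numbers. Snap a numeric
// duration to the nearest value Kling accepts.
// Sketch only — NOT part of ai-media-kit.
function toKlingDuration(seconds: number): "5" | "10" {
  return seconds <= 7 ? "5" : "10";
}
```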
xAI (Grok Imagine Video)
import { createXai } from "ai-media-kit/providers/xai";
const xai = createXai({
baseUrl: "https://api.x.ai",
apiKey: process.env.XAI_API_KEY,
});
xAI: Text to Video
const task = await xai.text2video.create({
model: "grok-imagine-video",
prompt: "a cat dancing on the moon",
duration: 10,
aspect_ratio: "16:9",
resolution: "720p",
});
await task.wait();
xAI: Image to Video
const task = await xai.image2video.create({
model: "grok-imagine-video",
prompt: "the cat starts walking towards the camera",
image: { url: "https://example.com/cat.jpg" },
});
xAI: Reference Images
Use <IMAGE_1>, <IMAGE_2> in the prompt to refer to specific images. Ideal for virtual try-on, product placement, or character-consistent storytelling.
const task = await xai.referenceVideo.create({
model: "grok-imagine-video",
prompt: "<IMAGE_1> is walking in a park wearing the outfit from <IMAGE_2>",
reference_images: [
{ url: "https://example.com/person.jpg" },
{ url: "https://example.com/outfit.jpg" },
],
});
xAI: Edit Video
High-fidelity edits with strong scene preservation.
const task = await xai.edit.create({
model: "grok-imagine-video",
prompt: "give the woman a silver necklace",
video: { url: "https://example.com/video.mp4" },
});
xAI: Extend Video
Seamlessly extend an existing video. Output is one continuous video.
const task = await xai.extend.create({
model: "grok-imagine-video",
prompt: "the camera pans to reveal a vast mountain landscape",
video: { url: "https://example.com/video.mp4" },
duration: 5, // extension length (2–10s), added to original
});
Suno (Music Generation)
import { createSuno } from "ai-media-kit/providers/suno";
const suno = createSuno({
baseUrl: "https://your-gateway.com",
apiKey: process.env.SUNO_API_KEY,
pollInterval: 10000,
});
Suno: Inspiration Mode
Describe what you want — AI generates lyrics, melody, everything.
const task = await suno.inspiration.create({
mvVersion: "chirp-v5",
inputType: "10",
gptDescriptionPrompt: "a warm pop song about driving home at night",
});
await task.wait();
// task.outputs → AudioOutput[] with extras (cover art, music video)
Suno: Custom Lyrics Mode
const task = await suno.customLyrics.create({
mvVersion: "chirp-v5",
inputType: "20",
prompt: "[Verse 1]\nCity lights flicker on\nThe highway hums along\n\n[Chorus]\nDriving home tonight...",
tags: "pop,female voice,warm,acoustic",
title: "Night Breeze",
});
Suno: Continue a Track
Extend an existing clip from a specific timestamp.
const task = await suno.continue.create({
mvVersion: "chirp-v5",
inputType: "20",
prompt: "[Verse 2]\nMoonlight on the road...",
continueClipId: "clip_xxx",
continueAt: "27", // seconds
});
Suno: Cover Mode
const task = await suno.cover.create({
mvVersion: "chirp-v5",
inputType: "20",
coverClipId: "clip_xxx",
tags: "jazz,male voice",
});
Suno: Persona Mode
Generate music using a trained voice/style persona.
// Step 1: Create a persona from a reference clip
const persona = await suno.createPersona({
root_clip_id: "clip_xxx",
name: "My Voice",
});
// Step 2: Generate with persona
const task = await suno.persona.create({
mvVersion: "chirp-v5",
inputType: "20",
prompt: "[Verse]\nNew lyrics here...",
tags: "pop,upbeat",
title: "New Song",
task: "artist_consistency",
metadataParams: {
artist_clip_id: persona.artist_clip_id,
persona_id: persona.persona_id,
},
});
Suno: Sound Effects
const task = await suno.sound.create({
mvVersion: "chirp-v5",
inputType: "30",
gptDescriptionPrompt: "ocean waves crashing on rocks with seagulls",
});
Suno: Upload & Concat
// Upload audio by URL — get a clipId for continue/cover
const clip = await suno.uploadByUrl("https://example.com/song.mp3");
console.log(clip.clipId);
// Upload a local file
const localClip = await suno.upload(file);
// Assemble clips into a full-length song
await suno.concat(clip.clipId);
Gemini (Image Generation)
Gemini image generation is synchronous — create() blocks until the image is ready (typically 5–30s), then returns the result directly. No Task, no polling.
import { createGemini } from "ai-media-kit/providers/gemini";
const gemini = createGemini({
baseUrl: "https://generativelanguage.googleapis.com",
apiKey: process.env.GEMINI_API_KEY,
});
Gemini: Text to Image
const result = await gemini.generateContent.create({
model: "gemini-3.1-flash-image-preview",
contents: [{ parts: [{ text: "A cat astronaut on the moon" }] }],
});
if (result.status === "completed") {
console.log(result.outputs); // [{ type: "image", url: "data:image/png;base64,..." }]
} else {
console.error(result.error); // { code, message }
}
Gemini: Image Editing
const result = await gemini.generateContent.create({
model: "gemini-3.1-flash-image-preview",
contents: [{
parts: [
{ text: "Add a wizard hat to this cat" },
{ inlineData: { mimeType: "image/png", data: base64ImageData } },
],
}],
generationConfig: {
responseModalities: ["TEXT", "IMAGE"],
imageConfig: { aspectRatio: "1:1", imageSize: "2K" },
},
});
Gemini: Google Search Grounding
const result = await gemini.generateContent.create({
model: "gemini-3.1-flash-image-preview",
contents: [{ parts: [{ text: "Current weather in Tokyo as an infographic" }] }],
generationConfig: { responseModalities: ["TEXT", "IMAGE"] },
tools: [{ googleSearch: {} }],
});
Gemini: Models
| Model | ID | Resolution | Notes |
| :--- | :--- | :--- | :--- |
| Nano Banana 2 | gemini-3.1-flash-image-preview | 512/1K/2K/4K | Extended ratios, image search, configurable thinking |
| Nano Banana Pro | gemini-3-pro-image-preview | 1K/2K/4K | Highest quality, thinking always on |
| Nano Banana | gemini-2.5-flash-image | 1K only | Fastest, cheapest |
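Gemini returns images as base64 data URLs (see the Text to Image example above). To save one to disk, the URL has to be split into its MIME type and payload first; below is a small parser sketch — parseDataUrl is a hypothetical helper, not part of the SDK.

```typescript
// Split a "data:<mime>;base64,<payload>" URL into its parts so the
// payload can be decoded and written to a file.
// Sketch only — NOT part of ai-media-kit.
function parseDataUrl(url: string): { mimeType: string; base64: string } {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (!match) throw new Error("not a base64 data URL");
  return { mimeType: match[1], base64: match[2] };
}

// Usage (Node): decode and write to disk.
// import { writeFileSync } from "node:fs";
// const { base64 } = parseDataUrl(result.outputs[0].url);
// writeFileSync("out.png", Buffer.from(base64, "base64"));
```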
Task Lifecycle (Async Providers)
For async providers (Kling, Seedance, Seedance 2, xAI, Suno), create() submits a task and returns a Task object:
create() ──→ submitted ──→ queued ──→ processing ──→ completed
└─→ failed / cancelled / expired
Usage Patterns
const task = await provider.endpoint.create({ ... });
// 1. Block until done
await task.wait();
// 2. Single status check (no polling)
await task.refresh();
// 3. Event listener (no polling — call wait() or toSSEResponse() to start polling)
task.on("complete", (t) => console.log(t.outputs));
task.on("progress", (t) => console.log(t.progress));
task.on("error", (err) => console.error(err.message));
// 4. SSE Response for browsers (starts polling)
return task.toSSEResponse();
Unified Result Shape
task.taskId // string — provider-assigned task ID
task.status // "submitted" | "queued" | "processing" | "completed" | "failed" | "cancelled" | "expired"
task.progress // number | null (null when provider doesn't report progress)
task.outputs // MediaOutput[] | null
task.error // { code, message } | null
task.raw // Raw upstream response, as-is
Resuming Tasks
const task = provider.endpoint.fromId("existing-task-id");
await task.refresh(); // one-shot check
await task.wait(); // or resume polling
SSE Event Format
// SSE events sent by toSSEResponse():
// event: progress → { taskId, status, progress }
// event: complete → { taskId, status, outputs }
// event: error → { taskId, status, message, code, raw }
Error Handling
import { APIError, TaskError } from "ai-media-kit";
try {
const task = await provider.endpoint.create({ ... });
await task.wait();
} catch (err) {
if (err instanceof TaskError) {
// Business failure: task failed, cancelled, or expired
console.error(err.code, err.message, err.raw);
}
if (err instanceof APIError) {
// Protocol error: HTTP 401, 429, 500, malformed response
console.error(err.status, err.message, err.raw);
}
}
Provider Config
All providers accept a ProviderConfig:
{
baseUrl: string; // Official API or your proxy/gateway
apiKey: string;
auth?: AuthConfig; // Default: Bearer token
defaultHeaders?: Record<string, string | null>;
pollInterval?: number; // Polling interval in ms (default: 2000)
debug?: boolean; // Log requests & polling (default: false)
}
Imports
// Import from root
import { createKling, createSeedance, createSeedance2, createXai, createSuno, createGemini } from "ai-media-kit";
// Or import individual providers (tree-shakable)
import { createSeedance2 } from "ai-media-kit/providers/seedance2";
import { createKling } from "ai-media-kit/providers/kling";
// Import types
import type { Task, MediaOutput, VideoOutput, ImageOutput, AudioOutput, TaskStatus } from "ai-media-kit";
Development
bun install # Install dependencies
bun run build # TypeScript compile
bun run dev # Dev server (watch mode)
bun run lint # Biome check
bun run format # Biome fix
bun test # Unit tests (mocked)
bun run test:integration # Integration tests (needs .env)
bun run clean # Clean build artifacts
Copy .env.example to .env and fill in your API keys before running integration tests.
License
MIT
