@runpod/ai-sdk-provider

v1.4.0

Published

9 days ago

The **Runpod provider** for the [AI SDK](https://ai-sdk.dev/docs) contains language model and image generation support for [Runpod's](https://runpod.io) public endpoints.

ai runpod ai-sdk llm language-model

Runpod AI SDK Provider

Setup

The Runpod provider is available in the @runpod/ai-sdk-provider module. You can install it with:

# npm
npm install @runpod/ai-sdk-provider

# pnpm
pnpm add @runpod/ai-sdk-provider

# yarn
yarn add @runpod/ai-sdk-provider

# bun
bun add @runpod/ai-sdk-provider

Provider Instance

You can import the default provider instance runpod from @runpod/ai-sdk-provider:

import { runpod } from '@runpod/ai-sdk-provider';

If you need a customized setup, you can import createRunpod and create a provider instance with your settings:

import { createRunpod } from '@runpod/ai-sdk-provider';

const runpod = createRunpod({
  apiKey: 'your-api-key', // optional, defaults to RUNPOD_API_KEY environment variable
  baseURL: 'custom-url', // optional, for custom endpoints
  headers: {
    /* custom headers */
  }, // optional
});

You can use the following optional settings to customize the Runpod provider instance:

baseURL string
Use a different URL prefix for API calls, e.g. to use proxy servers or custom endpoints. Supports vLLM deployments, SGLang servers, and any OpenAI-compatible API. The default prefix is https://api.runpod.ai/v2.
apiKey string
API key that is sent in the Authorization header. It defaults to the RUNPOD_API_KEY environment variable. You can obtain your API key from the Runpod Console under "API Keys".
headers Record<string,string>
Custom headers to include in the requests.
fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>
Custom fetch implementation. You can use it as a middleware to intercept requests, or to provide a custom fetch implementation for e.g. testing.
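
The fetch setting can act as middleware. Below is a minimal sketch, assuming you only want to record each request's method and URL before delegating to the real fetch. The helper name withRequestLogging and its log callback are hypothetical and not part of the provider.

```typescript
// Hypothetical helper (not part of @runpod/ai-sdk-provider): wraps any
// fetch-compatible function and reports the HTTP method and URL of each
// request before delegating to the wrapped function.
type FetchFn = typeof fetch;

function withRequestLogging(
  baseFetch: FetchFn,
  log: (line: string) => void = console.log,
): FetchFn {
  return async (input, init) => {
    // The first argument may be a string, a URL, or a Request object.
    const request = input instanceof Request ? input : undefined;
    const url = request ? request.url : String(input);
    const method = init?.method ?? request?.method ?? 'GET';
    log(`${method} ${url}`);
    return baseFetch(input, init);
  };
}
```

The wrapped function can then be passed as the fetch setting, for example createRunpod({ fetch: withRequestLogging(fetch) }).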

Language Models

You can create language models using the provider instance. The first argument is the model ID:

import { runpod } from '@runpod/ai-sdk-provider';
import { generateText } from 'ai';

const { text } = await generateText({
  model: runpod('qwen/qwen3-32b-awq'),
  prompt: 'What is the capital of Germany?',
});

Returns:

text - Generated text string
finishReason - Why generation stopped ('stop', 'length', etc.)
usage - Token usage information (prompt, completion, total tokens)

Streaming

import { runpod } from '@runpod/ai-sdk-provider';
import { streamText } from 'ai';

const { textStream } = await streamText({
  model: runpod('qwen/qwen3-32b-awq'),
  prompt:
    'Write a short poem about artificial intelligence in exactly 4 lines.',
  temperature: 0.7,
});

for await (const delta of textStream) {
  process.stdout.write(delta);
}
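
If you also need the complete text after streaming, the deltas can be accumulated while they are forwarded. A minimal sketch, assuming textStream is an async iterable of strings as the loop above suggests; collectText is a hypothetical helper name.

```typescript
// Hypothetical helper: consumes an async iterable of text deltas,
// forwards each delta to an optional callback, and returns the full text.
async function collectText(
  stream: AsyncIterable<string>,
  onDelta?: (delta: string) => void,
): Promise<string> {
  let full = '';
  for await (const delta of stream) {
    onDelta?.(delta);
    full += delta;
  }
  return full;
}
```

For example, collectText(textStream, (d) => process.stdout.write(d)) prints the deltas as they arrive and resolves to the whole response.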

Examples

Check out our examples for more code snippets on how to use all the different models.

Supported Models

| Model ID | Description | Streaming | Object Generation | Tool Usage | Reasoning Notes |
| --- | --- | --- | --- | --- | --- |
| qwen/qwen3-32b-awq | 32B parameter multilingual model with strong reasoning capabilities | ✅ | ❌ | ✅ | Standard reasoning events |
| openai/gpt-oss-120b | 120B parameter open-source GPT model | ✅ | ❌ | ✅ | Standard reasoning events |
| deepcogito/cogito-671b-v2.1-fp8 | 671B parameter Cogito model with FP8 quantization | ✅ | ❌ | ✅ | Standard reasoning events |

Note: This list is not complete. For a full list of all available models, see the Runpod Public Endpoint Reference.

Chat Conversations

const { text } = await generateText({
  model: runpod('qwen/qwen3-32b-awq'),
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

Tool Calling

import { generateText, tool } from 'ai';
import { z } from 'zod';

const { text, toolCalls } = await generateText({
  model: runpod('openai/gpt-oss-120b'),
  prompt: 'What is the weather like in San Francisco?',
  tools: {
    getWeather: tool({
      description: 'Get weather information for a city',
      inputSchema: z.object({
        city: z.string().describe('The city name'),
      }),
      execute: async ({ city }) => {
        return `The weather in ${city} is sunny.`;
      },
    }),
  },
});

Additional Returns:

toolCalls - Array of tool calls made by the model
toolResults - Results from executed tools

Structured output

Using generateObject to enforce structured output is not supported by two of the models in this provider.

You can still return structured data by instructing the model to return JSON and validating it yourself.

import { runpod } from '@runpod/ai-sdk-provider';
import { generateText } from 'ai';
import { z } from 'zod';

const RecipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
  steps: z.array(z.string()),
});

const { text } = await generateText({
  model: runpod('qwen/qwen3-32b-awq'),
  messages: [
    {
      role: 'system',
      content:
        'return ONLY valid JSON matching { name: string; ingredients: string[]; steps: string[] }',
    },
    { role: 'user', content: 'generate a lasagna recipe.' },
  ],
  temperature: 0,
});

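// JSON.parse throws on malformed input, so guard it explicitly
let parsed: unknown;
try {
  parsed = JSON.parse(text);
} catch {
  throw new Error('Model output was not valid JSON');
}

// validate the parsed value against the schema
const result = RecipeSchema.safeParse(parsed);

if (!result.success) {
  // handle invalid JSON shape, e.g. inspect result.error.issues
}

console.log(result.success ? result.data : parsed);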
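
Models that are only instructed to return JSON sometimes wrap it in a Markdown code fence or add a sentence around it. The sketch below shows one tolerant way to pull the object out before validating it. It assumes the output contains a single top-level JSON object; extractJson is a hypothetical helper, not something the provider or the AI SDK ships.

```typescript
// Hypothetical helper: extracts a JSON object from model output that may be
// wrapped in a Markdown code fence or surrounded by extra prose.
// Returns undefined when no parsable object is found.
function extractJson(text: string): unknown {
  // Take the span from the first '{' to the last '}'.
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) return undefined;

  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    // The span looked like an object but was not valid JSON.
    return undefined;
  }
}
```

The return value can be handed to RecipeSchema.safeParse in place of a direct JSON.parse(text) result; an undefined value simply fails validation.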

Image Models

With image models you can:

Text-to-image: generate a new image from a text prompt.
Edit image: transform an existing image by providing reference image(s).

All examples use the AI SDK's generateImage and runpod.image(modelId).

Text-to-Image

import { runpod } from '@runpod/ai-sdk-provider';
import { generateImage } from 'ai';
import { writeFileSync } from 'fs';

const { image } = await generateImage({
  model: runpod.image('pruna/p-image-t2i'),
  prompt: 'A serene mountain landscape at sunset',
  aspectRatio: '4:3',
});

writeFileSync('image.png', image.uint8Array);

Returns:

image.uint8Array - Binary image data (efficient for processing/saving)
image.base64 - Base64 encoded string (for web display)
image.mediaType - MIME type ('image/jpeg' or 'image/png')
warnings - Array of any warnings about unsupported parameters
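
Because image.mediaType can be either 'image/jpeg' or 'image/png', a hard-coded .png file name may not match the data. A minimal sketch of choosing the extension from the media type; extensionForMediaType is a hypothetical helper, and the 'bin' fallback is an arbitrary choice.

```typescript
// Hypothetical helper: maps an image MIME type to a file extension,
// falling back to 'bin' for types it does not know.
function extensionForMediaType(mediaType: string): string {
  const known: Record<string, string> = {
    'image/png': 'png',
    'image/jpeg': 'jpg',
    'image/webp': 'webp',
  };
  return known[mediaType.toLowerCase()] ?? 'bin';
}
```

Usage: writeFileSync('image.' + extensionForMediaType(image.mediaType), image.uint8Array).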

Edit Image

For editing, pass reference images via prompt.images (recommended). The AI SDK normalizes prompt.images into files for the provider call.

Single reference image (1 input image)

import { runpod } from '@runpod/ai-sdk-provider';
import { generateImage } from 'ai';

const { image } = await generateImage({
  model: runpod.image('pruna/p-image-edit'),
  prompt: {
    text: 'Virtual staging: add modern Scandinavian furniture: a gray sofa, wooden coffee table, potted plants, and warm lighting',
    images: ['https://image.runpod.ai/demo/empty-room.png'],
  },
  aspectRatio: '16:9',
});

Multiple reference images (4 input images)

Note: Prior to v1.0.0, images were passed via providerOptions.runpod.image / providerOptions.runpod.images. This still works but prompt.images is now recommended.

import { runpod } from '@runpod/ai-sdk-provider';
import { generateImage } from 'ai';

const { image } = await generateImage({
  model: runpod.image('google/nano-banana-pro-edit'),
  prompt: {
    text: 'Combine these four robot musicians into an epic band photo on a concert stage with dramatic lighting',
    images: [
      'https://image.runpod.ai/demo/robot-drummer.png',
      'https://image.runpod.ai/demo/robot-guitarist.png',
      'https://image.runpod.ai/demo/robot-bassist.png',
      'https://image.runpod.ai/demo/robot-singer.png',
    ],
  },
});

Examples

Check out our examples for more code snippets on how to use all the different models.

Supported Models

| Model ID | Type | Resolution | Aspect Ratios |
| --- | --- | --- | --- |
| alibaba/wan-2.6 | t2i | 768x768–1280x1280 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 21:9, 9:21 |
| pruna/p-image-t2i | t2i | up to 1440x1440 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| pruna/p-image-edit | edit | up to 1440x1440 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| google/nano-banana-edit | edit | up to 4096x4096 | 1:1, 4:3, 3:4 |
| google/nano-banana-pro-edit | edit | 1k, 2k, 4k | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 21:9 |
| google/nano-banana-2-edit | edit | 1k, 2k, 4k | 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, 1:4, 4:1, 1:8, 8:1 |
| bytedance/seedream-3.0 | t2i | up to 4096x4096 | 1:1, 4:3, 3:4 |
| bytedance/seedream-4.0 | t2i | up to 4096x4096 | 1:1, 4:3, 3:4 |
| bytedance/seedream-4.0-edit | edit | up to 4096x4096 | uses size |
| qwen/qwen-image | t2i | up to 4096x4096 | 1:1, 4:3, 3:4 |
| qwen/qwen-image-edit | edit | up to 4096x4096 | 1:1, 4:3, 3:4 |
| qwen/qwen-image-edit-2511 | edit | up to 1536x1536 | 1:1, 4:3, 3:4 |
| tongyi-mai/z-image-turbo | t2i | up to 1536x1536 | 1:1, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16 |
| black-forest-labs/flux-1-schnell | t2i | up to 2048x2048 | 1:1, 4:3, 3:4 |
| black-forest-labs/flux-1-dev | t2i | up to 2048x2048 | 1:1, 4:3, 3:4 |
| black-forest-labs/flux-1-kontext-dev | edit | up to 2048x2048 | 1:1, 4:3, 3:4 |

For the full list of models, see the Runpod Public Endpoint Reference.

Provider Options

Additional options through providerOptions.runpod (supported options depend on the model):

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| negative_prompt | string | "" | What to avoid in the image (model-dependent) |
| enable_safety_checker | boolean | true | Content safety filtering (model-dependent) |
| num_inference_steps | number | Auto | Denoising steps (model-dependent) |
| guidance | number | Auto | Prompt adherence strength (model-dependent) |
| output_format | string | "png" | Output format: png, jpg, jpeg, webp |
| maxPollAttempts | number | 60 | Max polling attempts |
| pollIntervalMillis | number | 5000 | Polling interval (ms) |

Example (providerOptions):

const { image } = await generateImage({
  model: runpod.image('bytedance/seedream-3.0'),
  prompt: 'A sunset over mountains',
  size: '1328x1328',
  seed: 42,
  providerOptions: {
    runpod: {
      negative_prompt: 'blurry, low quality',
      enable_safety_checker: true,
      maxPollAttempts: 30,
      pollIntervalMillis: 4000,
    },
  },
});
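
The two polling options together bound how long a call waits for a job. The sketch below works through the arithmetic under the assumption that the provider waits a fixed pollIntervalMillis between attempts, which the table does not state explicitly. maxPollWaitMillis is a hypothetical helper.

```typescript
// Assumption: a fixed interval separates polling attempts, so the longest
// wait is roughly attempts x interval (request latency not included).
function maxPollWaitMillis(
  maxPollAttempts: number,
  pollIntervalMillis: number,
): number {
  return maxPollAttempts * pollIntervalMillis;
}

// Image defaults from the options table: 60 attempts, 5000 ms apart.
const defaultImageWait = maxPollWaitMillis(60, 5000); // 300000 ms (5 minutes)

// Values from the providerOptions example: 30 attempts, 4000 ms apart.
const exampleWait = maxPollWaitMillis(30, 4000); // 120000 ms (2 minutes)
```

Under that assumption, lowering either value shortens the time before a slow job is given up on, so long-running models may need a larger budget rather than a smaller one.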

Model-specific Notes

Pruna (p-image)

Supported models: pruna/p-image-t2i, pruna/p-image-edit

Text-to-image: supports standard aspectRatio values; for custom dimensions, set providerOptions.runpod.aspect_ratio = 'custom' and provide width/height.
Edit image: supports 1–5 input images via prompt.images (recommended) or files.

Example: Custom Dimensions (t2i)

const { image } = await generateImage({
  model: runpod.image('pruna/p-image-t2i'),
  prompt: 'A robot',
  providerOptions: {
    runpod: {
      aspect_ratio: 'custom',
      width: 512,
      height: 768,
    },
  },
});

Alibaba (WAN 2.6)

Text-to-image model with flexible resolution support.

Resolution constraints:

Total pixels: 589,824 (768x768) to 1,638,400 (1280x1280)
Aspect ratio: 1:4 to 4:1
Default: 1280x1280
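
The constraints above can be checked on the client before a request is sent. A minimal sketch, assuming both bounds are inclusive and that dimensions are positive integers; isValidWanSize is a hypothetical helper and does not reproduce any validation the provider may perform itself.

```typescript
// Pixel bounds as listed above for alibaba/wan-2.6.
const WAN_MIN_PIXELS = 768 * 768; // 589,824
const WAN_MAX_PIXELS = 1280 * 1280; // 1,638,400

// Hypothetical helper: returns true when width x height satisfies both the
// total-pixel range and the 1:4 to 4:1 aspect-ratio range (bounds inclusive).
function isValidWanSize(width: number, height: number): boolean {
  if (!Number.isInteger(width) || !Number.isInteger(height)) return false;
  if (width <= 0 || height <= 0) return false;
  const pixels = width * height;
  const ratio = width / height;
  return (
    pixels >= WAN_MIN_PIXELS &&
    pixels <= WAN_MAX_PIXELS &&
    ratio >= 1 / 4 &&
    ratio <= 4
  );
}
```

For instance, 1280x720 (921,600 pixels, 16:9) passes, while 512x512 (262,144 pixels) falls below the minimum pixel count.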

Recommended resolutions for common aspect ratios:

| Aspect Ratio | Resolution |
| :--- | :--- |
| 1:1 | 1280x1280 |
| 2:3 | 800x1200 |
| 3:2 | 1200x800 |
| 3:4 | 960x1280 |
| 4:3 | 1280x960 |
| 9:16 | 720x1280 |
| 16:9 | 1280x720 |
| 21:9 | 1344x576 |
| 9:21 | 576x1344 |

const { image } = await generateImage({
  model: runpod.image('alibaba/wan-2.6'),
  prompt: 'A serene mountain landscape at dawn',
  aspectRatio: '16:9',
});

Google (Nano Banana Pro)

| Option | Values |
| :--- | :--- |
| providerOptions.runpod.resolution | 1k, 2k, 4k |

const { image } = await generateImage({
  model: runpod.image('google/nano-banana-pro'),
  prompt: 'A futuristic cityscape at sunset',
  aspectRatio: '16:9',
  providerOptions: {
    runpod: {
      resolution: '4k',
    },
  },
});

Qwen (Image Edit 2511)

| Option | Values |
| :--- | :--- |
| providerOptions.runpod.loras | [{path, scale}, ...] |

Supports 1-3 input images.

const { image } = await generateImage({
  model: runpod.image('qwen/qwen-image-edit-2511'),
  prompt: {
    text: 'Transform into anime style',
    images: ['https://image.runpod.ai/asset/qwen/qwen-image-edit-2511.png'],
  },
  size: '1024x1024',
  providerOptions: {
    runpod: {
      loras: [
        {
          path: 'https://huggingface.co/flymy-ai/qwen-image-anime-irl-lora/resolve/main/flymy_anime_irl.safetensors',
          scale: 1,
        },
      ],
    },
  },
});

Tongyi-MAI (Z-Image Turbo)

Supported model: tongyi-mai/z-image-turbo

Supported sizes (validated by provider): 512x512, 768x768, 1024x1024, 1280x1280, 1536x1536, 512x768, 768x512, 1024x768, 768x1024, 1328x1328, 1472x1140, 1140x1472, 768x432, 1024x576, 1280x720, 1536x864, 432x768, 576x1024, 720x1280, 864x1536
Supported aspectRatio values: 1:1, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16 (maps to sizes above; use size for exact dimensions)
Additional parameters: strength, output_format, enable_safety_checker, seed
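
Since the provider validates size against a fixed list, a request with an unsupported size can be caught locally first. A minimal sketch that mirrors the list above; isSupportedZImageSize is a hypothetical helper, and the list would need updating if the provider's set changes.

```typescript
// Sizes copied from the "Supported sizes" list above.
const Z_IMAGE_SIZES: ReadonlySet<string> = new Set([
  '512x512', '768x768', '1024x1024', '1280x1280', '1536x1536',
  '512x768', '768x512', '1024x768', '768x1024', '1328x1328',
  '1472x1140', '1140x1472', '768x432', '1024x576', '1280x720',
  '1536x864', '432x768', '576x1024', '720x1280', '864x1536',
]);

// Hypothetical helper: true when the 'WIDTHxHEIGHT' string is in the list.
function isSupportedZImageSize(size: string): boolean {
  return Z_IMAGE_SIZES.has(size);
}
```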

Speech Models

Generate speech using the AI SDK's generateSpeech and runpod.speech(...):

import { runpod } from '@runpod/ai-sdk-provider';
import { generateSpeech } from 'ai';

const result = await generateSpeech({
  model: runpod.speech('resembleai/chatterbox-turbo'),
  text: 'Hello from Runpod.',
});

// Save to filesystem:
import { writeFileSync } from 'fs';
writeFileSync('speech.wav', result.audio.uint8Array);

Returns:

result.audio (GeneratedAudioFile)
- result.audio.uint8Array (binary audio)
- result.audio.base64 (base64-encoded audio)
- result.audio.mediaType (e.g. audio/wav)
- result.audio.format (e.g. wav)
result.warnings (e.g. unsupported parameters)
result.responses (telemetry/debug metadata)
result.providerMetadata.runpod
- audioUrl (public URL to the generated audio)
- cost (if available)

Examples

Check out our examples for more code snippets on how to use all the different models.

Supported Models

resembleai/chatterbox-turbo

`resembleai/chatterbox-turbo`

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | - | Required. The text to convert to speech. |
| voice | string | "lucy" | Built-in voice name (see list below). |

Provider Options

Use providerOptions.runpod for model-specific parameters:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| voice_url | string | - | URL to audio file (5–10s) for voice cloning |
| voiceUrl | string | - | Alias for voice_url |

Note: If voice_url is provided, the built-in voice is ignored.
Note: This speech endpoint currently returns WAV only; outputFormat is ignored.

Voices

voice selects one of the built-in voices (default: lucy):

[
  'aaron',
  'abigail',
  'anaya',
  'andy',
  'archer',
  'brian',
  'chloe',
  'dylan',
  'emmanuel',
  'ethan',
  'evelyn',
  'gavin',
  'gordon',
  'ivan',
  'laura',
  'lucy',
  'madison',
  'marisol',
  'meera',
  'walter',
];
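
To catch a misspelled voice name at compile time or before a request, the list above can be turned into a union type with a runtime guard. A minimal sketch; BUILT_IN_VOICES, BuiltInVoice, and isBuiltInVoice are hypothetical names, not exports of the provider.

```typescript
// Built-in voice names copied from the list above.
const BUILT_IN_VOICES = [
  'aaron', 'abigail', 'anaya', 'andy', 'archer',
  'brian', 'chloe', 'dylan', 'emmanuel', 'ethan',
  'evelyn', 'gavin', 'gordon', 'ivan', 'laura',
  'lucy', 'madison', 'marisol', 'meera', 'walter',
] as const;

// Union of the literal names: 'aaron' | 'abigail' | ...
type BuiltInVoice = (typeof BUILT_IN_VOICES)[number];

// Hypothetical type guard: narrows an arbitrary string to BuiltInVoice.
// The comparison is case-sensitive.
function isBuiltInVoice(name: string): name is BuiltInVoice {
  return (BUILT_IN_VOICES as readonly string[]).includes(name);
}
```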

Voice cloning (via URL)

Use providerOptions.runpod.voice_url (or voiceUrl) to clone a voice from a short reference audio (5–10s):

const result = await generateSpeech({
  model: runpod.speech('resembleai/chatterbox-turbo'),
  text: 'Hello!',
  providerOptions: {
    runpod: {
      voice_url: 'https://your-audio-host.com/your-voice-sample.wav', // 5-10s audio sample
    },
  },
});

Paralinguistic Tags

Include these tags inline with your text to trigger realistic vocal expressions:

| Tag | Effect |
| --- | --- |
| [clear throat] | Throat clearing |
| [sigh] | Sighing |
| [sush] | Shushing |
| [cough] | Coughing |
| [groan] | Groaning |
| [sniff] | Sniffing |
| [gasp] | Gasping |
| [chuckle] | Chuckling |
| [laugh] | Laughing |

const result = await generateSpeech({
  model: runpod.speech('resembleai/chatterbox-turbo'),
  text: `[sigh] I can't believe it worked! [laugh] This is amazing.`,
  voice: 'lucy',
});
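
A mistyped tag is easy to miss in a long script. The sketch below scans text for bracketed tags and reports any that are not in the table above. findUnknownTags is a hypothetical helper, and it assumes every bracketed span in the text is meant to be a tag.

```typescript
// Tags copied verbatim from the table above.
const KNOWN_TAGS: ReadonlySet<string> = new Set([
  '[clear throat]', '[sigh]', '[sush]', '[cough]', '[groan]',
  '[sniff]', '[gasp]', '[chuckle]', '[laugh]',
]);

// Hypothetical helper: returns every bracketed tag in the text that is not
// in the table, so typos can be caught before sending a request.
function findUnknownTags(text: string): string[] {
  const tags = text.match(/\[[^\]]*\]/g) ?? [];
  return tags.filter((tag) => !KNOWN_TAGS.has(tag));
}
```

An empty array means every tag in the text matches one from the table.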

Transcription Models

Transcribe audio using the AI SDK's experimental_transcribe and runpod.transcription(...):

import { runpod } from '@runpod/ai-sdk-provider';
import { experimental_transcribe as transcribe } from 'ai';

const result = await transcribe({
  model: runpod.transcription('pruna/whisper-v3-large'),
  audio: new URL('https://image.runpod.ai/demo/transcription-demo.wav'),
});

console.log(result.text);

Returns:

result.text - Full transcription text
result.segments - Array of segments with timing info
- segment.text - Segment text
- segment.startSecond - Start time in seconds
- segment.endSecond - End time in seconds
result.language - Detected language code
result.durationInSeconds - Audio duration
result.warnings - Array of any warnings
result.providerMetadata.runpod.jobId - Runpod job ID
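
The segment timing fields are enough to produce subtitles. A minimal sketch that turns segments into SubRip (SRT) text, assuming each segment has the text, startSecond, and endSecond fields listed above; the Segment interface and both helper functions are hypothetical and defined here only for illustration.

```typescript
// Shape of a segment as described in the list above.
interface Segment {
  text: string;
  startSecond: number;
  endSecond: number;
}

// Formats a number of seconds as an SRT timestamp: HH:MM:SS,mmm.
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: renders numbered SRT cues separated by blank lines.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTime(seg.startSecond)} --> ${toSrtTime(seg.endSecond)}\n${seg.text.trim()}\n`,
    )
    .join('\n');
}
```

Calling segmentsToSrt(result.segments) yields a string that can be written to a .srt file.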

Audio Input

You can provide audio in several ways:

import { readFileSync } from 'fs';

// URL (recommended for large files)
const fromUrl = await transcribe({
  model: runpod.transcription('pruna/whisper-v3-large'),
  audio: new URL('https://image.runpod.ai/demo/transcription-demo.wav'),
});

// Local file as Uint8Array (readFileSync returns a Buffer, which is a Uint8Array)
const audioData = readFileSync('./audio.wav');

const fromFile = await transcribe({
  model: runpod.transcription('pruna/whisper-v3-large'),
  audio: audioData,
});

Examples

Check out our examples for more code snippets on how to use all the different models.

Supported Models

pruna/whisper-v3-large

Provider Options

Use providerOptions.runpod for model-specific parameters:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| audio | string | - | URL to audio file (alternative to binary data) |
| prompt | string | - | Context prompt to guide transcription |
| language | string | Auto | ISO-639-1 language code (e.g., 'en', 'es') |
| word_timestamps | boolean | false | Include word-level timestamps |
| translate | boolean | false | Translate audio to English |
| enable_vad | boolean | false | Enable voice activity detection |
| maxPollAttempts | number | 120 | Max polling attempts |
| pollIntervalMillis | number | 2000 | Polling interval (ms) |

Example (providerOptions):

const result = await transcribe({
  model: runpod.transcription('pruna/whisper-v3-large'),
  audio: new URL('https://image.runpod.ai/demo/transcription-demo.wav'),
  providerOptions: {
    runpod: {
      language: 'en',
      prompt: 'This is a demo of audio transcription',
      word_timestamps: true,
    },
  },
});

Video Models

Generate videos using the AI SDK's experimental_generateVideo and runpod.video(...):

import { runpod } from '@runpod/ai-sdk-provider';
import { experimental_generateVideo as generateVideo } from 'ai';

// Text-to-video
const textToVideo = await generateVideo({
  model: runpod.video('alibaba/wan-2.6-t2v'),
  prompt: 'A golden retriever running on a sunny beach, cinematic, 4k',
});

console.log(textToVideo.video.url);

// Image-to-video (separate variable so both calls can live in one module)
const imageToVideo = await generateVideo({
  model: runpod.video('alibaba/wan-2.6-i2v'),
  prompt: 'Animate this scene with gentle camera movement',
  image: new URL('https://example.com/image.png'),
});

console.log(imageToVideo.video.url);

Returns:

result.video - Generated video ({ type: 'url', url, mediaType: 'video/mp4' })
result.warnings - Array of any warnings
result.providerMetadata.runpod.jobId - Runpod job ID

Examples

Check out our examples for more code snippets on how to use all the different models.

Supported Models

| Model ID | Type | Company |
| --- | --- | --- |
| pruna/p-video | t2v | Pruna AI |
| vidu/q3-t2v | t2v | Shengshu Technology |
| vidu/q3-i2v | i2v | Shengshu Technology |
| kwaivgi/kling-v2.6-std-motion-control | i2v + video | KwaiVGI (Kuaishou) |
| kwaivgi/kling-video-o1-r2v | i2v | KwaiVGI (Kuaishou) |
| kwaivgi/kling-v2.1-i2v-pro | i2v | KwaiVGI (Kuaishou) |
| alibaba/wan-2.6-t2v | t2v | Alibaba |
| alibaba/wan-2.6-i2v | i2v | Alibaba |
| alibaba/wan-2.5 | i2v | Alibaba |
| alibaba/wan-2.2-t2v-720-lora | i2v | Alibaba |
| alibaba/wan-2.2-i2v-720 | i2v | Alibaba |
| alibaba/wan-2.1-i2v-720 | i2v | Alibaba |
| bytedance/seedance-v1.5-pro-i2v | i2v | ByteDance |
| openai/sora-2-pro-i2v | i2v | OpenAI |
| openai/sora-2-i2v | i2v | OpenAI |

Provider Options

Use providerOptions.runpod for model-specific parameters:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| negative_prompt | string | - | What to avoid in the generated video |
| guidance_scale | number | - | Guidance scale for prompt adherence |
| num_inference_steps | number | - | Number of inference steps |
| style | string | - | Style preset (model-specific) |
| maxPollAttempts | number | 120 | Max polling attempts |
| pollIntervalMillis | number | 5000 | Polling interval (ms) |

Any additional model-specific parameters can be passed through providerOptions.runpod and will be forwarded to the API.

Example (providerOptions):

const result = await generateVideo({
  model: runpod.video('alibaba/wan-2.6-t2v'),
  prompt: 'A serene mountain landscape with flowing water',
  duration: 5,
  aspectRatio: '16:9',
  seed: 42,
  providerOptions: {
    runpod: {
      negative_prompt: 'blurry, low quality',
      guidance_scale: 7.5,
    },
  },
});

About Runpod

Runpod is the foundation for developers to build, deploy, and scale custom AI systems.

Beyond the public endpoints shown above (and other generative media APIs), Runpod offers private serverless endpoints, pods, and instant clusters, so that you can train, fine-tune, or run any open-source or private model on your terms.