doubao-ai-toolkit
v1.0.1
Claude Code Skill - ByteDance Doubao (豆包) AI Toolkit. One-line commands for image/video generation, TTS, ASR, chat, search, and embeddings via Volcengine Ark.
🚀 Doubao AI Toolkit
ByteDance Volcengine Ark AI Toolkit — One-line commands for image generation, video creation, TTS, ASR, chat, search, and embeddings.
doubao-ai-toolkit is a Claude Code / OpenClaw skill that wraps ByteDance's Volcengine Ark (火山方舟) AI platform into a simple, unified CLI. Powered by Doubao (豆包), Seedream, and Seedance models — generate images, create videos, synthesize speech, recognize voice, chat with LLMs, search the web, and compute embeddings, all with one-line commands.
✨ Features
| Category | Capability | Models |
|----------|-----------|--------|
| 🎨 Image | Text-to-image, image-to-image, group generation | doubao-seedream-4-0 |
| 🎬 Video | Text-to-video, image-to-video | doubao-seedance-1-0-pro, -pro-fast, -lite |
| 🗣️ TTS | Text-to-speech with 4+ Chinese voices | zh_female_xiaohe, zh_male_xiaoqiu |
| 👂 ASR | Speech recognition (URL + local files) | fun-asr |
| 💬 Chat | Multi-model text generation (streaming) | doubao-seed-1-8, doubao-pro-32k |
| 🌐 Search | Web + image search | - |
| 🧮 Embedding | Text, image, and video embeddings | - |
📦 Installation
1. Install the CLIs
# Core CLI (image, video, TTS, ASR, chat, search, embedding)
npm install -g coze-coding-dev-sdk
# Image generation specialist (Seedream 4.0 — more control)
npm install -g seedream-ark
# Video generation specialist (Seedance 2.0)
npm install -g seedance
2. Configure API Key
# Windows PowerShell
$env:ARK_API_KEY = "your-api-key-here"
# Linux / macOS
export ARK_API_KEY="your-api-key-here"
🔑 Get your API Key: Volcengine Ark Console
3. Install the Skill (for Claude Code / OpenClaw)
Copy SKILL.md to your skills directory, or use the skill manager.
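Before running any of the commands in this README, it can help to confirm that the key from step 2 and the CLIs from step 1 are actually available. The following is a minimal sketch, not part of the toolkit: `require_ark_key` and `require_cli` are hypothetical helper names, and the only things taken from this README are the `ARK_API_KEY` variable and the CLI command names.

```shell
# Preflight sketch (hypothetical helpers, not shipped with the toolkit).

# Fail early if the API key from step 2 is missing or empty.
require_ark_key() {
  if [ -z "${ARK_API_KEY:-}" ]; then
    echo "ARK_API_KEY is not set; see step 2 of Installation" >&2
    return 1
  fi
}

# Fail early if a CLI is not on PATH.
# $1: command name, e.g. coze-coding-ai or seedream
require_cli() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing CLI: $1" >&2
    return 1
  fi
}
```

A typical use would be `require_ark_key && require_cli coze-coding-ai` at the top of a script, so a missing key surfaces as a clear message rather than an API error.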
🚀 Quick Start
# 🎨 Generate an image
coze-coding-ai image -p "A cat astronaut on Mars" -o cat.png
# 🎬 Create a video
coze-coding-ai video -p "Ocean waves in slow motion" -d 5 -o wave.json
# 🗣️ Synthesize speech
coze-coding-ai tts -t "你好,欢迎使用豆包"
# 👂 Transcribe audio
coze-coding-ai asr -f ./meeting.mp3
# 💬 Chat with LLM
coze-coding-ai chat -p "Explain quantum computing"
# 🌐 Search the web
coze-coding-ai search -q "Latest AI trends 2026" --count 10
# 🧮 Compute embeddings
coze-coding-ai embedding -t "AI is transforming the world" -d 1024
📖 Detailed Usage
🎨 Image Generation
coze-coding-ai (simplest)
# Basic text-to-image
coze-coding-ai image -p "一只穿太空服的猫在火星上漫步" -o cat.png
# 2K high-res output
coze-coding-ai image \
-p "纯黑色男士短袖T恤,亚马逊电商主图,纯白背景" \
--size 2K \
-o tshirt.png
# 4K ultra-high-res
coze-coding-ai image \
-p "山水风景画,中国水墨风格,云雾缭绕" \
--size 4K \
-o landscape.png
seedream (advanced control)
# Single image
seedream generate \
--prompt "未来城市天际线,赛博朋克风格" \
--size 4K \
--output ./generated/
# Group generation (1–15 images)
seedream generate \
--prompt "同一只白色猫咪的9种不同表情和姿态" \
--group \
--max-images 9 \
--size 2K \
--output ./cats/
# Image-to-image (reference image)
seedream generate \
--prompt "将这张照片转换为油画风格" \
--image ./photo.jpg \
--size 2K \
--output ./styled/
# Custom dimensions
seedream generate \
--prompt "手机壁纸,极简风格" \
--size 1080x1920 \
--output ./wallpapers/
# Dry run (preview only)
seedream generate --prompt "test" --dry-run
🎬 Video Generation
# Text-to-video (5 seconds)
coze-coding-ai video \
-p "海浪拍打礁石,慢动作,电影质感" \
-d 5 \
-o wave.json
# Text-to-video (10 seconds, custom resolution)
coze-coding-ai video \
-p "城市夜景延时摄影,车流光轨,4K" \
-d 10 \
-s 1920x1080 \
-o city_night.json
# Image-to-video
coze-coding-ai video \
-p "让画面中的人物微笑并眨眼" \
-i https://example.com/portrait.jpg \
-d 5 \
-o animate.json
# Fixed camera + no watermark
coze-coding-ai video \
-p "产品360度旋转展示" \
--camerafixed \
--no-watermark \
-d 5
# Custom model
coze-coding-ai video \
-p "科幻场景" \
--model doubao-seedance-1-0-pro-fast-251015 \
-d 5
# Async callback mode
coze-coding-ai video \
-p "..." \
--callback-url https://your-server.com/callback \
-d 10
Video Model Selection:
| Model ID | Type |
|----------|------|
| doubao-seedance-1-0-pro-fast-251015 | Fast (default) |
| doubao-seedance-1-0-pro-251015 | High quality |
| doubao-seedance-1-0-lite-t2v-250428 | Lite text-to-video |
| doubao-seedance-1-0-lite-i2v-250428 | Lite image-to-video |
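Because the model IDs in the table are long and date-stamped, a script may prefer to refer to them by a short label. The sketch below assumes nothing beyond the IDs listed above; `pick_video_model` and the tier labels (`fast`, `quality`, `lite-t2v`, `lite-i2v`) are hypothetical names chosen for illustration.

```shell
# Map a short tier label to a model ID from the table above.
# pick_video_model and the tier labels are hypothetical, not part of the CLI.
pick_video_model() {
  case "$1" in
    fast)     echo "doubao-seedance-1-0-pro-fast-251015" ;;
    quality)  echo "doubao-seedance-1-0-pro-251015" ;;
    lite-t2v) echo "doubao-seedance-1-0-lite-t2v-250428" ;;
    lite-i2v) echo "doubao-seedance-1-0-lite-i2v-250428" ;;
    *)
      echo "unknown tier: $1" >&2
      return 1
      ;;
  esac
}
```

The result can then be passed to the `--model` flag shown earlier, for example `coze-coding-ai video -p "..." --model "$(pick_video_model quality)" -d 5`.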
🗣️ Speech Synthesis (TTS)
# Basic Chinese TTS
coze-coding-ai tts -t "你好,欢迎使用豆包语音合成"
# Specify speaker
coze-coding-ai tts \
-t "今天天气真不错,适合出去走走" \
--speaker zh_female_xiaohe_uranus_bigtts
# Long text
coze-coding-ai tts \
-t "春眠不觉晓,处处闻啼鸟。夜来风雨声,花落知多少。"
Available Speakers:
| Speaker ID | Description |
|-----------|-------------|
| zh_female_xiaohe_uranus_bigtts | Female - Xiaohe (default) |
| zh_male_xiaoqiu_uranus_bigtts | Male - Xiaoqiu |
| zh_female_qingxin_uranus_bigtts | Female - Qingxin |
| zh_female_shuangkuai_uranus_bigtts | Female - Shuangkuai |
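To compare the voices in the table, one option is to synthesize the same sentence once per speaker. This is a rough sketch that uses only the `tts -t` and `--speaker` flags shown above; `audition_speakers` is a hypothetical helper, and `DOUBAO_CLI` is a hypothetical override variable that exists only in this sketch (setting it to `echo` prints the arguments instead of calling the API).

```shell
# Speaker IDs copied from the table above, one per line.
SPEAKERS="zh_female_xiaohe_uranus_bigtts
zh_male_xiaoqiu_uranus_bigtts
zh_female_qingxin_uranus_bigtts
zh_female_shuangkuai_uranus_bigtts"

# Run the same text through every speaker.
# $1: text to synthesize
# DOUBAO_CLI (hypothetical) defaults to coze-coding-ai.
audition_speakers() {
  for speaker in $SPEAKERS; do
    "${DOUBAO_CLI:-coze-coding-ai}" tts -t "$1" --speaker "$speaker"
  done
}
```

Running `DOUBAO_CLI=echo audition_speakers "你好"` first is a cheap way to check the four invocations before spending API quota.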
👂 Speech Recognition (ASR)
# URL-based
coze-coding-ai asr -u https://example.com/audio.mp3
# Local file
coze-coding-ai asr -f ./meeting.mp3
# Long audio
coze-coding-ai asr -f ./lecture.wav
# Verbose logging
coze-coding-ai asr -f ./audio.mp3 --verbose
💬 Text Chat
# Basic chat
coze-coding-ai chat -p "用中文写一首关于春天的诗"
# With system prompt
coze-coding-ai chat \
-s "你是一个专业的技术文档撰写助手" \
-p "帮我写一段 REST API 文档"
# Custom model + temperature
coze-coding-ai chat \
-p "解释量子计算的基本原理" \
--model doubao-seed-1-8-251228 \
--temperature 0.3
# Streaming output
coze-coding-ai chat \
-p "讲一个关于AI的短故事" \
--stream
Available Chat Models:
- doubao-seed-1-8-251228 (default): Doubao Seed 1.8
- doubao-pro-32k-241215: Doubao Pro 32K
- doubao-lite-32k-241215: Doubao Lite 32K
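When the same system prompt and temperature are reused across many requests, a small wrapper keeps them in one place. The sketch below combines the `-s`, `-p`, `--model`, and `--temperature` flags from the examples above; that they can all be passed together is an assumption, `doc_chat` is a hypothetical name, and `CHAT_MODEL` and `DOUBAO_CLI` are hypothetical variables used only here. The system prompt is an English rendering of the one in the example above.

```shell
# Chat wrapper with a fixed system prompt and a low temperature.
# $1: user prompt
# CHAT_MODEL (hypothetical) selects one of the models listed above.
# DOUBAO_CLI (hypothetical) defaults to coze-coding-ai.
doc_chat() {
  "${DOUBAO_CLI:-coze-coding-ai}" chat \
    -s "You are a professional technical documentation assistant" \
    -p "$1" \
    --model "${CHAT_MODEL:-doubao-seed-1-8-251228}" \
    --temperature 0.3
}
```

For example, `CHAT_MODEL=doubao-lite-32k-241215 doc_chat "Draft a REST API overview"` would send the same request to the lite model.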
🌐 Web Search
# Web search
coze-coding-ai search -q "2026年最新AI技术趋势" --count 10
# Image search
coze-coding-ai search \
-q "埃菲尔铁塔" \
--type image \
--count 5
# Custom count
coze-coding-ai search -q "今天天气" --type web --count 3
🧮 Embeddings
# Text embedding
coze-coding-ai embedding -t "人工智能正在改变世界" -d 1024
# Multiple texts
coze-coding-ai embedding \
-t "第一段文字" \
-t "第二段文字" \
-d 1024 \
-o embeddings.json
# Image embedding
coze-coding-ai embedding --image-url https://example.com/photo.jpg -d 1024
# Video embedding
coze-coding-ai embedding --video-url https://example.com/video.mp4 -d 1024
🛒 E-Commerce Product Image Template
# Set API Key (Linux / macOS; in Windows PowerShell use $env:ARK_API_KEY = "your-api-key")
export ARK_API_KEY="your-api-key"
# White background - front flat lay
coze-coding-ai image \
-p "纯黑色男士短袖T恤,亚马逊电商主图,纯白背景,正面平铺展示,圆领设计,高级面料质感,专业商业产品摄影,影棚布光" \
--size 2K \
-o ./tshirt-front.png
# White background - back flat lay
coze-coding-ai image \
-p "纯黑色男士短袖T恤,亚马逊电商主图,纯白背景,背面平铺展示,圆领后领设计,高级面料质感,专业产品摄影" \
--size 2K \
-o ./tshirt-back.png
# Model wearing
coze-coding-ai image \
-p "年轻亚洲男模穿着纯黑色圆领短袖T恤,亚马逊电商主图,纯白背景,正面全身展示,专业时尚摄影,自然站姿" \
--size 2K \
-o ./tshirt-model.png
# Detail close-up
coze-coding-ai image \
-p "纯黑色男士T恤领口细节特写,面料纹理清晰可见,亚马逊电商产品图,微距摄影,专业商业摄影" \
--size 2K \
-o ./tshirt-detail.png
# Lifestyle scene
coze-coding-ai image \
-p "年轻男士穿着纯黑色T恤在户外咖啡馆,自然光线,生活方式摄影,亚马逊电商场景图,休闲时尚" \
--size 2K \
-o ./tshirt-lifestyle.png
# Batch group generation (seedream)
seedream generate \
--prompt "纯黑色男士短袖T恤的6种不同角度产品展示,亚马逊电商主图,纯白背景,专业摄影" \
--group \
--max-images 6 \
--size 2K \
--output ./product-shots/
🔑 Authentication
Three methods, listed by priority:
# Method 1: Environment variable (recommended)
export ARK_API_KEY="your-api-key-here"
# Method 2: Command-line flag (seedream)
seedream generate --api-key "your-key" --prompt "..."
# Method 3: HTTP Header (coze-coding-ai)
coze-coding-ai image -p "..." -H "Authorization: Bearer your-key"
| Variable | Purpose | Applies To |
|----------|---------|------------|
| ARK_API_KEY | Volcengine Ark API Key | seedream / coze-coding-ai |
📌 coze-coding-ai also supports -H "Authorization: Bearer <key>" without relying on environment variables.
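Since the environment variable is the recommended method, one way to avoid typing the key into every shell session (or into a script) is to keep it in a private file and export it on demand. This is a sketch under assumptions: `load_ark_key` is a hypothetical helper, and the default path `$HOME/.ark_api_key` is an illustrative choice, not a location that either CLI reads on its own.

```shell
# Export ARK_API_KEY from a private file instead of hardcoding it.
# $1: optional path to the key file (the default path is hypothetical)
load_ark_key() {
  key_file=${1:-"$HOME/.ark_api_key"}
  if [ ! -r "$key_file" ]; then
    echo "cannot read key file: $key_file" >&2
    return 1
  fi
  # Strip whitespace and the trailing newline around the key.
  ARK_API_KEY=$(tr -d '[:space:]' < "$key_file")
  if [ -z "$ARK_API_KEY" ]; then
    echo "key file is empty: $key_file" >&2
    return 1
  fi
  export ARK_API_KEY
}
```

Restricting the file with `chmod 600` and keeping it out of version control preserves the point of the exercise.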
📋 Key Rules
- API Key security: Use environment variables, never hardcode in scripts
- Image generation: Default model is doubao-seedream-4-0, supports 2K/4K output
- Video generation is synchronous: coze-coding-ai video waits for completion
- TTS needs no output path: Audio is returned via API response
- ASR supports local files: -f flag auto-encodes to base64 for upload
- Search has web/image modes: Switch with --type
- Use --help: Every subcommand has built-in documentation
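The last rule can be applied to the whole CLI at once. The sketch below loops over the seven subcommands used in this README and prints the help text for each; `show_all_help` is a hypothetical helper, and `DOUBAO_CLI` is a hypothetical override variable that exists only in this sketch.

```shell
# Print the built-in help for every subcommand used in this README.
# DOUBAO_CLI (hypothetical) defaults to coze-coding-ai.
show_all_help() {
  for sub in image video tts asr chat search embedding; do
    echo "== $sub =="
    "${DOUBAO_CLI:-coze-coding-ai}" "$sub" --help
  done
}
```

Redirecting the output to a file (`show_all_help > cli-help.txt`) gives a local reference for the flags your installed version supports.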
📁 File Structure
doubao-ai-toolkit/
├── SKILL.md # Claude Code / OpenClaw skill definition
├── README.md # This file
🆚 Comparison: doubao-ai-toolkit vs bailian-ai-toolkit
| Feature | doubao-ai-toolkit (豆包) | bailian-ai-toolkit (百炼) |
|---------|--------------------------|---------------------------|
| Platform | ByteDance Volcengine Ark | Alibaba Cloud DashScope |
| Image Gen | ✅ Seedream 4.0 (2K/4K) | ✅ Qwen Image 2.0 |
| Video Gen | ✅ Seedance 1.0 Pro | ✅ HappyHorse 1.0 |
| TTS | ✅ 4+ Chinese voices | ✅ 50+ voices, multi-language |
| ASR | ✅ fun-asr | ✅ fun-asr + diarization |
| Vision | ❌ | ✅ Qwen-VL-Max |
| Chat | ✅ Doubao Seed 1.8 | ✅ Qwen 3.6 + DeepSeek |
| Search | ✅ Web + Image | ✅ Web |
| Embedding | ✅ Text + Image + Video | ❌ |
| File Upload | ❌ | ✅ OSS (48h) |
| Auth | ARK_API_KEY env var | bl auth login --api-key |
🤝 Contributing
Issues and PRs welcome! This is a skill wrapper; the core CLIs are the npm packages listed under Installation.
📄 License
MIT © 2025
Built with ❤️ for the AI developer community
