pixcli
v3.3.2
The creative toolkit for AI agents — generate images, video, voiceover, music, sound effects, and full podcasts from the command line.
pixcli
The creative toolkit for AI agents: generate images, videos, voiceover, music, and sound effects from the command line. Create and edit anything.
Install
npm install -g pixcli
Or run without installing (use --yes to skip the install prompt; this is required for AI agents):
npx --yes pixcli image "a red fox in a forest"
Auth
export METERKEY_API_KEY="mk-prod-..."
Get your key at shellbot.sh.
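How the key and base URL are resolved can be pictured with a short TypeScript sketch. This is an illustration under stated assumptions, not the code in config.ts: resolveConfig and its option shape are hypothetical names, and only METERKEY_API_KEY, --key, --api-url, and the default URL are taken from this README (see Global options).

```typescript
// Hypothetical sketch of key / base-URL resolution; not the actual config.ts.
interface CliFlags {
  key?: string;    // value of --key, if given
  apiUrl?: string; // value of --api-url, if given
}

interface ResolvedConfig {
  apiKey: string;
  apiUrl: string;
}

// Default stated under "Global options".
const DEFAULT_API_URL = "https://pixcli.shellbot.sh";

function resolveConfig(
  flags: CliFlags,
  env: Record<string, string | undefined>,
): ResolvedConfig {
  // --key overrides the METERKEY_API_KEY environment variable.
  const apiKey = flags.key ?? env["METERKEY_API_KEY"];
  if (!apiKey) {
    throw new Error("No API key: set METERKEY_API_KEY or pass --key");
  }
  // Dropping trailing slashes is a choice of this sketch, so paths append cleanly.
  const apiUrl = (flags.apiUrl ?? DEFAULT_API_URL).replace(/\/+$/, "");
  return { apiKey, apiUrl };
}

// Usage: the flag wins over the environment.
const cfg = resolveConfig({ key: "mk-flag" }, { METERKEY_API_KEY: "mk-env" });
console.log(cfg.apiKey, cfg.apiUrl);
```

Passing the environment in as a parameter, rather than reading it globally, keeps the function easy to test.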
Commands
pixcli image <prompt> — Generate images
pixcli image "Studio product shot of wireless earbuds, soft lighting" -r 16:9 -q high -o earbuds.png
pixcli image "Abstract pattern, dark blue and cyan" -n 4
pixcli image "Red sneaker, centered, clean edges" -t -o sneaker.png
pixcli image "Same product, warm background" --from product.png -o warm.png

| Option | Default | Description |
|--------|---------|-------------|
| -r, --ratio | 1:1 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| -q, --quality | standard | draft, standard, high |
| -t, --transparent | false | Transparent background (PNG) |
| -n, --count | 1 | Number of images (1-4) |
| --from | — | Source image for I2I |
| -m, --model | auto | Model ID (bypasses auto-classification) |
| -o, --output | auto | Output path |
| --json | false | JSON output |
pixcli edit <prompt> — Edit images
pixcli edit "Remove the background" -i photo.jpg -o photo-nobg.png
pixcli edit "Upscale to max resolution" -i hero.png -q high -o hero-4k.png
pixcli edit "Apply the style from the reference" -i photo.jpg -i style-ref.jpg

| Option | Default | Description |
|--------|---------|-------------|
| -i, --image | required | Source image (repeatable) |
| -q, --quality | standard | draft, standard, high |
| -m, --model | auto | Model ID |
| -o, --output | auto | Output path |
| --json | false | JSON output |
pixcli video <prompt> — Generate video
pixcli video "Slow orbit around the product" --from product.png -d 5 -o reveal.mp4
pixcli video "A cat walking through a garden" -o cat.mp4
pixcli video "The cat jumps over a fence" --from cat.mp4 --extend -o extended.mp4

| Option | Default | Description |
|--------|---------|-------------|
| --from | — | Source image (I2V) or video (extend); repeatable for multi-reference models |
| --to | — | End frame for start→end transition models |
| -d, --duration | 5 | Duration in seconds (1-15; Veo: 4/6/8) |
| --resolution | 720p | 480p, 720p, 1080p, 4k (Veo 3.1 is resolution-priced) |
| -r, --ratio | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 |
| --audio | false | Enable native audio (Veo / Seedance / PixVerse) |
| -q, --quality | standard | draft, standard, high |
| -m, --model | auto | Model ID |
| -o, --output | auto | Output path |
| --extend | false | Extend source video |
| --json | false | JSON output |
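The duration rule in the table (1-15 seconds in general, but only 4, 6, or 8 for Veo models) is easy to trip over, since the default of 5 is not one of the Veo values. A pre-flight check might look like the following TypeScript sketch. isValidDuration is a hypothetical helper, not part of pixcli, and it assumes Veo model IDs start with veo- as in the Models list; how the CLI itself treats out-of-range values is not stated in this README.

```typescript
// Hypothetical pre-flight check for -d/--duration; the CLI's own validation may differ.
const VEO_DURATIONS = [4, 6, 8]; // "Veo: 4/6/8" from the options table

function isValidDuration(seconds: number, model?: string): boolean {
  // Assumption: Veo model IDs start with "veo-" (veo-3.1-lite, veo-3.1-fast, veo-3.1).
  if (model !== undefined && model.startsWith("veo-")) {
    return VEO_DURATIONS.includes(seconds);
  }
  // General range from the table: 1-15 seconds.
  return seconds >= 1 && seconds <= 15;
}

console.log(isValidDuration(5));                 // no model given: general range applies
console.log(isValidDuration(5, "veo-3.1-fast")); // 5 is not in 4/6/8
console.log(isValidDuration(8, "veo-3.1-fast")); // 8 is allowed for Veo
```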
pixcli voice <text> — Text-to-speech
Making a podcast / interview / narrated show? Use pixcli podcast: it scripts, casts voices, scores, covers, and publishes in one call. pixcli voice is just a single voiceover clip.
pixcli voice "Welcome to the future of productivity." -o voiceover.mp3
pixcli voice "Bienvenidos." --voice Eva --language es -o vo-es.mp3

| Option | Default | Description |
|--------|---------|-------------|
| --voice | Rachel | Voice preset or a cloned voice_id. English: Rachel, Aria, Roger, Sarah, ... Spanish (Spain/Castilian — use these for es; the English presets sound LATAM): Eva, Carolina, Lydia, Aitana, Elena, Carmelo, Dani, Emilio, David |
| --engine | elevenlabs | TTS engine: elevenlabs or gemini (steerable) |
| --clone <name> | — | Clone an ElevenLabs voice from --sample audio, then speak with it |
| --sample <path-or-url> | — | Audio sample for --clone (repeatable, 1-5) |
| --language | auto | ISO 639-1 two-letter code (en, es, fr, de, ja, ...) |
| -o, --output | auto | Output path (.mp3) |
| --json | false | JSON output |
pixcli dialogue — Multi-speaker dialogue (Gemini TTS, ≤2 speakers)
pixcli dialogue --speaker "Host:Charon" --speaker "Guest:Kore" \
--line "Host:Welcome to the show!" --line "Guest:Glad to be here." -o ep.mp3
pixcli dialogue --script episode.json -o ep.mp3

| Option | Description |
|--------|-------------|
| --speaker <Name:Voice> | Speaker mapping (repeatable, max 2) |
| --line <Speaker:text> | Dialogue line (repeatable, in order) |
| --script <file.json> | Load {speakers,turns} from a JSON file |
| --language | Language code |
| -o, --output | Output path (.mp3) |
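When an agent assembles a dialogue programmatically, building the argument list as an array avoids shell-quoting problems with punctuation in the lines. The TypeScript sketch below is one way to do that. buildDialogueArgs and its input types are hypothetical; only the --speaker Name:Voice and --line Speaker:text formats and the two-speaker limit come from the table above.

```typescript
// Hypothetical helper that assembles arguments for `pixcli dialogue`.
interface Speaker {
  name: string;
  voice: string;
}

interface Line {
  speaker: string;
  text: string;
}

function buildDialogueArgs(speakers: Speaker[], lines: Line[], output: string): string[] {
  // The options table caps --speaker at 2 entries.
  if (speakers.length > 2) {
    throw new Error("pixcli dialogue accepts at most 2 speakers");
  }
  const known = new Set(speakers.map((s) => s.name));
  const args: string[] = ["dialogue"];
  for (const s of speakers) {
    args.push("--speaker", `${s.name}:${s.voice}`);
  }
  // Lines are emitted in order, matching "repeatable, in order".
  for (const l of lines) {
    if (!known.has(l.speaker)) {
      throw new Error(`Line refers to unknown speaker: ${l.speaker}`);
    }
    args.push("--line", `${l.speaker}:${l.text}`);
  }
  args.push("-o", output);
  return args;
}

// Usage: reproduces the first example above as an argument array.
const dialogueArgs = buildDialogueArgs(
  [{ name: "Host", voice: "Charon" }, { name: "Guest", voice: "Kore" }],
  [
    { speaker: "Host", text: "Welcome to the show!" },
    { speaker: "Guest", text: "Glad to be here." },
  ],
  "ep.mp3",
);
console.log(["pixcli", ...dialogueArgs].join(" "));
```

The array can be handed to a process-spawning API that takes arguments separately, so no escaping of the dialogue text is needed.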
pixcli podcast [topic] — Full podcast episodes (auto-scripted + music)
Give a topic; pixcli writes a tagged, language-aware, duration-targeted 2-speaker
script (gemini-3.5-flash), speaks it with Gemini multi-speaker TTS, and mixes in
intro/outro/bed music — one finished .mp3 (44.1 kHz stereo, 192 kbps).
pixcli podcast "the future of espresso" -m 4 -o ep1.mp3
pixcli podcast "la historia del cómic" --language es --speaker host_warm --speaker skeptic --music cinematic -o ep.mp3
pixcli podcast --mode respect --script my-script.txt -o ep.mp3 # keep my wording, just add tags + music

| Option | Default | Description |
|--------|---------|-------------|
| [topic] | — | Episode topic (omit only with --script) |
| --mode | auto | auto / improve / respect |
| --language | auto | Omit to auto-detect from topic; es ⇒ Spain/Castilian |
| -m, --minutes | 5 | Target length, 1–10 |
| --speaker <spec> | host+expert | Preset id or "Name:Voice[:persona]" (≤2) |
| --tone, --title | — | Show tone / episode title |
| --script <path> | — | .txt transcript or .json turns (improve/respect) |
| --no-search, --url <url> | — | Google Search grounding is ON by default (--no-search disables) / source URLs to summarise |
| --music <ref> | warm | Mood / library id / WAV asset hash / none |
| --intro,--outro,--bed-gain | 4/5/0.06 | Loud sting seconds + faint under-speech level |
| --no-cover,--cover-prompt,--cover-date | — | Square cover (on by default); override art / date |
| --private, --no-publish | — | Podcasts publish public + permanent by default; opt out with these |
| -o, --output | auto | Output path (.mp3; --wav for lossless master) |
Change an existing episode's visibility anytime with pixcli publish <job_id>
(--private / --unpublish). Music is loud only at the start/end and faint under speech. Expression tags are
always English ([laughing], [agreement]…) even in other languages. Each episode
also gets a public square cover; audio + cover share the job-id stem
(/p/pod-<job_id>.mp3, /p/pod-<job_id>-cover.jpg + share page /pod/<job_id>). List presets + music:
GET /api/v1/audio/podcast/presets.
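The "loud at the start/end, faint under speech" behaviour can be pictured as a gain envelope over the episode timeline. The TypeScript sketch below uses the documented defaults (--intro 4, --outro 5, --bed-gain 0.06). It is illustrative only: musicGain is a hypothetical function, full volume is assumed to be a gain of 1, and hard steps are used where a real mixer would most likely fade.

```typescript
// Illustrative gain envelope; not the actual pixcli mixer.
function musicGain(
  t: number,      // position in the episode, seconds
  total: number,  // episode length, seconds
  intro = 4,      // --intro default
  outro = 5,      // --outro default
  bedGain = 0.06, // --bed-gain default
): number {
  if (t < 0 || t > total) return 0;   // outside the episode
  if (t < intro) return 1;            // loud intro sting
  if (t >= total - outro) return 1;   // loud outro sting
  return bedGain;                     // faint bed under speech
}

// Usage: sample a 5-minute (300 s) episode at three points.
for (const t of [2, 150, 297]) {
  console.log(`t=${t}s gain=${musicGain(t, 300)}`);
}
```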
pixcli tryon — Virtual try-on (FLUX Pro VTO)
pixcli tryon --person model.jpg --garment jacket.png -o tryon.png

| Option | Description |
|--------|-------------|
| --person <path-or-url> | required — person image |
| --garment <path-or-url> | required — garment image |
| -p, --prompt <text> | Optional guidance |
| -o, --output | Output path |
pixcli music <prompt> — Generate music
pixcli music "Ambient electronic, minimal beats, corporate feel" -d 45 -o bg.mp3

| Option | Default | Description |
|--------|---------|-------------|
| -d, --duration | 30 | Duration in seconds (3-120) |
| -o, --output | auto | Output path (.mp3) |
| --json | false | JSON output |
pixcli sfx <prompt> — Generate sound effects
pixcli sfx "Smooth cinematic whoosh" -d 1.5 -o whoosh.mp3
pixcli sfx "Soft digital click" -d 0.5 -o click.mp3

| Option | Default | Description |
|--------|---------|-------------|
| -d, --duration | 5 | Duration in seconds (0.5-22) |
| -o, --output | auto | Output path (.mp3) |
| --json | false | JSON output |
Publishing (shareable URLs)
Every generation command (image, edit, video, voice, dialogue, music, sfx, tryon) can publish its result to a stable URL at https://pixcli.shellbot.sh/p/<slug>-<id>:
pixcli image "launch hero banner" --publish --publish-name "launch-hero"
pixcli voice "internal memo" --publish-private --publish-ttl 14

| Flag | Description |
|------|-------------|
| --publish | Public, anyone-with-the-link URL |
| --publish-private | Requires your API key to fetch (Authorization: Bearer <key>) |
| --publish-ttl <days> | Days until expiry (default 60, max 365 — all URLs expire) |
| --publish-name <name> | Friendly slug |
In --json mode the result includes a published[] array (url, scope, expires_at).
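A consumer of that array mainly needs to know whether a URL requires the API key and how long it will live. The TypeScript sketch below shows one way to handle both. The field names url, scope, and expires_at are from this README; the scope values "public" and "private", the ISO 8601 timestamp format, and the helper names are assumptions made for illustration.

```typescript
// Field names come from the README; scope values and date format are assumed.
interface PublishedEntry {
  url: string;
  scope: string;
  expires_at: string;
}

// Headers for fetching a published asset: private URLs need the API key.
function headersFor(entry: PublishedEntry, apiKey: string): Record<string, string> {
  if (entry.scope === "private") {
    return { Authorization: `Bearer ${apiKey}` };
  }
  return {};
}

// Whole days until expiry relative to `now` (rounded down, never negative).
function daysLeft(entry: PublishedEntry, now: Date): number {
  const ms = Date.parse(entry.expires_at) - now.getTime();
  return Math.max(0, Math.floor(ms / 86400000));
}

// Usage with a made-up result; the URL is a placeholder, not a real asset.
const sample = JSON.parse(`{
  "published": [
    { "url": "https://pixcli.shellbot.sh/p/memo-abc123", "scope": "private", "expires_at": "2030-03-01T00:00:00Z" }
  ]
}`) as { published: PublishedEntry[] };

for (const entry of sample.published) {
  const now = new Date("2030-02-15T00:00:00Z");
  console.log(entry.url, headersFor(entry, "mk-prod-..."), daysLeft(entry, now));
}
```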
Global options
| Option | Description |
|--------|-------------|
| --key <key> | Override METERKEY_API_KEY |
| --api-url <url> | Override API URL (default: https://pixcli.shellbot.sh) |
| --no-wait | Submit and return the job_id immediately (don't poll) |
| --version | Show version |
| --help | Show help |
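With --no-wait the CLI returns the job_id without polling, so the caller takes over the poll step described under Architecture (GET /api/v1/jobs/:id every 2s, up to 10 min). The TypeScript sketch below shows that loop in isolation. It is not the code in poller.ts: pollJob is a hypothetical name, the status lookup is injected as a callback so no HTTP details are assumed, and any status other than completed or failed is treated as "still running".

```typescript
// Sketch of a poll loop with the documented cadence (2 s interval, 10 min cap).
async function pollJob(
  jobId: string,
  fetchStatus: (jobId: string) => Promise<string>, // e.g. wraps GET /api/v1/jobs/:id
  intervalMs = 2000,
  timeoutMs = 10 * 60 * 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  let waited = 0; // counts sleep time only, not time spent in fetchStatus
  for (;;) {
    const status = await fetchStatus(jobId);
    // Terminal states named in the command lifecycle.
    if (status === "completed" || status === "failed") return status;
    if (waited + intervalMs > timeoutMs) {
      throw new Error(`Job ${jobId} still "${status}" after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
    waited += intervalMs;
  }
}

// Usage with a scripted status sequence instead of a real API call.
const scripted = ["queued", "running", "completed"];
let step = 0;
pollJob("abc123", async () => scripted[Math.min(step++, scripted.length - 1)], 10)
  .then((finalStatus) => console.log("final status:", finalStatus));
```

Injecting fetchStatus and sleep keeps the loop testable without network access or real delays.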
JSON mode
All commands support --json for machine-readable output:
pixcli image "hero shot" --json | jq '.files[0].path'
{
  "job_id": "abc123",
  "status": "completed",
  "files": [{ "path": "hero-shot.png", "width": 1024, "height": 1024, "mime_type": "image/png" }],
  "model": "flux-pro",
  "cost": 100000,
  "elapsed_ms": 12340
}
Models
Models are selected automatically — run pixcli models for the live, filterable list. Highlights:
Image generation
nano-banana-2 (default), nano-banana-pro, gpt-image-2 (best in-image text/typography), flux-pro, flux-dev, seedream-v5, imagen-4, imagen-4-fast, flux-vto (virtual try-on), flux-fill, kontext
Image editing
nano-banana-2-edit-fal, gpt-image-2-edit, kontext, flux-fill, seedream-v5-edit, phota-enhance, rembg, recraft-upscale, aura-sr
Video
veo-3.1-lite (default, native audio), veo-3.1-fast, veo-3.1 (4k), seedance-2-t2v / seedance-2-i2v, grok-imagine-i2v, kling-o3-pro-i2v / -t2v, pixverse-v6-i2v / -t2v, wan-v2-i2v, minimax-i2v, ltx-t2v, ltx-extend-video, heygen-v3-agent
Audio
elevenlabs-tts-v3, gemini-tts, gemini-tts-dialogue, podcast-gemini (podcast), lyria-3-pro (music), elevenlabs-music, elevenlabs-sfx
Architecture
Project structure
cli/
├── src/
│   ├── index.ts              ← Entrypoint: autoupgrade check → Commander setup → parse
│   ├── commands/
│   │   ├── image.ts          ← Generate images (T2I, I2I)
│   │   ├── edit.ts           ← Edit images (upscale, rembg, style transfer)
│   │   ├── video.ts          ← Generate / extend video (T2V, I2V)
│   │   ├── voice.ts          ← Text-to-speech (+ engine, voice cloning)
│   │   ├── dialogue.ts       ← Multi-speaker dialogue (Gemini TTS)
│   │   ├── music.ts          ← Music generation
│   │   ├── sfx.ts            ← Sound effects generation
│   │   ├── tryon.ts          ← Virtual try-on (FLUX Pro VTO)
│   │   ├── models.ts         ← List available models
│   │   ├── link.ts           ← Open Canva Mode dashboard
│   │   └── job.ts            ← Check job status / results
│   └── lib/
│       ├── config.ts         ← Version, API key & base URL resolution
│       ├── client.ts         ← HTTP client (PixClient: post, get, upload, stream)
│       ├── poller.ts         ← Async job polling until terminal state
│       ├── upload.ts         ← Local file → presigned upload → URL
│       ├── download.ts       ← Asset URLs → local files
│       ├── output.ts         ← Spinners, JSON mode, human-readable output
│       └── autoupgrade.ts    ← Self-upgrade check (hourly, from npm registry)
└── package.json
Request flow
┌─────────────────────────────────────────────────────┐
│                   AI Agent / User                   │
│    $ pixcli image "a red fox" -r 16:9 -o fox.png    │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
               ┌────────────────┐
               │  Autoupgrade   │  Check npm registry (cached, 1x/hour)
               │  (silent)      │  If newer → npm install -g → re-exec
               └───────┬────────┘
                       │
                       ▼
               ┌────────────────┐
               │  Commander     │  Parse args, resolve API key & base URL
               │  (index.ts)    │
               └───────┬────────┘
                       │
                       ▼
               ┌────────────────┐  POST /api/v1/generate
               │  Submit Job    │──────────────────────────┐
               │  (PixClient)   │                          │
               └───────┬────────┘                          │
                       │                                   ▼
                       │                         ┌────────────────────┐
                       │                         │  pixcli API        │
                       │                         │  (Cloudflare       │
                       │                         │   Workers)         │
                       │                         │                    │
                       │                         │  • Classify prompt │
                       │                         │  • Route to model  │
                       │                         │  • Queue job       │
                       │                         └────────────────────┘
                       │
                       ▼
               ┌────────────────┐  GET /api/v1/jobs/:id
               │  Poll Job      │  (every 2s, up to 10 min)
               │  (poller.ts)   │
               └───────┬────────┘
                       │
                       │  status: completed
                       ▼
               ┌────────────────┐  GET /api/v1/jobs/:id/result
               │  Get Result    │  → asset URLs
               └───────┬────────┘
                       │
                       ▼
               ┌────────────────┐
               │  Download      │  Stream assets → local files
               │  (download.ts) │
               └───────┬────────┘
                       │
                       ▼
               ┌────────────────┐
               │  Output        │  Human: spinner + file paths
               │  (output.ts)   │  --json: structured JSON to stdout
               └────────────────┘
Command lifecycle (all commands follow the same pattern)
1. Autoupgrade → Check for new version (cached hourly)
2. Parse args  → Commander validates flags, resolves API key
3. Upload      → If --from / -i provided, upload local files to get URLs
4. Submit      → POST to API, receive job_id
5. Poll        → GET job status every 2s until completed/failed
6. Download    → Fetch result assets, write to disk
7. Output      → Print paths (human) or JSON (--json)
Auto-upgrade
The CLI checks the npm registry for a newer published version on each invocation, but at most once per hour (cached in /tmp/pixcli-upgrade-check.json). If a new version is found, it runs npm install -g pixcli@latest silently and re-executes the original command. Set PIXCLI_NO_AUTOUPGRADE=1 to disable.
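The hourly gate and the "is it newer?" decision can be sketched in a few lines of TypeScript. This is an assumption-laden illustration, not autoupgrade.ts: the cache file's real schema is not documented here, so the lastCheckedMs field is invented for the example, and isNewer handles plain numeric x.y.z versions only.

```typescript
// Hypothetical shape of the cached check; the real file format is not documented.
interface UpgradeCache {
  lastCheckedMs: number; // assumed field: epoch ms of the last registry check
}

const ONE_HOUR_MS = 60 * 60 * 1000;

function shouldCheckRegistry(
  cache: UpgradeCache | null,
  nowMs: number,
  env: Record<string, string | undefined>,
): boolean {
  if (env["PIXCLI_NO_AUTOUPGRADE"] === "1") return false; // documented opt-out
  if (cache === null) return true;                         // never checked before
  return nowMs - cache.lastCheckedMs >= ONE_HOUR_MS;       // at most once per hour
}

// Numeric dotted-version comparison (no pre-release or build tags).
function isNewer(latest: string, current: string): boolean {
  const a = latest.split(".").map(Number);
  const b = current.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return false;
}

// Usage: a check 30 minutes ago suppresses a new one; 3.3.2 is newer than 3.3.1.
const nowMs = Date.now();
console.log(shouldCheckRegistry({ lastCheckedMs: nowMs - 30 * 60 * 1000 }, nowMs, {}));
console.log(isNewer("3.3.2", "3.3.1"));
```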
Requirements
- Node.js 18+
- METERKEY_API_KEY environment variable
Versioning
pixcli (npm) shares one unified version with the OpenClaw plugin
(@shellbot/openclaw-pixcli) and the pixcli agent skill on ClawHub — all
released in lockstep, starting at 3.0.0. So a given version is the same
feature set across the CLI, plugin, and skill.
License
MIT
