agent-media v0.14.0

Agent-first media toolkit CLI

Downloads: 622

agent-media

Media processing CLI for AI agents.

  • Image: generate, edit, remove-background, upscale, resize, convert, extend, crop
  • Video: generate (text-to-video and image-to-video)
  • Audio: extract from video, transcribe (with speaker identification)

Installation

Global

npm install -g agent-media@latest

From Source

git clone https://github.com/agntswrm/agent-media
cd agent-media
pnpm install && pnpm build && pnpm link --global

Via bunx / npx

Run directly without installing:

bunx agent-media@latest --help
npx agent-media@latest --help

Skills for AI Agents

Install agent-media skills for your coding agent (Claude Code, Cursor, Codex, etc.):

npx skills add agntswrm/agent-media

This adds media processing skills that your AI agent can use automatically. Available skills:

  • agent-media - Overview of all capabilities
  • image-generate - Generate images from text
  • image-edit - Edit images with text prompts
  • image-resize - Resize images
  • image-convert - Convert image formats
  • image-extend - Extend image canvas with padding
  • image-remove-background - Remove backgrounds
  • image-crop - Crop images to specified dimensions
  • image-upscale - Upscale images with AI super-resolution
  • audio-extract - Extract audio from video
  • audio-transcribe - Transcribe audio to text
  • video-generate - Generate videos from text or images

Quick Start

# generate an image
agent-media image generate --prompt "a robot" --out rob.png

# remove background
agent-media image remove-background --in rob.png --out rob_nobg.png

# edit the image
agent-media image edit --in rob_nobg.png --prompt "the robot is sitting on a bench next to a cat, in the background you can see the Eiffel Tower in Paris" --out rob_cat_paris.png

# generate a video with audio (cat meows, robot speaks!)
agent-media video generate --in rob_cat_paris.png --prompt "the cat meows and the robot says: \"Yes, me too.\"" --audio --out rob_cat_video.mp4

# extract audio from video
agent-media audio extract --in rob_cat_video.mp4 --out rob_cat_audio.mp3

# transcribe the audio
agent-media audio transcribe --in rob_cat_audio.mp3

Requirements

Local processing (no API key): resize, convert, extend, crop, upscale, audio extract, remove-background, transcribe

Cloud processing (API key required): image generate, image edit, upscale, video generate, remove-background, transcribe

Note: Local remove-background, upscale, and transcribe may print a "mutex lock failed" error. It can be ignored: the output is correct as long as the JSON result shows "ok": true.
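
Because the note above ties success to the JSON result rather than to the absence of error text, a wrapper script can key off the "ok" field alone. The following Python sketch shows one way to do that. `result_ok` is a hypothetical helper, not part of agent-media, and the sketch assumes the stray message may land in the same stream ahead of the JSON object.

```python
import json


def result_ok(stdout: str) -> bool:
    """Return True if the first JSON object found in stdout has "ok": true."""
    start = stdout.find("{")
    if start == -1:
        return False  # no JSON object in the output at all
    try:
        # raw_decode parses one JSON value and ignores any text after it.
        payload, _ = json.JSONDecoder().raw_decode(stdout[start:])
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


# Assumed shape of noisy output: a stray error line followed by the result.
noisy = 'mutex lock failed\n{"ok": true, "media_type": "image", "action": "upscale"}'
print(result_ok(noisy))
```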


image

agent-media image resize --in <path> [options]
agent-media image convert --in <path> --format <f>
agent-media image extend --in <path> --padding <px> --color <hex>
agent-media image crop --in <path> --width <px> --height <px>
agent-media image generate --prompt <text>
agent-media image edit --in <paths...> --prompt <text>
agent-media image remove-background --in <path>
agent-media image upscale --in <path>

resize

local

agent-media image resize --in sunset-mountains.jpg --width 800
agent-media image resize --in sunset-mountains.jpg --height 600
agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --width <px> | Target width in pixels |
| --height <px> | Target height in pixels |
| --out <path> | Output path, filename or directory (default: ./) |

convert

local

agent-media image convert --in sunset-mountains.png --format webp
agent-media image convert --in sunset-mountains.jpg --format png
agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --format <f> | Output format: png, jpg, webp (required) |
| --quality <n> | Quality 1-100 for lossy formats (default: 80) |
| --out <path> | Output path, filename or directory (default: ./) |

extend

local

Extend image canvas by adding padding on all sides with a solid background color.

agent-media image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
agent-media image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --padding <px> | Padding size in pixels to add on all sides (required) |
| --color <hex> | Background color for extended area (required). Also flattens transparency. |
| --dpi <n> | DPI/density for output image (default: 300) |
| --out <path> | Output path, filename or directory (default: ./) |
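
Since padding is added on all sides, each output dimension grows by twice the padding value. The Python sketch below works through that arithmetic and shows how a `--color` hex value maps to RGB components. Both function names are illustrative and not part of agent-media.

```python
def extended_size(width: int, height: int, padding: int) -> tuple[int, int]:
    """Size of the canvas after adding `padding` pixels on every side."""
    return width + 2 * padding, height + 2 * padding


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Parse a '#RRGGBB' string into (red, green, blue) integers."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {color!r}")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


# An 800x600 image with --padding 50 becomes 900x700.
print(extended_size(800, 600, 50))
print(hex_to_rgb("#E4ECF8"))
```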

crop

local

Crop an image to specified dimensions around a focal point. The crop region is calculated to center on the focal point while staying within image bounds.

agent-media image crop --in sunset-mountains.jpg --width 800 --height 600
agent-media image crop --in sunset-mountains.jpg --width 800 --height 600 --focus-x 20 --focus-y 30
agent-media image crop --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 400 --height 400

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --width <px> | Width of crop area in pixels (required) |
| --height <px> | Height of crop area in pixels (required) |
| --focus-x <n> | Focal point X position 0-100 (default: 50 = center) |
| --focus-y <n> | Focal point Y position 0-100 (default: 50 = center) |
| --dpi <n> | DPI/density for output image (default: 300) |
| --out <path> | Output path, filename or directory (default: ./) |
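
To make the focal-point behavior concrete, here is a minimal Python sketch of how such a crop region could be computed. It assumes the focus values are percentages of the image size and that the region is shifted (not shrunk) when it would cross an edge. The README does not spell out the exact rounding or clamping rules, so treat `crop_box` as a hypothetical illustration rather than the tool's implementation.

```python
def crop_box(img_w: int, img_h: int, crop_w: int, crop_h: int,
             focus_x: float = 50, focus_y: float = 50) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) of a crop centered on the focal point."""
    # A crop cannot be larger than the image itself.
    crop_w = min(crop_w, img_w)
    crop_h = min(crop_h, img_h)
    # Focal point in pixels (focus values are 0-100 percentages).
    center_x = img_w * focus_x / 100
    center_y = img_h * focus_y / 100
    # Center the box on the focal point, then clamp it to the image bounds.
    left = max(0, min(round(center_x - crop_w / 2), img_w - crop_w))
    top = max(0, min(round(center_y - crop_h / 2), img_h - crop_h))
    return left, top, crop_w, crop_h


# 1920x1080 image, 800x600 crop, default center focus.
print(crop_box(1920, 1080, 800, 600))
# Focus near the top-left: the box is pushed back inside the image.
print(crop_box(1920, 1080, 800, 600, focus_x=20, focus_y=30))
```

With a focus of 20/30 the ideal left edge would be negative, so the box is clamped to the image border instead of being centered exactly on the focal point.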

generate

API key required

agent-media image generate --prompt "a cat wearing a hat"
agent-media image generate --prompt "sunset over mountains" --width 1024 --height 768

| Option | Description |
|--------|-------------|
| --prompt <text> | Text description (required) |
| --width <px> | Width (default: 1280) |
| --height <px> | Height (default: 720) |
| --out <path> | Output path, filename or directory (default: ./) |
| --provider <name> | Provider (fal, replicate, runpod, ai-gateway) |
| --model <name> | Model override (e.g., fal-ai/flux-2, bfl/flux-2-pro) |

edit

API key required

Edit one or more images using a text prompt (image-to-image). Supports multiple input images for combining styles, subjects, or scenes.

agent-media image edit --in sunset-mountains.jpg --prompt "make the sky more vibrant"
agent-media image edit --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/man-portrait.png --prompt "add sunglasses"
agent-media image edit --in style.jpg content.jpg --prompt "apply the style of the first image to the second"

| Option | Description |
|--------|-------------|
| --in <paths...> | One or more input file paths or URLs (required) |
| --prompt <text> | Text description of the desired edit (required) |
| --out <path> | Output path, filename or directory (default: ./) |
| --provider <name> | Provider (fal, replicate, runpod, ai-gateway) |
| --model <name> | Model override (e.g., fal-ai/flux-2/edit, google/gemini-3-pro-image) |

remove-background

local or cloud

agent-media image remove-background --in man-portrait.png
agent-media image remove-background --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/man-portrait.png

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --out <path> | Output path, filename or directory (default: ./) |
| --provider <name> | Provider (local, fal, replicate) |

upscale

local or cloud

Upscale an image using AI super-resolution to increase resolution with detail generation.

agent-media image upscale --in sunset-mountains.jpg
agent-media image upscale --in sunset-mountains.jpg --scale 4 --provider fal
agent-media image upscale --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --provider replicate

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --scale <n> | Scale factor: 2 or 4 (default: 2). Local provider always outputs 4x. |
| --out <path> | Output path, filename or directory (default: ./) |
| --provider <name> | Provider (local, fal, replicate) |
| --model <name> | Model override |
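
The interaction between `--scale` and the local provider is easy to miss, so the short Python sketch below spells it out. `upscaled_size` is a hypothetical helper that only encodes what the option table states: cloud providers honor a scale of 2 or 4, while the local provider always produces 4x.

```python
def upscaled_size(width: int, height: int, provider: str, scale: int = 2) -> tuple[int, int]:
    """Expected output size for a given provider and requested scale factor."""
    if scale not in (2, 4):
        raise ValueError("scale must be 2 or 4")
    # Per the option table, the local provider ignores --scale and outputs 4x.
    factor = 4 if provider == "local" else scale
    return width * factor, height * factor


print(upscaled_size(512, 512, provider="local"))         # local: always 4x
print(upscaled_size(512, 512, provider="fal", scale=2))  # cloud: honors --scale
```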


video

# Generate video from text
agent-media video generate --prompt <text>

# Generate video from image (animate an image)
agent-media video generate --in <image> --prompt <text>

generate

API key required

Generate video from a text prompt. Optionally provide an input image to animate it (image-to-video). The prompt describes what should happen in the video.

# Text-to-video
agent-media video generate --prompt "a cat walking through a garden"

# Image-to-video (animate an image)
agent-media video generate --in woman-portrait.png --prompt "person smiles and waves hello"

# With audio/speech generation (runpod)
agent-media video generate --in woman-portrait.png --prompt "The woman says: \"Hello, welcome to our channel!\"" --audio --provider runpod

# With ambient audio (fal)
agent-media video generate --prompt "fireworks in the night sky" --audio --duration 10 --provider fal

# Higher resolution
agent-media video generate --prompt "ocean waves" --resolution 1080p

| Option | Description |
|--------|-------------|
| --prompt <text> | Text description of the video (required) |
| --in <path> | Input image for image-to-video (optional) |
| --duration <s> | Duration in seconds (default: 5 for runpod, 6 for others) |
| --resolution <r> | Resolution: 720p, 1080p (default: 720p) |
| --fps <n> | Frame rate: 25, 50 (default: 25) |
| --audio | Generate audio track (includes speech from quoted text with runpod) |
| --out <path> | Output path, filename or directory (default: ./) |
| --provider <name> | Provider (fal, replicate, runpod) |
| --model <name> | Model override |


audio

# Extract audio from video
agent-media audio extract --in <video>

# Transcribe audio to text
agent-media audio transcribe --in <audio>

extract

local

Extract audio track from a video file.

agent-media audio extract --in woman-greeting.mp4
agent-media audio extract --in woman-greeting.mp4 --format wav
agent-media audio extract --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/woman-greeting.mp4

| Option | Description |
|--------|-------------|
| --in <path> | Input video file path or URL (required) |
| --format <f> | Output format: mp3, wav (default: mp3) |
| --out <path> | Output path, filename or directory (default: ./) |

transcribe

local or cloud (diarization requires cloud)

Transcribe audio to text with timestamps. Speaker identification (diarization) requires a cloud provider.

agent-media audio transcribe --in woman-greeting.mp3
agent-media audio transcribe --in woman-greeting.mp3 --diarize --speakers 2
agent-media audio transcribe --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/woman-greeting.mp3

| Option | Description |
|--------|-------------|
| --in <path> | Input audio file path or URL (required) |
| --diarize | Enable speaker identification (cloud only) |
| --language <code> | Language code (auto-detected if not provided) |
| --speakers <n> | Number of speakers hint |
| --out <path> | Output path, filename or directory (default: ./) |
| --provider <name> | Provider (local, fal, replicate) |
| --model <name> | Model override |


Output Format

All commands return JSON to stdout:

{
  "ok": true,
  "media_type": "image",
  "action": "resize",
  "provider": "local",
  "output_path": "resized_123_abc.png",
  "mime": "image/png",
  "bytes": 45678
}

On error:

{
  "ok": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "At least one of width or height must be specified"
  }
}

Exit code is 0 on success, 1 on error.
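
Since every command reports its outcome as JSON, a caller can turn the two shapes above into a return value or an exception. The Python sketch below is one possible wrapper. `AgentMediaError`, `parse_result`, and `run_agent_media` are names invented for this example, and `run_agent_media` assumes the `agent-media` binary is on PATH.

```python
import json
import subprocess


class AgentMediaError(Exception):
    """Raised when the CLI reports "ok": false."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_result(stdout: str) -> dict:
    """Return the success payload, or raise AgentMediaError on failure."""
    payload = json.loads(stdout)
    if not payload.get("ok"):
        error = payload.get("error", {})
        raise AgentMediaError(error.get("code", "UNKNOWN"), error.get("message", ""))
    return payload


def run_agent_media(*args: str) -> dict:
    """Run the CLI with the given arguments and parse its JSON output."""
    completed = subprocess.run(
        ["agent-media", *args], capture_output=True, text=True
    )
    return parse_result(completed.stdout)
```

A call such as `run_agent_media("image", "resize", "--in", "sunset-mountains.jpg", "--width", "800")["output_path"]` would then yield the path of the resized file, which can be passed as `--in` to the next command.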

Providers

Default Models

| Provider | resize | convert | extend | crop | image generate | image edit | remove-background | upscale | video generate | transcribe |
|----------|--------|---------|--------|------|----------------|------------|-------------------|---------|----------------|------------|
| local | ✓* | ✓* | ✓* | ✓* | - | - | Xenova/modnet** | Xenova/swin2SR** | - | moonshine-base** |
| fal | - | - | - | - | fal-ai/flux-2 | fal-ai/flux-2/edit | fal-ai/birefnet/v2 | fal-ai/esrgan | fal-ai/ltx-2 | fal-ai/wizper |
| replicate | - | - | - | - | black-forest-labs/flux-2-dev | black-forest-labs/flux-kontext-dev | men1scus/birefnet | nightmareai/real-esrgan | lightricks/ltx-video | whisper-diarization |
| runpod | - | - | - | - | alibaba/wan-2.6 | google/nano-banana-pro-edit | - | - | wan-2.6 | - |
| ai-gateway | - | - | - | - | bfl/flux-2-pro | google/gemini-3-pro-image | - | - | - | - |

* Powered by Sharp for fast image processing

** Powered by Transformers.js for local ML inference (models downloaded on first use)

Use --model <name> to override the default model for any command.

Provider Selection

  1. Explicit flag (highest priority): --provider fal
  2. Environment auto-detect: Set FAL_API_KEY, REPLICATE_API_TOKEN, RUNPOD_API_KEY, or AI_GATEWAY_API_KEY to auto-select that provider
  3. Fallback to local: For resize/convert when no provider specified
  4. First supporting provider: For generate/remove-background
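
The priority order above can be read as a small decision procedure. The Python sketch below is one possible interpretation, not the tool's actual logic: the support sets are transcribed from the Default Models table, while the order in which environment variables are checked and the rule that an auto-detected provider must support the requested action are assumptions made for this example.

```python
import os

# Actions each provider supports, transcribed from the Default Models table.
CLOUD_FULL = {"image generate", "image edit", "remove-background",
              "upscale", "video generate", "transcribe"}
SUPPORT = {
    "local": {"resize", "convert", "extend", "crop",
              "remove-background", "upscale", "transcribe"},
    "fal": CLOUD_FULL,
    "replicate": CLOUD_FULL,
    "runpod": {"image generate", "image edit", "video generate"},
    "ai-gateway": {"image generate", "image edit"},
}

# Environment variable -> provider. The check order is an assumption.
ENV_KEYS = {
    "FAL_API_KEY": "fal",
    "REPLICATE_API_TOKEN": "replicate",
    "RUNPOD_API_KEY": "runpod",
    "AI_GATEWAY_API_KEY": "ai-gateway",
}


def select_provider(action, explicit=None, env=None):
    """Pick a provider for `action` following the documented priorities."""
    env = os.environ if env is None else env
    if explicit:                              # 1. explicit --provider flag
        return explicit
    for var, provider in ENV_KEYS.items():    # 2. environment auto-detect
        if env.get(var) and action in SUPPORT[provider]:
            return provider
    if action in SUPPORT["local"]:            # 3. fall back to local
        return "local"
    for provider, actions in SUPPORT.items():  # 4. first supporting provider
        if action in actions:
            return provider
    raise ValueError(f"no provider supports {action!r}")


print(select_provider("image generate", env={"REPLICATE_API_TOKEN": "token"}))
print(select_provider("resize", env={"FAL_API_KEY": "key"}))
```

Under these assumptions, setting only `FAL_API_KEY` still leaves resize on the local provider, because fal has no entry for that action in the table.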

Environment Variables

| Variable | Description | Get Key |
|----------|-------------|---------|
| FAL_API_KEY | fal.ai API key | fal.ai |
| REPLICATE_API_TOKEN | Replicate API token | replicate.com |
| RUNPOD_API_KEY | Runpod API key | runpod.io |
| AI_GATEWAY_API_KEY | AI Gateway API key | vercel.com |
| AGENT_MEDIA_DIR | Output directory (default: current directory) | - |

Roadmap

  • [x] Local background removal (zero API keys)
  • [x] Local transcription (zero API keys)
  • [x] Video generation (text-to-video and image-to-video)
  • [ ] Batch processing support