agent-media

v0.3.2

Agent-first media toolkit CLI

Media processing CLI for AI agents.

  • Image: generate, edit, remove-background, resize, convert, extend
  • Video: extract audio
  • Audio: transcribe (with speaker identification)

Quick Start

Requires an API key from one of these providers: fal.ai, Replicate, or Runpod (see Environment Variables).

# Generate an image
npx agent-media image generate --prompt "a robot painting a sunset"

# Edit the generated image
npx agent-media image edit --in .agent-media/generated_*.png --prompt "add a cat watching"

# Remove background
npx agent-media image remove-background --in .agent-media/edited_*.png

# Convert to webp
npx agent-media image convert --in .agent-media/nobg_*.png --format webp

Video to transcript (no API key needed for extract)

# Extract audio from video (local, no API key)
npx agent-media audio extract --in video.mp4

# Transcribe with speaker identification
npx agent-media audio transcribe --in .agent-media/extracted_*.mp3 --diarize
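
The two steps above can also be chained from code by reading `output_path` from each command's JSON result (see Output Format) instead of relying on a glob. The following is a minimal TypeScript sketch, not part of agent-media: the names `Runner`, `runStep`, and `videoToTranscript` are hypothetical, and it assumes the transcribe command reports an `output_path` the same way the resize example does. The CLI call is injected as a function so the sketch does not depend on how the process is spawned.

```typescript
// Hypothetical helper, not part of agent-media.
// A Runner executes the CLI with the given arguments and returns its stdout.
type Runner = (args: string[]) => string;

interface StepResult {
  ok: boolean;
  output_path?: string;
  error?: { code: string; message: string };
}

// Runs one CLI step and returns its output path, or throws with the reported error.
function runStep(run: Runner, args: string[]): string {
  const result = JSON.parse(run(args)) as StepResult;
  if (!result.ok || result.output_path === undefined) {
    const reason = result.error
      ? `${result.error.code}: ${result.error.message}`
      : "no output_path in result";
    throw new Error(`agent-media ${args.slice(0, 2).join(" ")} failed (${reason})`);
  }
  return result.output_path;
}

// Extracts the audio track, then transcribes it with speaker identification.
function videoToTranscript(run: Runner, videoPath: string): string {
  const audioPath = runStep(run, ["audio", "extract", "--in", videoPath]);
  return runStep(run, ["audio", "transcribe", "--in", audioPath, "--diarize"]);
}
```

A real `Runner` could wrap Node's `child_process.execFileSync` around `npx agent-media`. Because the CLI exits with 1 on error, such a wrapper would probably need to catch the thrown error and return its captured stdout so the JSON error body can still be parsed.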

Local processing (no API key needed)

npx agent-media image resize --in photo.jpg --width 800
npx agent-media image convert --in photo.png --format webp
npx agent-media image extend --in photo.jpg --padding 50 --color "#FFFFFF"

Installation

# Use directly with npx (no install)
npx agent-media --help

# Or install globally
npm install -g agent-media

From Source

git clone https://github.com/TimPietrusky/agent-media
cd agent-media
pnpm install && pnpm build && pnpm link --global

Requirements

  • Node.js >= 18.0.0
  • API key for AI features (generate, edit, remove-background, transcribe)

Commands

Image Commands

agent-media image resize --in <path> [options]      # Resize image
agent-media image convert --in <path> --format <f>  # Convert format
agent-media image remove-background --in <path>     # Remove background
agent-media image generate --prompt <text>          # Generate from prompt
agent-media image extend --in <path> --padding <px> --color <hex>  # Extend canvas
agent-media image edit --in <path> --prompt <text>  # Edit with prompt

Audio Commands

agent-media audio extract --in <video>              # Extract audio from video
agent-media audio transcribe --in <audio>           # Transcribe audio to text

resize

agent-media image resize --in photo.jpg --width 800
agent-media image resize --in photo.jpg --height 600
agent-media image resize --in photo.jpg --width 800 --height 600

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --width <px> | Target width in pixels |
| --height <px> | Target height in pixels |
| --out <dir> | Output directory |
| --provider <name> | Provider (local) |

convert

agent-media image convert --in photo.png --format webp
agent-media image convert --in photo.jpg --format png
agent-media image convert --in photo.png --format jpg --quality 90

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --format <f> | Output format: png, jpg, webp (required) |
| --quality <n> | Quality 1-100 for lossy formats (default: 80) |
| --out <dir> | Output directory |
| --provider <name> | Provider (local) |

remove-background

agent-media image remove-background --in portrait.jpg
agent-media image remove-background --in https://example.com/photo.jpg

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --out <dir> | Output directory |
| --provider <name> | Provider (fal, replicate) |

generate

agent-media image generate --prompt "a cat wearing a hat"
agent-media image generate --prompt "sunset over mountains" --width 1024 --height 768

| Option | Description |
|--------|-------------|
| --prompt <text> | Text description (required) |
| --width <px> | Width (default: 1024) |
| --height <px> | Height (default: 1024) |
| --out <dir> | Output directory |
| --provider <name> | Provider (fal, replicate, runpod) |
| --model <name> | Model override (e.g., fal-ai/flux-2, black-forest-labs/flux-2-dev) |

extend

Extend image canvas by adding padding on all sides with a solid background color.

agent-media image extend --in photo.jpg --padding 50 --color "#E4ECF8"
agent-media image extend --in photo.png --padding 100 --color "#FFFFFF" --dpi 300

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --padding <px> | Padding size in pixels to add on all sides (required) |
| --color <hex> | Background color for extended area (required). Also flattens transparency. |
| --dpi <n> | DPI/density for output image (default: 300) |
| --out <dir> | Output directory |
| --provider <name> | Provider (local) |
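
Because the padding is added on all sides, each output dimension grows by twice the padding value. The TypeScript sketch below works through that arithmetic and shows one way to read a `#RRGGBB` color. The `extendedSize` and `parseHexColor` helpers are illustrative names, not part of agent-media, and the sketch assumes a non-negative whole-pixel padding and the six-digit hex form only; the CLI itself may accept more.

```typescript
interface Size {
  width: number;
  height: number;
}

// Padding applies to left, right, top, and bottom, so each dimension grows by 2 * padding.
function extendedSize(input: Size, padding: number): Size {
  if (padding < 0 || Math.floor(padding) !== padding) {
    throw new RangeError("padding must be a non-negative whole number of pixels");
  }
  return {
    width: input.width + 2 * padding,
    height: input.height + 2 * padding,
  };
}

// Splits a "#RRGGBB" string into its red, green, and blue channels (0-255).
function parseHexColor(hex: string): [number, number, number] {
  const match = /^#([0-9a-fA-F]{6})$/.exec(hex);
  if (match === null) {
    throw new Error(`expected a #RRGGBB color, got "${hex}"`);
  }
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}
```

For example, an 800x600 input with `--padding 50` would come out at 900x700, and `"#E4ECF8"` splits into the channels 228, 236, 248.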

edit

Edit an image using a text prompt (image-to-image).

agent-media image edit --in photo.jpg --prompt "make the sky more vibrant"
agent-media image edit --in portrait.jpg --prompt "add sunglasses"

| Option | Description |
|--------|-------------|
| --in <path> | Input file path or URL (required) |
| --prompt <text> | Text description of the desired edit (required) |
| --out <dir> | Output directory |
| --provider <name> | Provider (fal, replicate, runpod) |
| --model <name> | Model override (e.g., fal-ai/flux-2/edit) |

audio extract

Extract audio track from a video file. Uses local ffmpeg, no API key needed.

agent-media audio extract --in video.mp4
agent-media audio extract --in video.mp4 --format wav

| Option | Description |
|--------|-------------|
| --in <path> | Input video file path or URL (required) |
| --format <f> | Output format: mp3, wav (default: mp3) |
| --out <dir> | Output directory |

audio transcribe

Transcribe audio to text with timestamps. Supports speaker identification.

agent-media audio transcribe --in audio.mp3
agent-media audio transcribe --in audio.mp3 --diarize --speakers 2

| Option | Description |
|--------|-------------|
| --in <path> | Input audio file path or URL (required) |
| --diarize | Enable speaker identification |
| --language <code> | Language code (auto-detected if not provided) |
| --speakers <n> | Number of speakers hint |
| --out <dir> | Output directory |
| --provider <name> | Provider (fal, replicate) |
| --model <name> | Model override |

Output Format

All commands return JSON to stdout:

{
  "ok": true,
  "media_type": "image",
  "action": "resize",
  "provider": "local",
  "output_path": ".agent-media/resized_123_abc.png",
  "mime": "image/png",
  "bytes": 45678
}

On error:

{
  "ok": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "At least one of width or height must be specified"
  }
}

Exit code is 0 on success, 1 on error.
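
For callers that consume this output from code, the two shapes above map naturally onto a discriminated union keyed on `ok`. The TypeScript sketch below is one possible way to model it. The type and function names are made up for illustration, and the field list is taken only from the two examples above, so real results may carry additional fields.

```typescript
// Shape of the success example above.
interface SuccessResult {
  ok: true;
  media_type: string;
  action: string;
  provider: string;
  output_path: string;
  mime: string;
  bytes: number;
}

// Shape of the error example above.
interface ErrorResult {
  ok: false;
  error: { code: string; message: string };
}

type AgentMediaResult = SuccessResult | ErrorResult;

// Parses stdout and checks only that a boolean `ok` field is present.
function parseResult(stdout: string): AgentMediaResult {
  const parsed: unknown = JSON.parse(stdout);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { ok?: unknown }).ok !== "boolean"
  ) {
    throw new Error("stdout is not an agent-media result object");
  }
  return parsed as AgentMediaResult;
}

// Returns the output path on success, or throws "CODE: message" on error.
function outputPathOrThrow(stdout: string): string {
  const result = parseResult(stdout);
  if (result.ok === false) {
    throw new Error(`${result.error.code}: ${result.error.message}`);
  }
  return result.output_path;
}
```

Checking `ok` first means the compiler only allows access to `output_path` on the success branch and to `error` on the failure branch.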

Providers

Default Models

| Provider | resize | convert | extend | generate | edit | remove-background | transcribe |
|----------|--------|---------|--------|----------|------|-------------------|------------|
| local | ✓ | ✓ | ✓ | - | - | - | - |
| fal | - | - | - | fal-ai/flux-2 | fal-ai/flux-2/edit | fal-ai/birefnet/v2 | fal-ai/wizper |
| replicate | - | - | - | black-forest-labs/flux-2-dev | black-forest-labs/flux-kontext-dev | men1scus/birefnet | WhisperX |
| runpod | - | - | - | alibaba/wan-2.6 | google/nano-banana-pro-edit | - | - |

Use --model <name> to override the default model on commands that accept it (generate, edit, and transcribe, per the option tables above).
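
The default-model table and the override rule can be expressed as a small lookup. The TypeScript sketch below copies the model names from the table above. The `DEFAULT_MODELS` constant and `resolveModel` function are hypothetical names used for illustration and do not describe how agent-media stores or resolves models internally.

```typescript
type RemoteProvider = "fal" | "replicate" | "runpod";
type AiAction = "generate" | "edit" | "remove-background" | "transcribe";

// Values copied from the Default Models table; null marks a "-" cell (unsupported).
const DEFAULT_MODELS: Record<RemoteProvider, Record<AiAction, string | null>> = {
  fal: {
    generate: "fal-ai/flux-2",
    edit: "fal-ai/flux-2/edit",
    "remove-background": "fal-ai/birefnet/v2",
    transcribe: "fal-ai/wizper",
  },
  replicate: {
    generate: "black-forest-labs/flux-2-dev",
    edit: "black-forest-labs/flux-kontext-dev",
    "remove-background": "men1scus/birefnet",
    transcribe: "WhisperX",
  },
  runpod: {
    generate: "alibaba/wan-2.6",
    edit: "google/nano-banana-pro-edit",
    "remove-background": null,
    transcribe: null,
  },
};

// Returns the override if given, otherwise the table default.
// Throws when the provider has no entry for the action at all.
function resolveModel(provider: RemoteProvider, action: AiAction, override?: string): string {
  const fallback = DEFAULT_MODELS[provider][action];
  if (fallback === null) {
    throw new Error(`${provider} does not support ${action}`);
  }
  return override ?? fallback;
}
```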

Provider Selection

  1. Explicit flag (highest priority): --provider fal
  2. Environment auto-detect: Set FAL_API_KEY to auto-select fal
  3. Fallback to local: For resize/convert/extend when no provider is specified
  4. First supporting provider: For generate/remove-background
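
The priority order above can be sketched as a single function. This is an illustration under stated assumptions, not the package's actual implementation: the `selectProvider` name, the order in which key variables are checked, and the rule that an auto-detected provider is skipped when it does not support the requested action are all assumptions. The support matrix is taken from the Default Models table, and the environment is passed in as a plain object.

```typescript
type Provider = "local" | "fal" | "replicate" | "runpod";
type Action =
  | "resize" | "convert" | "extend"
  | "generate" | "edit" | "remove-background" | "transcribe";
type Env = Record<string, string | undefined>;

// Which actions each provider handles, per the Default Models table.
const SUPPORT: Record<Provider, Action[]> = {
  local: ["resize", "convert", "extend"],
  fal: ["generate", "edit", "remove-background", "transcribe"],
  replicate: ["generate", "edit", "remove-background", "transcribe"],
  runpod: ["generate", "edit"],
};

// Key variable per remote provider. The checking order is an assumption.
const KEY_VARS: Array<[Provider, string]> = [
  ["fal", "FAL_API_KEY"],
  ["replicate", "REPLICATE_API_TOKEN"],
  ["runpod", "RUNPOD_API_KEY"],
];

function selectProvider(action: Action, env: Env, explicit?: Provider): Provider {
  // 1. An explicit --provider flag always wins.
  if (explicit !== undefined) return explicit;
  // 2. Auto-detect from the environment (assumed to skip providers lacking the action).
  for (const [provider, variable] of KEY_VARS) {
    if (env[variable] && SUPPORT[provider].indexOf(action) !== -1) return provider;
  }
  // 3. Fall back to local for actions it can handle.
  if (SUPPORT.local.indexOf(action) !== -1) return "local";
  // 4. Otherwise pick the first provider that supports the action.
  for (const [provider] of KEY_VARS) {
    if (SUPPORT[provider].indexOf(action) !== -1) return provider;
  }
  throw new Error(`no provider supports ${action}`);
}
```

With only `REPLICATE_API_TOKEN` set, this sketch picks replicate for `generate` and local for `resize`; with nothing set, `remove-background` falls through to step 4.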

Environment Variables

| Variable | Description | Get Key |
|----------|-------------|---------|
| FAL_API_KEY | fal.ai API key | fal.ai |
| REPLICATE_API_TOKEN | Replicate API token | replicate.com |
| RUNPOD_API_KEY | Runpod API key | runpod.io |
| HUGGINGFACE_ACCESS_TOKEN | For transcription with speaker ID (replicate only) | huggingface.co |
| AGENT_MEDIA_DIR | Output directory (default: .agent-media/) | - |
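
A wrapper script may want to verify these variables before invoking the CLI. The TypeScript sketch below shows one way to do that. The `missingVariables` and `outputDir` helpers are hypothetical, and treating an empty string as unset is an assumption. The environment is passed in as a plain object, for example `process.env` in Node.

```typescript
type KeyedProvider = "fal" | "replicate" | "runpod";
type Environment = Record<string, string | undefined>;

// Key variable per provider, from the table above.
const PROVIDER_KEY: Record<KeyedProvider, string> = {
  fal: "FAL_API_KEY",
  replicate: "REPLICATE_API_TOKEN",
  runpod: "RUNPOD_API_KEY",
};

// Lists the variables a call would need that are unset or empty.
// Speaker identification on replicate additionally needs the Hugging Face token.
function missingVariables(
  provider: KeyedProvider,
  env: Environment,
  options: { diarize?: boolean } = {}
): string[] {
  const required = [PROVIDER_KEY[provider]];
  if (provider === "replicate" && options.diarize === true) {
    required.push("HUGGINGFACE_ACCESS_TOKEN");
  }
  return required.filter((name) => !env[name]);
}

// Resolves the output directory, falling back to the documented default.
function outputDir(env: Environment): string {
  return env["AGENT_MEDIA_DIR"] || ".agent-media/";
}
```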

Usage with AI Agents

Just ask the agent

Use agent-media to resize this image to 800px wide.
Run agent-media --help to see available commands.

AGENTS.md / CLAUDE.md

Add to your project instructions:

## Media Processing

Use `agent-media` for image and audio operations. Run `agent-media --help` for commands.

- `agent-media image resize --in <path> --width <px>` - Resize image
- `agent-media image convert --in <path> --format <f>` - Convert format
- `agent-media image generate --prompt <text>` - Generate image
- `agent-media image edit --in <path> --prompt <text>` - Edit image
- `agent-media image remove-background --in <path>` - Remove background
- `agent-media audio extract --in <video>` - Extract audio from video
- `agent-media audio transcribe --in <audio>` - Transcribe audio

All commands output JSON with `ok: true/false` and exit 0/1.

Roadmap

  • [ ] Local CPU background removal via transformers.js/ONNX (zero API keys)
  • [ ] Video processing actions
  • [ ] Batch processing support