
ai-vision-cli

AI-powered image and video analysis CLI via Google Gemini. Built as an AI-native CLI that any agent (Claude Code, etc.) can discover and use autonomously.

Why not an MCP?

MCP tools work, but they add ~300-500 tokens of overhead per call (JSON-RPC handshake, schema loading, response wrapping). This CLI outputs plain text to stdout, saving ~90% context when used by AI agents. It's also faster — no protocol negotiation, just a direct API call.
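The envelope overhead is easy to see side by side. The sketch below is illustrative only: the JSON-RPC envelope shape and the 4-characters-per-token rule of thumb are assumptions, not measurements of any particular MCP implementation.

```python
# Illustrative comparison: a plain-text CLI reply vs the same payload
# wrapped in a JSON-RPC-style envelope, as an MCP server would return it.
import json

answer = "The screenshot shows a login form with two inputs and a submit button."

# Plain stdout: the agent reads exactly the answer text.
plain = answer

# Hypothetical JSON-RPC wrapping (shape assumed for illustration).
wrapped = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": answer}]},
})

def approx_tokens(s):
    return len(s) // 4  # rough rule of thumb: ~4 characters per token

overhead = approx_tokens(wrapped) - approx_tokens(plain)
print(f"plain ~{approx_tokens(plain)} tokens, "
      f"wrapped ~{approx_tokens(wrapped)} tokens "
      f"(+{overhead} envelope tokens per call)")
```

The envelope cost is per call, so it compounds quickly in multi-step agent sessions.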

Install

npm install -g @mindfullabai/ai-vision-cli

Setup

1. Get a Google AI Studio API key (free)

Go to aistudio.google.com/apikey and create a key.

export GOOGLE_API_KEY="your-key-here"
# Add to ~/.zshrc or ~/.bashrc to persist

2. Setup Claude Code integration (optional)

ai-vision setup-claude

This command:

  • Installs the skill in ~/.claude/skills/ai-vision-cli/
  • Adds Bash(ai-vision:*) permission to Claude Code settings
  • Verifies your API key

To remove: ai-vision setup-claude --uninstall

Usage

Analyze an image

ai-vision analyze-image ./screenshot.png --prompt "Describe this UI"
ai-vision ai ./photo.jpg --prompt "What objects are in this image?"

Analyze a video

ai-vision analyze-video ./demo.mp4 --prompt "Summarize this video"
ai-vision av "https://youtube.com/watch?v=xyz" --prompt "What topics are covered?"
ai-vision av ./recording.mp4 --prompt "What happens?" --start 1m30s --end 3m

Detect objects

ai-vision detect-objects ./page.png --prompt "Find all interactive elements"
ai-vision do ./screenshot.png --prompt "Find buttons" --output ./annotated.png
ai-vision do ./ui.png --prompt "Locate the search bar" --json

Returns a text summary with element positions and saves an annotated image with bounding boxes. Web-aware: auto-detects HTML elements on webpage screenshots.

Compare images

ai-vision compare before.png after.png --prompt "What changed?"
ai-vision cmp v1.png v2.png v3.png --prompt "How did the design evolve?"

Common options

| Option | Description | Default |
|--------|-------------|---------|
| --prompt | What to analyze (required) | - |
| --model | Gemini model | gemini-2.5-flash-lite (image), gemini-2.5-flash (video) |
| --max-tokens | Max output tokens | 1000 (image), 2000 (video) |
| --temperature | Response randomness (0.0-2.0) | 0.8 |
| --json | Output JSON with metadata | false |
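A wrapper script can mirror the per-command defaults above. This is a sketch, not part of the CLI: the `DEFAULTS` mapping restates the table, and `resolve_options` is a hypothetical helper.

```python
# Defaults from the options table, keyed by media kind (illustrative).
DEFAULTS = {
    "image": {"model": "gemini-2.5-flash-lite", "max_tokens": 1000},
    "video": {"model": "gemini-2.5-flash", "max_tokens": 2000},
}
COMMON = {"temperature": 0.8, "json": False}

def resolve_options(kind, **overrides):
    """Merge common defaults, media-specific defaults, then user overrides."""
    opts = {**COMMON, **DEFAULTS[kind]}
    opts.update(overrides)
    return opts

print(resolve_options("video", temperature=0.2))
```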

AI Agent Integration

This CLI is designed to be self-describing for AI agents. Any agent that can run shell commands can discover and use it:

# Agent runs --help, sees the hint:
ai-vision --help
# → "AI Agent? Run: ai-vision --ai"

# Agent gets full JSON schema:
ai-vision --ai

# Or just a brief overview:
ai-vision --ai brief

# Or usage examples:
ai-vision --ai examples

# Or schema for a specific command:
ai-vision --ai analyze-image

The --ai flag outputs structured JSON with:

  • Command names, descriptions, and when to use each one
  • Full parameter schemas (types, required, defaults)
  • Concrete examples
  • No API key required
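An agent consuming that output might look like the sketch below. The sample payload is a hand-written guess at the schema's shape (the command names and the `when_to_use`/`parameters`/`examples` fields come from this README); the real `--ai` output may differ, so treat the keys as assumptions.

```python
# Sketch of agent-side consumption of a self-describing CLI's --ai output.
import json

# Hypothetical --ai payload, shaped after the fields listed in the README.
sample_ai_output = json.dumps({
    "commands": [
        {
            "name": "analyze-image",
            "when_to_use": "Describe or answer questions about a single image",
            "parameters": {"prompt": {"type": "string", "required": True}},
            "examples": ["ai-vision analyze-image ./shot.png --prompt '...'"],
        },
        {
            "name": "detect-objects",
            "when_to_use": "Locate UI elements or objects with bounding boxes",
            "parameters": {"prompt": {"type": "string", "required": True}},
            "examples": ["ai-vision detect-objects ./page.png --prompt '...'"],
        },
    ]
})

schema = json.loads(sample_ai_output)
# Pick the command whose when_to_use matches the task at hand.
task = "find the submit button on this screenshot"
chosen = next(c for c in schema["commands"] if "Locate" in c["when_to_use"])
print(chosen["name"])  # → detect-objects
```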

The --ai pattern

This CLI implements a pattern we call AI-native CLIs: tools that describe themselves in a machine-readable format via a simple --ai flag. Any CLI can adopt this pattern to become instantly usable by AI agents without external documentation.

The discovery flow for an AI agent:

  1. tool --help → sees "AI Agent? Run with --ai"
  2. tool --ai brief → quick overview (few tokens)
  3. tool --ai → full JSON schema with when_to_use, parameters, examples
  4. Agent now knows exactly when and how to use the tool
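The tool-side half of the pattern is also small. Here is a minimal sketch of adopting `--ai` in your own CLI; everything in it (the schema contents, the `brief` mode wiring, the tool name) is illustrative and not how ai-vision itself is implemented.

```python
# Minimal sketch of an AI-native CLI: --help hints at --ai, and --ai emits
# a machine-readable self-description instead of running a command.
import argparse
import json

SCHEMA = {
    "name": "my-tool",  # hypothetical tool
    "commands": [
        {
            "name": "greet",
            "when_to_use": "Print a greeting for a given name",
            "parameters": {"name": {"type": "string", "required": True}},
            "examples": ["my-tool greet --name Ada"],
        }
    ],
}

def ai_describe(mode=None):
    """Return the self-description: brief one-liners, or the full JSON schema."""
    if mode == "brief":
        return "\n".join(f"{c['name']}: {c['when_to_use']}"
                         for c in SCHEMA["commands"])
    return json.dumps(SCHEMA, indent=2)

def main(argv):
    parser = argparse.ArgumentParser(
        epilog="AI Agent? Run with --ai"  # the discovery hint shown in --help
    )
    # `--ai` alone means the full schema; `--ai brief` selects the overview.
    parser.add_argument("--ai", nargs="?", const="full")
    args = parser.parse_args(argv)
    if args.ai:
        print(ai_describe(None if args.ai == "full" else args.ai))
        return
    # ... normal command handling would go here ...

main(["--ai", "brief"])
```

Because the schema is plain JSON on stdout, any agent that can run shell commands gets the same discovery flow with no protocol layer.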

How it works

Under the hood, ai-vision calls Google Gemini models via the @google/genai SDK:

  • Image analysis: gemini-2.5-flash-lite (fast, cheap)
  • Video analysis: gemini-2.5-flash (handles temporal understanding)
  • Object detection: Gemini with structured JSON output + imagescript for annotation rendering
  • Thinking disabled: thinkingBudget: 0 for flash models to avoid wasting tokens

Supports local files, URLs, base64 data URLs, and YouTube URLs (video).

License

MIT