
m2md

Turn images into structured, searchable markdown with AI vision.

I have 20 years of visual references (design inspiration, screenshots, diagrams, mood boards) scattered across folders. I use Obsidian for note-taking and wanted my vault to work for media too, not just text. The problem: images are invisible to search, to AI context windows, to grep. You can't find "that minimalist Japanese packaging with the kraft paper texture" in a folder of PNGs.

m2md fixes that. One command turns any image into a structured .md sidecar with AI-generated descriptions, extracted text, typed metadata, and tags. Ready for full-text search, vector search, and LLM retrieval.

npm install -g media2md

Quick start

# Describe a screenshot → writes screenshot.md next to it
m2md screenshot.png

# Batch an entire directory
m2md ./assets/

# Print to stdout (pipe to clipboard, another tool, etc.)
m2md screenshot.png --stdout | pbcopy

# Describe an image from a URL
m2md https://example.com/photo.png

Features

  • AI-powered image descriptions via Claude or OpenAI vision
  • Text extraction (OCR) from screenshots, documents, diagrams
  • YAML frontmatter with 25+ structured fields (type, style, mood, era, typography, palette, references, etc.)
  • Sidecar .md files next to images — makes directories greppable
  • Provider tiers — --tier fast for cheap/quick, --tier quality for best results
  • URL support — pass image URLs directly, or screenshot web pages via Playwright
  • Watch mode — auto-process new/changed images in a directory
  • Custom instructions (--prompt) and focus directives (--note)
  • 4 built-in templates (default, minimal, alt-text, detailed) plus custom templates
  • Content-hash caching — skip unchanged files automatically
  • Cost estimation before processing (--estimate, --dry-run)
  • Batch processing with concurrency control
  • MCP server for AI agent integration (Claude Desktop, etc.)
  • Programmatic API for use as a library

How the analysis works

m2md doesn't just ask an AI to "describe this image." It sends a detailed structured prompt that produces consistent, searchable metadata across every image you process. Here's what that means in practice.

Structured schema (25+ fields)

Every image produces the same set of fields, making your entire collection queryable:

| Field group | Fields | What they capture |
|-------------|--------|-------------------|
| Classification | type, category, medium | Visual form of the image (photo, illustration, painting, sketch, diagram, screenshot, render-3d, etc.) and what discipline it belongs to (ui-design, packaging, photography). Type describes how the image looks, not the subject — a photo of a logo is type "photo", not "logo". |
| Aesthetics | style, mood, palette, composition | Visual treatment (minimalist, brutalist, mid-century), emotional register (calm, dramatic, serene), material-driven color names (kraft-brown, slate-blue, bone-white), and layout structure (rule-of-thirds, grid, layered) |
| Archival context | era, artifact, typography, script, cultural_influence | Time period evoked (1970s, contemporary), designed object depicted (poster, packaging-box, website), typeface details (futura, sans-serif, letterpress), writing systems (latin, kanji, hangul), and aesthetic lineage (scandinavian-functionalism, japanese-wabi-sabi) |
| Discovery | tags, references, search_phrases, dimensions | Searchable keywords, design movement references (Bauhaus, Dieter Rams), natural language search phrases, and analytical axes explaining why the image is reference-worthy |
| Content | subject, description, extracted_text, visual_elements, use_case, color_hex | One-line summary, 4-sentence structured description, OCR text, literal visible objects, designer use cases, and sampled hex colors |
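
To make the schema concrete, here is a sketch of what the resulting YAML frontmatter might look like for a packaging photo. The field values below are invented for illustration, not actual m2md output:

---
type: photo
category: packaging
medium: product-photography
style: minimalist, japanese
mood: calm, restrained
palette: kraft-brown, bone-white, ink-black
era: contemporary
artifact: packaging-box
script: kanji, latin
cultural_influence: japanese-wabi-sabi
tags: kraft-paper, letterpress, uncoated-stock, embossing, tactile-finish, natural-fiber
subject: minimalist kraft-paper tea box with letterpress kanji label
---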

Quality constraints

The prompt enforces specific rules so output stays useful at scale:

  • Tags are capped at 6-8 and drawn from a seed vocabulary of 81 canonical terms across 6 groups (materials, techniques, finishes, effects, photography, production). Formation rules enforce consistency: always hyphenated, always singular, specific over generic, material-first compounds. The model can invent new tags following the same rules.
  • Style and mood cannot share terms — style is visual treatment, mood is emotional register. No "bold" in both.
  • Palette uses evocative names only — never generic "white" or "black." Always material-driven: bone-white, chalk-white, ink-black, obsidian-black.
  • Search phrases are capped at 8-10 and must be meaningfully distinct (different angles: literal, conceptual, stylistic, use-case).
  • References push for 3-5 specific entries — art-historical context, named designers, cultural movements. "none" only for purely abstract content.
  • Dimensions must be non-overlapping — each axis illuminates a genuinely distinct analytical lens.
  • Subject lines attribute colors correctly to objects — "cognac leather sofa, mustard side table" not the reverse. When the 80-character limit forces compression, the model drops the color rather than misattributing it.

Controlled vocabulary

The schema uses a three-tier vocabulary system:

Tier 1 — Strict enums (type, category): closed vocabulary, auto-corrected in code. Unknown values are replaced with the nearest match or "other" with a warning. Categories cover 29 creative disciplines from branding and ui-design to ceramics, textile-design, and street-art.

Tier 2 — Normalized tags with seed vocabulary (tags): 81 canonical terms the model prefers, plus formation rules that prevent fragmentation. The model can extend the vocabulary following the same rules (hyphenated, singular, material-first). Suggested fields (style, mood, medium, composition) work similarly — 100+ suggested terms, extensible when needed.

Tier 3 — Freeform (dimensions, search_phrases, description): unconstrained fields that capture what controlled vocabularies can't.

You can extend any vocabulary per-project via config:

{
  "taxonomy": {
    "styles": ["sneaker-culture", "streetwear"],
    "categories": ["sneaker-design"],
    "tags": ["sneaker-silhouette", "midsole-tooling"]
  }
}

Usage

Single file

m2md screenshot.png                     # writes screenshot.md next to it
m2md screenshot.png -o ./docs/          # writes to docs/screenshot.md
m2md screenshot.png --stdout            # print to stdout

Batch

m2md ./assets/                          # .md sidecar next to each image
m2md ./assets/ -r                       # recursive
m2md ./assets/ -r -o ./docs/            # recursive, custom output dir
m2md ./assets/*.png                     # glob

Tiers

Quick presets instead of picking provider + model:

m2md screenshot.png --tier fast         # gpt-4o-mini — quick + cheap
m2md screenshot.png --tier quality      # claude-sonnet — best results (default)

| Tier | Provider | Model | Best for |
|------|----------|-------|----------|
| fast | OpenAI | gpt-4o-mini | Quick passes, large batches, drafts |
| quality | Anthropic | claude-sonnet | Final output, detailed descriptions |

Explicit --provider / --model flags always override --tier.

Providers

By default m2md uses Anthropic's Claude. Switch to OpenAI with --provider:

m2md screenshot.png                             # Anthropic Claude (default)
m2md screenshot.png --provider openai           # OpenAI GPT-4o
m2md screenshot.png --provider openai -m gpt-4o-mini  # specific model

URLs

Pass image URLs directly — m2md downloads and processes them:

m2md https://example.com/screenshot.png                # image URL → download + describe
m2md https://example.com/landing-page                  # non-image URL → screenshot via Playwright
m2md screenshot.png https://example.com/photo.jpg      # mix local files + URLs

For non-image URLs, m2md takes a full-page screenshot using Playwright (optional dependency):

npm install playwright
npx playwright install chromium   # downloads the Chromium browser binary

Watch mode

Auto-process new and changed images in a directory:

m2md watch ./assets/                    # watch for new/changed images
m2md watch ./assets/ --tier fast        # watch with fast tier
m2md watch ./assets/ -o ./docs/         # custom output directory
m2md watch ./assets/ -p "List all product names"  # watch with custom instructions

On startup, existing images without a .md sidecar are processed. Then m2md watches for new/changed files and processes them automatically. Press Ctrl+C to stop.

Compare mode

Compare two or more images in a single API call:

m2md compare before.png after.png                # compare two images
m2md compare v1.png v2.png v3.png                 # compare multiple versions
m2md compare a.png b.png -n "focus on typography" # with focus directive
m2md compare a.png b.png -o comparison.md         # write to file

Outputs structured markdown with Summary, Similarities, Differences, and Verdict sections. Images are labeled A, B, C, etc.
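
The output shape looks roughly like this (contents abbreviated; the sample difference line is invented):

Summary
...

Similarities
...

Differences
- A uses a letterpress serif headline; B switches to a geometric sans

Verdict
...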

Custom instructions (--prompt)

Tell the model what to do differently. Instructions are appended to the built-in system prompt, so you get the structured output format plus your custom behavior:

m2md screenshot.png -p "List all visible product names and prices"
m2md ./assets/ -p "Identify every UI component and its state"
m2md photo.jpg -p "Describe from a security auditor's perspective"

Focus directives (--note)

A lightweight nudge that's additive — it tells the model to pay extra attention to something without changing the analysis instructions. Combine with --prompt or use on its own:

m2md refs/*.png -n "watercolor technique, color palette, brushwork"
m2md hero.png -n "dark mode, spacing tokens"
m2md ./illos/ -p "art critique" -n "line weight and hatching"

Templates

m2md screenshot.png                        # default (frontmatter + description + text)
m2md screenshot.png --template minimal     # description + source link
m2md screenshot.png --template alt-text    # just a description string
m2md screenshot.png --template detailed    # full metadata table + image embed
m2md screenshot.png --template ./my.md     # custom template file
m2md screenshot.png --no-frontmatter       # strip YAML frontmatter from output

Template variables available in custom templates:

| Variable | Description |
|----------|-------------|
| {{type}} | Visual form (photo, illustration, painting, sketch, diagram, chart, screenshot, render-3d, etc.) |
| {{category}} | Content category (ui-design, photography, packaging-design, etc.) |
| {{style}} | Visual style (minimalist, brutalist, mid-century, bauhaus, wabi-sabi, etc.) |
| {{mood}} | Mood/tone (calm, energetic, serene, dramatic, etc.) |
| {{medium}} | Medium (screen-capture, product-photography, letterpress, risograph, etc.) |
| {{composition}} | Composition (centered, grid, rule-of-thirds, golden-ratio, etc.) |
| {{palette}} | Material-driven color names (kraft-brown, slate-blue, bone-white, etc.) |
| {{subject}} | One-line summary (max 80 chars) |
| {{description}} | 4-sentence structured description |
| {{extractedText}} | All visible text, grouped by context |
| {{colors}} | Dominant colors (same as palette) |
| {{tags}} | 6-8 searchable keywords (materials, techniques, proper nouns) |
| {{visualElements}} | Literal visible objects (5-15 items) |
| {{references}} | Design movements, named styles, artist/designer references |
| {{useCase}} | Designer reference use cases |
| {{colorHex}} / {{colorHexYaml}} | 3-5 hex color values sampled from the image |
| {{era}} | Time period the design evokes (mid-century, 1970s, contemporary, etc.) |
| {{artifact}} | Designed object type (poster, packaging-box, website, album-cover, etc.) |
| {{typography}} | Typeface names, classifications, techniques |
| {{script}} | Writing systems / languages visible (latin, kanji, hangul, etc.) |
| {{culturalInfluence}} | Aesthetic lineages (japanese-wabi-sabi, scandinavian-functionalism, etc.) |
| {{searchPhrases}} / {{searchPhrasesYaml}} | 8-10 natural language search phrases |
| {{dimensions}} / {{dimensionsYaml}} | 2-5 reference-worthiness axes |
| {{filename}} | Original filename |
| {{basename}} | Filename without extension |
| {{format}} | File format (PNG, JPEG, WebP, GIF) |
| {{dimensionsPx}} | Width x Height |
| {{width}} / {{height}} | Image dimensions |
| {{sizeHuman}} / {{sizeBytes}} | File size |
| {{sha256}} | Content hash |
| {{processedDate}} / {{datetime}} | Processing timestamp |
| {{model}} | AI model used |
| {{note}} | Focus directive |
| {{sourcePath}} | Relative path to source file |
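
A custom template is an ordinary markdown file containing {{variable}} placeholders. A minimal sketch of a my.md using a few of the fields above:

# {{basename}}

{{description}}

Tags: {{tags}}
Palette: {{palette}}

Extracted text:

{{extractedText}}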

Caching

Results are cached by content hash. Re-running on a directory only processes new or changed files.

m2md ./assets/                # second run skips unchanged files
m2md ./assets/ --no-cache     # force re-processing
m2md cache status             # show cache stats
m2md cache clear              # clear all cached results

Cache location (in order of precedence):

  1. M2MD_CACHE_DIR environment variable
  2. $XDG_CACHE_HOME/m2md (if XDG_CACHE_HOME is set)
  3. ~/.cache/m2md (default)
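
For example, to keep the cache inside the project for a single run via the documented environment variable (the path itself is arbitrary):

M2MD_CACHE_DIR=./.m2md-cache m2md ./assets/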

Cost estimation

Preview what a batch will cost before calling the API:

m2md ./assets/ --estimate     # show token/cost estimate
m2md ./assets/ --dry-run      # list files with cache status + estimates

Both work without an API key so you can preview before committing.

Custom filenames

Control the output .md filename with --name:

m2md screenshot.png --name "{date}-{filename}"        # 2026-02-18-screenshot.md
m2md ./assets/ --name "{type}-{filename}"             # screenshot-hero.md, photo-team.md
m2md shot.png --name "{date}-{subject}"               # 2026-02-18-dashboard-with-charts.md

| Placeholder | Description |
|-------------|-------------|
| {filename} | Original filename without extension |
| {date} | Processing date (YYYY-MM-DD) |
| {type} | AI-detected image type (screenshot, photo, diagram, etc.) |
| {subject} | AI-generated subject line, slugified |

Other flags

m2md screenshot.png -m claude-sonnet-4-5-20250929    # specific model
m2md ./assets/ --concurrency 10                      # parallel API calls (default: 5)
m2md screenshot.png --no-frontmatter                 # strip YAML frontmatter
m2md screenshot.png -v                               # verbose output (tokens, cost, timing)

Configuration

Drop a config file in any directory to set defaults for that project. This way you don't have to pass the same flags every time — just run m2md ./assets/ and it picks up your settings.

Create an m2md.config.json (or .m2mdrc, .m2mdrc.json, .m2mdrc.yaml) in your project root:

{
  "tier": "quality",
  "note": "focus on typography, color palette, layout grid",
  "output": "./docs",
  "recursive": true
}

All options are optional — only set what you want to override:

| Key | What it does | Default |
|-----|-------------|---------|
| provider | AI provider (anthropic, openai) | anthropic |
| model | AI model to use | provider default |
| tier | Preset tier (fast, quality) | none |
| prompt | Custom instructions for the model | none |
| note | Focus directive (additive nudge) | none |
| template | Output template | default |
| output | Output directory for .md files | next to image |
| name | Output filename pattern ({filename}, {date}, {type}, {subject}) | none |
| noFrontmatter | Strip YAML frontmatter from output | false |
| recursive | Scan directories recursively | false |
| cache | Cache results by content hash | true |
| concurrency | Max parallel API calls | 5 |

Precedence: CLI flags > --tier > config file > defaults.

You can also put config under an "m2md" key in package.json.
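
For example (the name field is just placeholder package.json scaffolding):

{
  "name": "my-project",
  "m2md": {
    "tier": "fast",
    "recursive": true
  }
}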

Setup

# 1. Install
npm install -g media2md

# 2. Set your API key (one or both)
export ANTHROPIC_API_KEY="sk-ant-..."   # for Anthropic / --tier quality (default)
export OPENAI_API_KEY="sk-..."          # for OpenAI / --tier fast

# 3. Verify
m2md setup

MCP server

m2md includes an MCP server (m2md-mcp) that exposes a describe_image tool over stdio. This lets AI agents — like Claude Desktop or any MCP client — analyze images directly.

The server auto-detects API keys from your shell profile (~/.zshrc, ~/.bashrc, ~/.zprofile, ~/.bash_profile, ~/.profile), so you typically don't need to pass them explicitly. This is especially useful for GUI apps like Claude Desktop that don't inherit shell environment variables.

Claude Desktop / Claude Code

Add to your MCP config (claude_desktop_config.json or .claude/settings.json):

{
  "mcpServers": {
    "m2md": {
      "command": "m2md-mcp"
    }
  }
}

If API key auto-detection doesn't work (e.g. keys are in a secrets manager), pass them explicitly:

{
  "mcpServers": {
    "m2md": {
      "command": "m2md-mcp",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Tool: describe_image

| Parameter | Required | Description |
|-----------|----------|-------------|
| filePath | Yes | Absolute path to the image file |
| provider | No | anthropic or openai |
| model | No | AI model to use |
| prompt | No | Custom instructions for the model |
| note | No | Focus directive |
| template | No | default, minimal, alt-text, detailed |

Returns the rendered markdown as text content. Shares the same cache as the CLI.
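
Under the hood this is a standard MCP tools/call request over stdio. A client invocation looks roughly like the following sketch (the id and file path are illustrative):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "describe_image",
    "arguments": {
      "filePath": "/absolute/path/to/screenshot.png",
      "template": "minimal"
    }
  }
}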

Supported formats

PNG, JPEG, WebP, GIF

Size limits per image:

| Provider | Max size |
|----------|----------|
| Anthropic | 5 MB |
| OpenAI | 20 MB |

If both API keys are set, m2md automatically routes oversized files to the other provider. Otherwise, use --provider openai for larger files or resize the image first.

Programmatic API

import { processFile, AnthropicProvider } from "media2md";

const result = await processFile("screenshot.png", {
  provider: new AnthropicProvider(),
});

result.markdown;           // rendered markdown string
result.description;        // 4-sentence structured description
result.extractedText;      // extracted text
result.type;               // "screenshot", "photo", "diagram", etc.
result.category;           // "ui-design", "photography", etc.
result.style;              // "minimalist, brutalist"
result.mood;               // "calm, warm"
result.tags;               // comma-separated keywords
result.palette;            // material-driven color names
result.era;                // "mid-century, contemporary"
result.artifact;           // "poster", "website", etc.
result.typography;         // "futura, sans-serif"
result.script;             // "latin, english"
result.culturalInfluence;  // "scandinavian-functionalism"
result.references;         // "Bauhaus, Dieter Rams"
result.searchPhrases;      // newline-separated search phrases
result.metadata;           // { width, height, format, sizeHuman, sha256, ... }
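
Since result.markdown is a plain string, persisting a sidecar yourself is a single write. A minimal sketch (filenames illustrative):

import { writeFile } from "node:fs/promises";
import { processFile, AnthropicProvider } from "media2md";

const result = await processFile("screenshot.png", {
  provider: new AnthropicProvider(),
});

// Write the rendered markdown next to the image, as the CLI would
await writeFile("screenshot.md", result.markdown);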

License

MIT