© 2026 – Pkg Stats / Ryan Hefner

@msalman5230/image-understand-mcp

v1.0.4


Local MCP server that lets text-only agents understand local images through Gemini vision models.


Image Understand MCP Server

Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini or Gemma models.

The server runs over stdio and exposes an image analysis tool, analyze_image, that works on local image paths.

Tool

analyze_image

Use this for specific image analysis, OCR, object detection, accessibility descriptions, charts, screenshots, receipts, diagrams, and general questions about local image files.

Inputs:

  • image_path (string, required): Local filesystem path only. Relative paths are resolved against the MCP server's working directory.
  • question (string, optional): A specific question about the image.
  • mode (string, optional): One of general, ocr, objects, or accessibility. Default: general.
  • detail (string, optional): One of brief, normal, or detailed. Default: normal.
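As a rough illustration, the inputs above could be declared as an MCP tool input schema (JSON Schema) and normalized with the documented defaults. This is a TypeScript sketch, not the package's source: analyzeImageTool, normalizeArgs, and AnalyzeImageArgs are hypothetical names, and the error messages are made up.

```typescript
// Hypothetical sketch of the analyze_image inputs; not the package's actual code.
type Mode = "general" | "ocr" | "objects" | "accessibility";
type Detail = "brief" | "normal" | "detailed";

const MODES: readonly Mode[] = ["general", "ocr", "objects", "accessibility"];
const DETAILS: readonly Detail[] = ["brief", "normal", "detailed"];

// Tool definition in the shape MCP uses for entries in a tools/list response.
const analyzeImageTool = {
  name: "analyze_image",
  inputSchema: {
    type: "object",
    properties: {
      image_path: { type: "string", description: "Local filesystem path only" },
      question: { type: "string" },
      mode: { type: "string", enum: MODES, default: "general" },
      detail: { type: "string", enum: DETAILS, default: "normal" },
    },
    required: ["image_path"],
  },
};

interface AnalyzeImageArgs {
  image_path: string;
  question?: string;
  mode: Mode;
  detail: Detail;
}

// Validate raw tool-call arguments and fill in the documented defaults.
function normalizeArgs(raw: Record<string, unknown>): AnalyzeImageArgs {
  const imagePath = raw["image_path"];
  if (typeof imagePath !== "string" || imagePath.trim() === "") {
    throw new Error("image_path is required and must be a non-empty string");
  }
  const mode = raw["mode"] ?? "general";
  if (typeof mode !== "string" || MODES.indexOf(mode as Mode) === -1) {
    throw new Error(`mode must be one of: ${MODES.join(", ")}`);
  }
  const detail = raw["detail"] ?? "normal";
  if (typeof detail !== "string" || DETAILS.indexOf(detail as Detail) === -1) {
    throw new Error(`detail must be one of: ${DETAILS.join(", ")}`);
  }
  const q = raw["question"];
  if (q !== undefined && typeof q !== "string") {
    throw new Error("question must be a string when provided");
  }
  const question = typeof q === "string" ? q : undefined;
  return { image_path: imagePath, question, mode: mode as Mode, detail: detail as Detail };
}

console.log(normalizeArgs({ image_path: "./receipt.jpg", mode: "ocr" }));
```

Rejecting unknown mode or detail values up front keeps a typo from silently falling back to general analysis.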

The tool returns human-readable text plus structured content:

{
  "backend": "gemini",
  "model": "gemini-3.5-flash",
  "image_path": "C:/path/to/image.png",
  "mime_type": "image/png",
  "size_bytes": 12345,
  "mode": "general",
  "detail": "normal",
  "prompt": "...",
  "analysis": "..."
}
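One way to type the structured content above and pair it with the human-readable text is sketched below. It assumes the result is returned as an MCP tool result with a text content block plus a structuredContent field; the header-line format and the names AnalysisResult and toToolResult are hypothetical.

```typescript
// Shape of the structured content documented above.
interface AnalysisResult {
  backend: "gemini";
  model: string;
  image_path: string;
  mime_type: string;
  size_bytes: number;
  mode: string;
  detail: string;
  prompt: string;
  analysis: string;
}

// An MCP tool result: human-readable content blocks plus machine-readable data.
interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent: AnalysisResult;
}

function toToolResult(result: AnalysisResult): ToolResult {
  // Hypothetical text layout: a one-line summary header followed by the analysis.
  const header = `${result.model} | ${result.mime_type} | ${result.size_bytes} bytes`;
  return {
    content: [{ type: "text", text: `${header}\n\n${result.analysis}` }],
    structuredContent: result,
  };
}

const example = toToolResult({
  backend: "gemini",
  model: "gemini-3.5-flash",
  image_path: "C:/path/to/image.png",
  mime_type: "image/png",
  size_bytes: 12345,
  mode: "general",
  detail: "normal",
  prompt: "...",
  analysis: "...",
});
console.log(JSON.stringify(example, null, 2));
```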

Codex Config

Add this to ~/.codex/config.toml to run the published package through npx:

[mcp_servers.image_understand]
command = "npx"
args = ["-y", "@msalman5230/image-understand-mcp"]
env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }

You can also keep the API key outside the config and let Codex inherit the environment:

[mcp_servers.image_understand]
command = "npx"
args = ["-y", "@msalman5230/image-understand-mcp"]
env = { GEMINI_MODEL = "gemini-3.5-flash" }

For local development, point Codex at the built file directly (adjust the path to your own checkout):

[mcp_servers.image_understand]
command = "node"
args = ["C:/MegaSync/Projects/Git/image-understand-mcp/dist/index.js"]
env = { GEMINI_MODEL = "gemini-3.5-flash" }

OpenCode Config

Add this to opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "image_understand": {
      "type": "local",
      "command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
      "enabled": true,
      "environment": {
        "GEMINI_API_KEY": "{env:GEMINI_API_KEY}",
        "GEMINI_MODEL": "gemini-3.5-flash"
      }
    }
  }
}

Example Prompts

  • What is this image? C:/Users/me/Desktop/screenshot.png
  • Use analyze_image on ./diagram.png with mode objects and detail detailed
  • Extract all visible text from ./receipt.jpg using OCR mode
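When the agent acts on a prompt like these, its MCP client sends a JSON-RPC tools/call request to the server. The sketch below shows one plausible translation of two of the example prompts into analyze_image arguments; the actual arguments depend on how the model interprets the prompt, and buildToolsCall is a hypothetical helper.

```typescript
// A JSON-RPC 2.0 request in the shape an MCP client sends for tools/call.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, string> };
}

function buildToolsCall(id: number, args: Record<string, string>): ToolsCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: "analyze_image", arguments: args } };
}

// "Extract all visible text from ./receipt.jpg using OCR mode"
const ocrCall = buildToolsCall(1, { image_path: "./receipt.jpg", mode: "ocr" });

// "Use analyze_image on ./diagram.png with mode objects and detail detailed"
const objectsCall = buildToolsCall(2, { image_path: "./diagram.png", mode: "objects", detail: "detailed" });

console.log(JSON.stringify(ocrCall));
console.log(JSON.stringify(objectsCall));
```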

In OpenCode, MCP tools are listed alongside built-in tools, often prefixed with the MCP server name. With the sample config above, the tool may appear as image_understand_analyze_image. If a model claims it has no MCP tools while still listing that tool, that is a model/tool-routing issue; the tool itself is available.

Development

Requirements

  • Node.js 18 or newer
  • A Gemini API key in GEMINI_API_KEY
  • Local image files (.png, .jpg, .jpeg, .webp, .gif, .bmp, .heic, .heif)
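A server that accepts only these extensions has to map each file to a MIME type before sending it to Gemini (the result reports it as mime_type). The following sketch assumes a lookup by lowercased extension; the table and mimeTypeForPath are illustrative, not taken from the package.

```typescript
// Illustrative extension-to-MIME table for the supported image types.
const IMAGE_MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
  ".bmp": "image/bmp",
  ".heic": "image/heic",
  ".heif": "image/heif",
};

// Return the MIME type for a path, or null if the extension is unsupported.
function mimeTypeForPath(filePath: string): string | null {
  // Only treat a dot as an extension separator if it is in the final path segment
  // (both "/" and "\" count as separators, so Windows paths work too).
  const lastSep = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const dot = filePath.lastIndexOf(".");
  if (dot <= lastSep + 1) return null; // no extension, or a bare dotfile such as ".png"
  const ext = filePath.slice(dot).toLowerCase();
  return IMAGE_MIME_TYPES[ext] ?? null;
}

console.log(mimeTypeForPath("C:/Users/me/Desktop/screenshot.png"));
console.log(mimeTypeForPath("./receipt.JPG"));
```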

Environment

  • GEMINI_API_KEY: required Google Gemini API key
  • GEMINI_MODEL: optional model ID, defaults to gemini-3.5-flash
  • IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES: optional inline image size limit in bytes, defaults to 18 MiB
  • IMAGE_UNDERSTAND_MAX_IMAGE_BYTES: optional maximum image size in bytes, defaults to 100 MiB
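The variables above can be read into a typed config with the documented defaults (18 MiB is 18,874,368 bytes; 100 MiB is 104,857,600 bytes). This sketch takes the environment as a plain object, so in Node you would pass process.env. The README does not say how images between the inline limit and the maximum are handled, so the "non-inline" branch of chooseDelivery is an assumption, as are all the function names.

```typescript
const MIB = 1024 * 1024;

interface ServerConfig {
  apiKey: string;
  model: string;
  inlineLimitBytes: number;
  maxImageBytes: number;
}

// Parse an optional byte count, falling back to the default when unset or empty.
function parseByteLimit(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  if (!isFinite(n) || Math.floor(n) !== n || n <= 0) {
    throw new Error(`${name} must be a positive integer byte count`);
  }
  return n;
}

// Read only the given environment map; nothing is loaded from dotenv files.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) throw new Error("GEMINI_API_KEY is required");
  return {
    apiKey,
    model: env["GEMINI_MODEL"] || "gemini-3.5-flash",
    inlineLimitBytes: parseByteLimit(env["IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES"], 18 * MIB, "IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES"),
    maxImageBytes: parseByteLimit(env["IMAGE_UNDERSTAND_MAX_IMAGE_BYTES"], 100 * MIB, "IMAGE_UNDERSTAND_MAX_IMAGE_BYTES"),
  };
}

// Hypothetical: the README names both limits but not what happens between them,
// so "non-inline" stands in for whatever other path the server uses.
type Delivery = "inline" | "non-inline" | "reject";

function chooseDelivery(sizeBytes: number, config: ServerConfig): Delivery {
  if (sizeBytes > config.maxImageBytes) return "reject";
  return sizeBytes <= config.inlineLimitBytes ? "inline" : "non-inline";
}

const config = loadConfig({ GEMINI_API_KEY: "test-key" });
console.log(config.model, config.inlineLimitBytes, config.maxImageBytes);
console.log(chooseDelivery(12345, config));
```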

The MCP server reads only the environment of the process that launches it. It does not load .env, .env.local, or any other dotenv file. For Codex/OpenCode usage, pass GEMINI_API_KEY and GEMINI_MODEL through that client config or through the parent shell environment.

Gemma support in v1 is configuration-based: set GEMINI_MODEL to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.

Install

npm install
npm run build
npm test
npm run check

For a simple local Gemini smoke test without Codex/OpenCode, put development values in .env.local, build, and run:

npm run build
npm run smoke -- "C:/path/to/image.jpg" "What is this image?"

The smoke script loads .env.local for development convenience. The MCP server itself does not load dotenv files.
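The smoke script's .env.local loading is not shown in this README, and it may simply use the dotenv package. As a hypothetical sketch, a minimal KEY=VALUE parser that fills in only variables not already set in the real environment (the dotenv package's default behavior) could look like this; parseDotenv and applyDefaults are illustrative names.

```typescript
// Minimal KEY=VALUE parser for a development-only .env.local file.
// Handles blank lines, full-line "#" comments, an optional "export " prefix,
// and values wrapped in matching single or double quotes.
function parseDotenv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    if (line.indexOf("export ") === 0) line = line.slice(7).trim();
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Copy values into env only where the real environment has not set them.
function applyDefaults(env: Record<string, string | undefined>, values: Record<string, string>): void {
  for (const key of Object.keys(values)) {
    if (env[key] === undefined) env[key] = values[key];
  }
}

const sample = 'GEMINI_API_KEY="dev-key"\n# local only\nGEMINI_MODEL=gemini-3.5-flash\n';
const env: Record<string, string | undefined> = { GEMINI_MODEL: "already-set" };
applyDefaults(env, parseDotenv(sample));
console.log(env);
```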

For stdio MCP servers, stdout is reserved for JSON-RPC messages. This server writes diagnostics to stderr only.
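Under the MCP stdio transport, each JSON-RPC message is written to stdout as a single newline-terminated line, and logging belongs on stderr. The sketch below illustrates that split; encodeMessage, decodeLines, and logDiagnostic are hypothetical helpers, not this server's code.

```typescript
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

// JSON.stringify without an indent argument never emits a raw newline
// (newlines inside strings are escaped as \n), so one message is one line.
function encodeMessage(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}

// Split buffered stdout text into complete messages, keeping any partial tail.
function decodeLines(buffer: string): { messages: JsonRpcMessage[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? "";
  const messages = parts
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as JsonRpcMessage);
  return { messages, rest };
}

// In Node, console.error writes to stderr, keeping stdout clean for JSON-RPC.
function logDiagnostic(text: string): void {
  console.error(`[image-understand] ${text}`);
}

const wire = encodeMessage({ jsonrpc: "2.0", id: 1, result: { text: "line one\nline two" } });
const { messages, rest } = decodeLines(wire + '{"jsonrpc":"2.0"');
logDiagnostic(`decoded ${messages.length} message(s), partial tail: ${rest}`);
```

A single stray console.log in a server like this would inject non-JSON text into the protocol stream, which is why diagnostics go through console.error instead.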