@sanity-labs/mcp-see

v0.1.0

MCP server for AI-powered image analysis - describe, detect objects, analyze colors

mcp-see

An MCP server that gives AI agents eyes - the ability to observe and understand images without stuffing raw pixels into their context window.

Features

  • Multi-provider vision: Describe images using Gemini, OpenAI, or Claude
  • Object detection: Find objects with bounding boxes (Gemini only - native bbox support)
  • Hierarchical analysis: Detect regions, then zoom in for detail
  • Precise color extraction: K-Means clustering in LAB color space (runs locally, no API needed)
  • Color naming: Human-readable color names via color.pizza API
  • URL support: Analyze images directly from the web (http/https)

TL;DR: A Gemini API key gives you full functionality. OpenAI/Claude are optional alternatives for image description only.

Installation

Run directly from npm with npx (no separate install step):

npx @sanity-labs/mcp-see

Or install globally:

npm install -g @sanity-labs/mcp-see

Or clone and build locally:

git clone https://github.com/sanity-labs/mcp-see.git
cd mcp-see
npm install
npm run build

MCP Client Configuration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "mcp-see": {
      "command": "npx",
      "args": ["@sanity-labs/mcp-see"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

Get your Gemini API key from Google AI Studio.

With all providers (optional):

{
  "mcpServers": {
    "mcp-see": {
      "command": "npx",
      "args": ["@sanity-labs/mcp-see"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key",
        "OPENAI_API_KEY": "sk-...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Enterprise/Vertex AI users:

{
  "mcpServers": {
    "mcp-see": {
      "command": "npx",
      "args": ["@sanity-labs/mcp-see"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id"
      }
    }
  }
}

Vertex AI authenticates through Application Default Credentials (ADC). Set them up with: gcloud auth application-default login

Other MCP Clients

The server runs on stdio transport. Configure your client to spawn npx @sanity-labs/mcp-see.
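Most MCP client libraries handle the wire format for you. For clients that talk to the spawned process directly, the sketch below shows what a tool call could look like on stdin. It assumes the standard MCP framing (JSON-RPC 2.0, one message per line) and that the client has already completed the MCP initialize handshake; the helper names toolCallRequest and frame are made up for this example.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build an MCP "tools/call" request for one of the tools listed below.
function toolCallRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// stdio framing: each message is a single line of JSON followed by a newline.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = frame(
  toolCallRequest(2, "detect", {
    image: "/path/to/image.png",
    prompt: "find all TV screens",
  }),
);
console.log(line);
```

The resulting string is what a client would write to the stdin of the npx @sanity-labs/mcp-see process; the response arrives on stdout as a JSON-RPC message with the same id.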

Tools

describe

Get an AI-generated description of an image.

Input:

{
  "image": "/path/to/image.png or https://example.com/image.jpg",
  "prompt": "What is shown in this image?",
  "provider": "gemini",
  "detail": "detailed"
}

Example Output:

The image shows a vibrant and colorful salad bowl, viewed from directly above.
The bowl is made of a light brown, possibly biodegradable material. The salad
is composed of various ingredients arranged in distinct sections: two small
white peeled eggs, sliced red tomatoes topped with chopped green onions, cubed
seasoned tofu, bright green edamame beans, shredded purple cabbage, and
julienned carrots...

detect

Detect objects and return bounding boxes. Uses Gemini for native bbox support.

Input:

{
  "image": "/path/to/image.png",
  "prompt": "find all TV screens"
}

Example Output:

{
  "count": 3,
  "objects": [
    { "id": 1, "label": "television", "bbox": [178, 245, 433, 818] },
    { "id": 2, "label": "television", "bbox": [614, 518, 792, 898] },
    { "id": 3, "label": "television", "bbox": [617, 198, 792, 493] }
  ]
}

Coordinates are [ymin, xmin, ymax, xmax] normalized 0-1000.

describe_region

Crop to a bounding box and describe that region in detail.

Input:

{
  "image": "/path/to/image.png",
  "bbox": [200, 200, 800, 800],
  "prompt": "describe this in detail",
  "provider": "gemini"
}

Example Output:

{
  "bbox": [200, 200, 800, 800],
  "description": "The image showcases a vibrant and colorful salad bowl in close-up. The bowl contains fresh ingredients including cubed tofu with a seasoned exterior, bright green edamame, sliced tomatoes, and shredded purple cabbage..."
}

analyze_colors

Extract dominant colors from a region using K-Means clustering in LAB color space.

Input:

{
  "image": "/path/to/image.png",
  "bbox": [100, 200, 400, 600],
  "top": 5
}

Example Output:

{
  "dominant": [
    {
      "hex": "#e6e6e5",
      "rgb": [230, 230, 229],
      "hsl": { "h": 60, "s": 2, "l": 90 },
      "name": "Ambience White",
      "percentage": 75.91
    },
    {
      "hex": "#b16c39",
      "rgb": [177, 108, 57],
      "hsl": { "h": 26, "s": 51, "l": 46 },
      "name": "Ginger Dough",
      "percentage": 15.91
    }
  ],
  "average": {
    "hex": "#c4b8a8",
    "rgb": [196, 184, 168],
    "name": "Doeskin"
  },
  "confidence": "high",
  "region": {
    "bbox": [100, 200, 400, 600],
    "size": [200, 150],
    "totalPixels": 30000
  }
}

The confidence field indicates color precision:

  • high: Flat colors (UI elements) - clusters are tight
  • medium: Mixed content
  • low: Photographs/gradients - colors are approximate

Workflows

Hierarchical Image Understanding

The power of mcp-see is in combining tools for progressive analysis:

1. describe(image)
   → "A shelf displaying various vintage electronics and TVs"

2. detect(image, "find all screens")
   → [{label: "television", bbox: [178, 245, 433, 818]}, ...]

3. describe_region(image, [178, 245, 433, 818])
   → "A vintage CRT television with wood grain casing, displaying
      a test pattern. The screen shows horizontal color bars..."

4. analyze_colors(image, [178, 245, 433, 818])
   → dominant: ["#2b1810" Espresso Bean, "#c4a882" Sandcastle, ...]
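The four steps above can be sketched as one function. This is an illustration only: callTool is a hypothetical hook standing in for however your MCP client invokes a tool, and it is assumed to return the already-parsed result in the shapes shown in the Tools section. The stub at the bottom replays values from this README so the sketch runs without a server.

```typescript
type BBox = [number, number, number, number]; // [ymin, xmin, ymax, xmax], 0-1000

// Hypothetical hook: however your MCP client calls a tool and parses the result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface ObjectReport {
  label: string;
  bbox: BBox;
  description: string;
  palette: string[];
}

async function inspectObjects(callTool: CallTool, image: string, query: string) {
  // 1. Whole-image overview.
  const overview = (await callTool("describe", { image })) as string;
  // 2. Locate the regions of interest.
  const found = (await callTool("detect", { image, prompt: query })) as {
    objects: { label: string; bbox: BBox }[];
  };
  const details: ObjectReport[] = [];
  for (const obj of found.objects) {
    // 3. Zoom in on the region, 4. then sample its colors.
    const region = (await callTool("describe_region", { image, bbox: obj.bbox })) as {
      description: string;
    };
    const colors = (await callTool("analyze_colors", { image, bbox: obj.bbox })) as {
      dominant: { hex: string }[];
    };
    details.push({
      label: obj.label,
      bbox: obj.bbox,
      description: region.description,
      palette: colors.dominant.map((c) => c.hex),
    });
  }
  return { overview, details };
}

// Stub that replays the README's example values instead of calling a server.
const stub: CallTool = async (name, args) => {
  switch (name) {
    case "describe":
      return "A shelf displaying various vintage electronics and TVs";
    case "detect":
      return { objects: [{ label: "television", bbox: [178, 245, 433, 818] }] };
    case "describe_region":
      return { bbox: args.bbox, description: "A vintage CRT television with wood grain casing" };
    case "analyze_colors":
      return { dominant: [{ hex: "#2b1810" }, { hex: "#c4a882" }] };
    default:
      throw new Error(`unknown tool: ${name}`);
  }
};

inspectObjects(stub, "/path/to/image.png", "find all screens").then((report) =>
  console.log(JSON.stringify(report, null, 2)),
);
```

The region calls run sequentially to keep the sketch simple; a real client could issue describe_region and analyze_colors for each object in parallel.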

Design Reference Analysis

Extract implementation-ready specs from design mockups:

1. describe(image, "explain this UI to a web developer")
   → Layout structure, component hierarchy, spacing patterns

2. detect(image, "find all buttons")
   → Bounding boxes for each button

3. For each button:
   - describe_region() → Button label, icon, state
   - analyze_colors() → Exact color tokens for CSS
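As a rough illustration of the last step, the sketch below turns an analyze_colors result into CSS custom properties. The field names follow the example output in the Tools section; the function names, the token naming scheme, and the policy of skipping anything below "high" confidence are assumptions made for this example.

```typescript
interface DominantColor {
  hex: string;
  name: string;
  percentage: number;
}

interface ColorAnalysis {
  dominant: DominantColor[];
  confidence: "high" | "medium" | "low";
}

// "Ginger Dough" -> "ginger-dough"
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Emit one CSS custom property per dominant color.
// Assumed policy: only trust flat-color regions (confidence "high").
function toCssTokens(prefix: string, analysis: ColorAnalysis): string[] {
  if (analysis.confidence !== "high") return [];
  return analysis.dominant.map((c) => `--${prefix}-${slugify(c.name)}: ${c.hex};`);
}

const tokens = toCssTokens("button", {
  confidence: "high",
  dominant: [
    { hex: "#e6e6e5", name: "Ambience White", percentage: 75.91 },
    { hex: "#b16c39", name: "Ginger Dough", percentage: 15.91 },
  ],
});
console.log(tokens.join("\n"));
// --button-ambience-white: #e6e6e5;
// --button-ginger-dough: #b16c39;
```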

API Keys

Quick Start: Gemini Only

For full functionality, you only need a Gemini API key:

| Variable | Description |
|----------|-------------|
| GEMINI_API_KEY | Get one from Google AI Studio |

This gives you access to all tools: describe, detect, describe_region, and analyze_colors.

Tool Availability by Provider

| Tool | Gemini | OpenAI | Claude | No API |
|------|--------|--------|--------|--------|
| describe | ✅ | ✅ | ✅ | |
| describe_region | ✅ | ✅ | ✅ | |
| detect | ✅ | ❌ | ❌ | |
| analyze_colors | | | | ✅ |

  • detect (object detection with bounding boxes) requires Gemini - it's the only provider with native bounding box support
  • analyze_colors runs locally using K-Means clustering - no API key needed

All Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| GEMINI_API_KEY | Recommended | API key from Google AI Studio. Enables all tools. |
| GOOGLE_CLOUD_PROJECT | Alternative | GCP project ID for Vertex AI instead of Gemini API. Requires ADC setup (gcloud auth application-default login). |
| OPENAI_API_KEY | Optional | OpenAI API key for GPT-4o vision. Alternative provider for describe and describe_region. |
| ANTHROPIC_API_KEY | Optional | Anthropic API key for Claude vision. Alternative provider for describe and describe_region. |

If both GEMINI_API_KEY and GOOGLE_CLOUD_PROJECT are set, GEMINI_API_KEY takes precedence.
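The rules above can be modeled as a small function, which is handy for checking a config before launching the server. This is a sketch of the documented behavior, not the server's actual code: resolveProviders and its return shape are invented here, the provider strings "openai" and "claude" are assumed (only "gemini" appears in the examples), and it assumes Vertex AI enables the same tools as a Gemini API key.

```typescript
type Env = Record<string, string | undefined>;
type Tool = "describe" | "describe_region" | "detect" | "analyze_colors";

interface Resolved {
  geminiBackend: "gemini-api" | "vertex-ai" | null;
  describeProviders: string[];
  tools: Tool[];
}

function resolveProviders(env: Env): Resolved {
  // GEMINI_API_KEY takes precedence over GOOGLE_CLOUD_PROJECT when both are set.
  let geminiBackend: Resolved["geminiBackend"] = null;
  if (env.GEMINI_API_KEY) geminiBackend = "gemini-api";
  else if (env.GOOGLE_CLOUD_PROJECT) geminiBackend = "vertex-ai";

  const describeProviders: string[] = [];
  if (geminiBackend) describeProviders.push("gemini");
  if (env.OPENAI_API_KEY) describeProviders.push("openai");
  if (env.ANTHROPIC_API_KEY) describeProviders.push("claude");

  // analyze_colors runs locally, so it is always available.
  const tools: Tool[] = ["analyze_colors"];
  if (describeProviders.length > 0) tools.push("describe", "describe_region");
  // detect needs Gemini (assumed here to include the Vertex AI route).
  if (geminiBackend) tools.push("detect");

  return { geminiBackend, describeProviders, tools };
}

const resolved = resolveProviders({
  GEMINI_API_KEY: "your-gemini-api-key",
  GOOGLE_CLOUD_PROJECT: "your-gcp-project-id",
});
console.log(resolved.geminiBackend); // "gemini-api"
```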

Technical Details

Color Extraction Algorithm

The analyze_colors tool uses K-Means clustering in LAB color space:

  1. Convert pixels from RGB to LAB (perceptually uniform)
  2. Subsample to 50k pixels for performance
  3. Initialize centroids with K-Means++ for better convergence
  4. Run K-Means; the cluster centroids become the dominant colors
  5. Convert the centroids back to RGB and name them via the color.pizza API

This approach groups perceptually similar colors together, working well for both flat UI colors and noisy photographs.
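To make the steps concrete, here is a compact, self-contained sketch of the same technique. It is not the package's implementation: the function names, the D65 sRGB/LAB constants, the stride-based subsampling, the fixed iteration count, and the small seeded generator are all choices made for this example, and the color.pizza naming step is left out.

```typescript
type Vec3 = [number, number, number];

// sRGB (0-255) to CIE LAB, D65 white point.
function srgbToLab(rgb: Vec3): Vec3 {
  const lin = (c: number) => {
    const v = c / 255;
    return v <= 0.04045 ? v / 12.92 : ((v + 0.055) / 1.055) ** 2.4;
  };
  const f = (t: number) => (t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116);
  const [r, g, b] = [lin(rgb[0]), lin(rgb[1]), lin(rgb[2])];
  const fx = f((0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047);
  const fy = f(0.2126729 * r + 0.7151522 * g + 0.072175 * b);
  const fz = f((0.0193339 * r + 0.119192 * g + 0.9503041 * b) / 1.08883);
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}

// CIE LAB back to sRGB (0-255), clamped to the displayable range.
function labToSrgb(lab: Vec3): Vec3 {
  const fy = (lab[0] + 16) / 116;
  const fx = fy + lab[1] / 500;
  const fz = fy - lab[2] / 200;
  const finv = (t: number) => (t ** 3 > 0.008856 ? t ** 3 : (t - 16 / 116) / 7.787);
  const [x, y, z] = [finv(fx) * 0.95047, finv(fy), finv(fz) * 1.08883];
  const linear: Vec3 = [
    3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
    -0.969266 * x + 1.8760108 * y + 0.041556 * z,
    0.0556434 * x - 0.2040259 * y + 1.0572252 * z,
  ];
  return linear.map((c) => {
    const v = c <= 0.0031308 ? 12.92 * c : 1.055 * c ** (1 / 2.4) - 0.055;
    return Math.round(Math.min(1, Math.max(0, v)) * 255);
  }) as Vec3;
}

const dist2 = (p: Vec3, q: Vec3) =>
  (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2;

// Small seeded LCG so runs are repeatable; any () => number in [0, 1) works.
function seeded(seed: number): () => number {
  let s = seed >>> 0;
  return () => (s = (Math.imul(s, 1664525) + 1013904223) >>> 0) / 4294967296;
}

function kmeans(points: Vec3[], k: number, rand: () => number, iterations = 20) {
  // K-Means++ seeding: first centroid uniform, the rest weighted by squared
  // distance to the nearest centroid picked so far.
  const centroids: Vec3[] = [points[Math.floor(rand() * points.length)]];
  while (centroids.length < k) {
    const d = points.map((p) => Math.min(...centroids.map((c) => dist2(p, c))));
    let target = rand() * d.reduce((s, v) => s + v, 0);
    let idx = 0;
    while (idx < points.length - 1) {
      target -= d[idx];
      if (target <= 0) break;
      idx++;
    }
    centroids.push(points[idx]);
  }
  const labels = points.map(() => 0);
  for (let it = 0; it < iterations; it++) {
    // Assignment step: nearest centroid in LAB space.
    points.forEach((p, i) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
      }
      labels[i] = best;
    });
    // Update step: move each centroid to the mean of its members.
    for (let c = 0; c < k; c++) {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) continue; // leave an empty cluster where it is
      centroids[c] = [0, 1, 2].map(
        (j) => members.reduce((s, m) => s + m[j], 0) / members.length,
      ) as Vec3;
    }
  }
  const counts = centroids.map((_, c) => labels.filter((l) => l === c).length);
  return { centroids, counts };
}

// Toy input: 75 near-white pixels and 25 orange-brown pixels.
const pixels: Vec3[] = [
  ...Array.from({ length: 75 }, (): Vec3 => [230, 230, 229]),
  ...Array.from({ length: 25 }, (): Vec3 => [177, 108, 57]),
];
const lab = pixels.map(srgbToLab);
const stride = Math.max(1, Math.ceil(lab.length / 50000)); // cap at ~50k samples
const sample = lab.filter((_, i) => i % stride === 0);
const { centroids, counts } = kmeans(sample, 2, seeded(42));
const dominant = centroids
  .map((c, i) => ({ rgb: labToSrgb(c), percentage: (100 * counts[i]) / sample.length }))
  .sort((p, q) => q.percentage - p.percentage);
console.log(dominant);
```

Clustering in LAB rather than RGB means Euclidean distance tracks perceived color difference more closely, which is why near-identical shades end up in the same cluster.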

Bounding Box Format

All bounding boxes use [ymin, xmin, ymax, xmax] format with coordinates normalized to 0-1000. To convert to pixel coordinates:

const pixelX = (normalizedX / 1000) * imageWidth;
const pixelY = (normalizedY / 1000) * imageHeight;
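Applied to a whole box, the same arithmetic gives a pixel rectangle that can be handed to a cropping or drawing routine. The sketch below is one way to write it; the function name, the PixelRect shape, and the rounding and clamping are choices made for this example rather than documented behavior of the server.

```typescript
// Bounding box as returned by the tools: [ymin, xmin, ymax, xmax], each 0-1000.
type NormalizedBBox = [number, number, number, number];

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Convert a normalized bbox into a pixel rectangle for an image of the given size.
function bboxToPixelRect(
  bbox: NormalizedBBox,
  imageWidth: number,
  imageHeight: number,
): PixelRect {
  const [ymin, xmin, ymax, xmax] = bbox; // note: y comes first
  const clamp = (v: number) => Math.min(1000, Math.max(0, v));
  const left = Math.round((clamp(xmin) / 1000) * imageWidth);
  const top = Math.round((clamp(ymin) / 1000) * imageHeight);
  const right = Math.round((clamp(xmax) / 1000) * imageWidth);
  const bottom = Math.round((clamp(ymax) / 1000) * imageHeight);
  return { left, top, width: right - left, height: bottom - top };
}

// First television from the detect example, on a 1920x1080 image.
const rect = bboxToPixelRect([178, 245, 433, 818], 1920, 1080);
console.log(rect); // { left: 470, top: 192, width: 1101, height: 276 }
```

The easy mistake is the axis order: the first and third entries are y values and scale with the image height, while the second and fourth are x values and scale with the width.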

License

MIT