image-freshness-mcp

v1.0.2

Published

5 months ago

MCP server for analyzing image freshness and relevance

928pjy

mcp model-context-protocol image-analysis freshness typescript

Image Freshness MCP Server

A Model Context Protocol (MCP) server that helps verify whether images in documentation are up to date. It uses web automation to navigate to web pages, take screenshots of components, and compare them with the existing images to confirm that the documentation is still accurate.

Features

This server provides a prompt and a set of tools to automate the process of checking image freshness in technical documentation.

Prompt

analyze-image-freshness: Orchestrates the entire workflow. It takes an image name or line number, navigates to the relevant page, captures a new screenshot, compares it with the original, and reports whether the image is out of date.

Tools

find-product-start-page: Identifies the correct starting URL for a given product (e.g., "Azure Portal"), which is crucial for initiating the navigation workflow.
compare-images: Compares two images (an original from the document and a newly captured screenshot) using a multimodal Large Language Model (LLM). It returns a detailed description of the differences, categorized into important and trivial changes.
get-screenshot-instructions: Analyzes an existing screenshot and provides clear instructions on what UI element should be captured. This is useful for resolving ambiguities when the initial comparison shows significant differences due to incorrect screenshot scope.
get-default-account: Returns the demo account credentials used for sign-in workflows. The email address is set via image_scan_username, while the password is read from image_scan_secret_password so you can rotate it without code changes.
get-account-otp-code: Generates the current 6-digit TOTP needed for multi-factor authentication. The underlying secret comes from image_scan_secret_otpKey.
Automatic LLM fallback: When the connected MCP client does not advertise the sampling capability (or you force the fallback), the server automatically sends multimodal prompts to the OpenAI SDK, which can target OpenAI's public API or any OpenAI-compatible deployment via a configurable base URL.
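As an illustration of what get-account-otp-code has to compute, here is a minimal sketch of standard TOTP generation (RFC 6238) from a base32-encoded secret. The README does not show the server's implementation, so the function names, the SHA-1 algorithm, and the 30-second time step are assumptions based on the common TOTP defaults; only the 6-digit length and the image_scan_secret_otpKey variable come from the description above.

```typescript
import { createHmac } from "node:crypto";

// Decode an RFC 4648 base32 string; whitespace and "=" padding are ignored.
function base32Decode(input: string): Buffer {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  const clean = input.toUpperCase().replace(/[\s=]/g, "");
  const bytes: number[] = [];
  let bits = 0;
  let value = 0;
  for (let i = 0; i < clean.length; i++) {
    const index = alphabet.indexOf(clean.charAt(i));
    if (index === -1) {
      throw new Error(`Invalid base32 character: ${clean.charAt(i)}`);
    }
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// Hypothetical helper: assumes HMAC-SHA1 and a 30-second step (common defaults).
function generateTotp(
  secretBase32: string,
  timestampMs: number = Date.now(),
  stepSeconds: number = 30,
  digits: number = 6
): string {
  const counter = Math.floor(timestampMs / 1000 / stepSeconds);
  // Encode the counter as an 8-byte big-endian integer.
  const counterBytes = Buffer.alloc(8);
  counterBytes.writeUInt32BE(Math.floor(counter / 0x100000000), 0);
  counterBytes.writeUInt32BE(counter >>> 0, 4);
  const hmac = createHmac("sha1", base32Decode(secretBase32))
    .update(counterBytes)
    .digest();
  // Dynamic truncation as described in RFC 4226.
  const offset = hmac[hmac.length - 1] & 0x0f;
  const binary = hmac.readUInt32BE(offset) & 0x7fffffff;
  return (binary % Math.pow(10, digits)).toString().padStart(digits, "0");
}
```

A caller would pass the secret from the environment, for example generateTotp(process.env.image_scan_secret_otpKey ?? ""). The timestamp parameter exists only so the function can be checked against the RFC 6238 test vectors.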

Installation

Install via npm (recommended for users)

You can install the server globally and run it from anywhere:

npm install -g image-freshness-mcp
image-freshness-mcp

Or use npx without a global install:

npx image-freshness-mcp@latest

Local development setup

# Install dependencies
npm install

# Build the project
npm run build

# Start the server
npm start

Required Environment Variables

Before running the server, set the credentials used by the get-default-account and get-account-otp-code tools:

export image_scan_secret_password="<your-demo-password>"
export image_scan_secret_otpKey="<base32-encoded-otp-secret>"
export image_scan_username="<demo-account-email>"

To configure the LLM fallback, you can also use the generic names AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, OPENAI_API_KEY, OPENAI_MODEL, and OPENAI_BASE_URL (or the legacy self_llm_* variables) in place of the image_scan_-prefixed settings if you already rely on those in your environment.

If a required environment variable is missing, calls to the corresponding tool return an error instead of the secret value.
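The error-instead-of-secret behavior could be sketched roughly as follows. The variable names come from this README; the function names and the result shape are hypothetical and only illustrate the idea, not the server's actual code.

```typescript
// Result shape is an illustrative assumption, not the server's actual type.
type EnvLookup =
  | { ok: true; values: Record<string, string> }
  | { ok: false; error: string };

// Collect the requested variables, or report which ones are missing or empty.
function readRequiredEnv(
  names: string[],
  env: Record<string, string | undefined>
): EnvLookup {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    return {
      ok: false,
      error: `Missing environment variable(s): ${missing.join(", ")}`,
    };
  }
  const values: Record<string, string> = {};
  for (const name of names) {
    values[name] = env[name] as string;
  }
  return { ok: true, values };
}

// Hypothetical handler body for get-default-account.
function getDefaultAccount(env: Record<string, string | undefined>): EnvLookup {
  return readRequiredEnv(
    ["image_scan_username", "image_scan_secret_password"],
    env
  );
}
```

In a real handler the env argument would be process.env. Taking it as a parameter keeps the check easy to exercise without touching the actual environment.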

OpenAI fallback behaviour

When the MCP client does not support sampling, the server converts your prompts (text plus base64-encoded images) into Chat Completions messages and sends them through the OpenAI SDK. Set image_scan_azure_openai_endpoint (or image_scan_openai_base_url) to target your Azure OpenAI deployment or any OpenAI-compatible endpoint. Set image_scan_force_openai=true to always use the SDK, even when the client advertises sampling support.
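The fallback decision and the prompt conversion described above might look something like this. It is a sketch under assumptions: the function and type names are hypothetical, the exact parsing of image_scan_force_openai is a guess, and the input shape follows MCP text and image content items while the output follows the Chat Completions content-part format with a data URL for each image.

```typescript
// MCP-style prompt content (text, or base64 image data with a MIME type).
type McpContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

// Chat Completions content parts for a multimodal user message.
type ChatContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatUserMessage {
  role: "user";
  content: ChatContentPart[];
}

// Use the SDK when it is forced, or when the client lacks sampling support.
function shouldUseOpenAiFallback(
  clientCapabilities: { sampling?: object } | undefined,
  env: Record<string, string | undefined>
): boolean {
  // Assumption: only the literal string "true" forces the fallback.
  const forced = env["image_scan_force_openai"] === "true";
  return forced || !clientCapabilities?.sampling;
}

// Convert prompt content into a single user message; images become data URLs.
function toChatUserMessage(parts: McpContent[]): ChatUserMessage {
  return {
    role: "user",
    content: parts.map((part): ChatContentPart =>
      part.type === "text"
        ? { type: "text", text: part.text }
        : {
            type: "image_url",
            image_url: { url: `data:${part.mimeType};base64,${part.data}` },
          }
    ),
  };
}
```

The resulting message can then be passed in the messages array of a Chat Completions request, with the client's base URL pointing at whichever endpoint you configured.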

Development

# Watch mode for development
npm run dev

# Run linting
npm run lint

# Run tests
npm test

Usage with a Copilot Extension

Add this server to your Copilot extension's configuration file (e.g., mcp.json). This example also shows how to configure the Playwright MCP server to connect to your browser, allowing the assistant to interact with pages where you are already logged in.

{
	"servers": {
		"playwright-mcp": {
			"type": "stdio",
			"command": "npx",
			"args": [
				"@playwright/mcp@latest",
				"--browser",
				"msedge",
				"--user-data-dir",
				"%USERPROFILE%\\AppData\\Local\\Microsoft\\Edge\\User Data",
				"--extension"
			],
			"env": {}
		},
		"image-freshness": {
			"type": "stdio",
			"command": "node",
			"args": [
				"D:\\src\\hackathon-image-freshness\\build\\index.js"
			]
		}
	},
	"inputs": []
}

For more details on setting up the Playwright MCP extension, refer to its documentation. The key is to run the Playwright MCP server with the --extension flag. Replace the image-freshness path in the example with the location of your own build output.

Playwright Extension Setup

The Playwright MCP Chrome Extension allows you to connect to pages in your existing browser and leverage the state of your default user profile. This means the AI assistant can interact with websites where you're already logged in, using your existing cookies, sessions, and browser state.

Prerequisites

Chrome/Edge/Chromium browser

Installation

Download the Extension: Download the latest Chrome extension from the Playwright MCP releases page and unzip it.
Load the Extension:
- Open your browser and navigate to chrome://extensions/.
- Enable "Developer mode" (usually a toggle in the top right corner).
- Click "Load unpacked" and select the directory where you unzipped the extension.

Usage

When the assistant needs to use the browser, it will prompt you to select a tab to connect to. This gives you control over which page the assistant can interact with.

Technical Details

Built with TypeScript and the @modelcontextprotocol/sdk.
Uses a standard stdio transport for communication.
The compare-images and get-screenshot-instructions tools leverage a multimodal LLM for advanced image analysis.
Includes error handling and input validation for file paths and image formats.
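To give a sense of what image format validation can involve, the sketch below checks a file's leading bytes against the PNG and JPEG signatures. This is an assumption about one reasonable approach: the README does not say which formats the server accepts or how it validates them, and the names here are hypothetical.

```typescript
// Formats recognized by this sketch; the server's real list may differ.
type DetectedFormat = "png" | "jpeg";

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

// True when the data begins with every byte of the given signature.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) {
    return false;
  }
  return signature.every((value, index) => bytes[index] === value);
}

// Identify the image format from its magic bytes, or return null if unknown.
function detectImageFormat(bytes: Uint8Array): DetectedFormat | null {
  if (startsWith(bytes, PNG_SIGNATURE)) {
    return "png";
  }
  if (startsWith(bytes, JPEG_SIGNATURE)) {
    return "jpeg";
  }
  return null;
}
```

Checking the content rather than the file extension catches renamed or truncated files before they are sent to the LLM for comparison.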

Future Enhancements

Integration with more computer vision APIs for deeper analysis.
Advanced algorithms for detecting subtle but important UI changes.
Support for batch processing multiple images in a single run.
Caching mechanisms for improved performance.

Publishing this package

To share the MCP server publicly, publish it to the npm registry:

# Build the TypeScript output
npm run build

# Log in (once per machine)
npm login

# Publish the package
npm publish --access public

The prepublishOnly script ensures the latest build is generated automatically during publish.

License

MIT License