glippy-mcp

v0.4.3

MCP server for GEO (Generative Engine Optimization) analysis — check any domain's AI-readiness

Glippy GEO MCP Server

An MCP (Model Context Protocol) server that exposes Glippy's GEO (Generative Engine Optimization) analysis capabilities as tools for AI agents.

Overview

This MCP server enables AI models (Claude, GPT, etc.) to directly analyse any domain's GEO readiness — how well a website is prepared for AI crawlers, LLM-powered search, and agent interaction.

It wraps the Glippy desktop app's server-side analysis engine (geo-checker.js) and exposes it over the standard MCP protocol via stdio transport.

Key features:

  • Full 16-category GEO analysis with weighted scoring
  • robots.txt AI crawler access detection
  • llms.txt file discovery and parsing
  • Agent-readiness discovery - detects emerging agent standards (Content-Signal, llms-full.txt, MCP/A2A/Agent-Skills cards, schemamap, NLWeb, feed discovery)
  • Sitemap crawling and multi-page analysis
  • Domain comparison and competitive analysis
  • Export to styled Markdown or HTML reports
  • Smart caching - automatic deduplication of repeated analyses
  • JSON output mode - pass analysis results between tools to avoid re-crawling
  • Headless Chrome fallback - automatically retries via a real browser when a site blocks bot-shaped fetches (Cloudflare, Akamai, DataDome, etc.)

Installation

Via npm (recommended)

npm install -g glippy-mcp

Via npx (no install needed)

Use directly via npx in your MCP configuration:

npx -y glippy-mcp

Requirements

  • Node.js 18.0.0 or higher
  • Valid Glippy MCP license key
  • Optional: Google Chrome or Chromium installed locally. Only needed if you want the Chrome-rendered fallback to kick in when a target site blocks static fetches. Without Chrome the server still works; it just cannot recover from WAF-blocked pages.

Configuration

License Key

A valid Glippy MCP license key (GLMCP-XXXX-XXXX-XXXX) is required. Get one at glippy.dev.

The server validates the key against the Glippy API on first use and caches the result for 24 hours. Analysis runs locally on your machine — only the license check calls the server.

Usage with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "glippy-geo": {
      "command": "npx",
      "args": ["-y", "glippy-mcp"],
      "env": {
        "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
      }
    }
  }
}

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Usage with Claude Code

Add to your .mcp.json in your project root or ~/.claude/.mcp.json for global access:

{
  "mcpServers": {
    "glippy-geo": {
      "command": "npx",
      "args": ["-y", "glippy-mcp"],
      "env": {
        "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
      }
    }
  }
}

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| GLIPPY_LICENSE_KEY | Yes | - | Your MCP license key (GLMCP-XXXX-XXXX-XXXX) |
| GLIPPY_RATE_LIMIT | No | 5 | Default max requests/second per domain for batch tools |
| CHROME_PATH | No | auto-detect | Absolute path to your Chrome/Chromium binary. Overrides the built-in detection list. |
| PUPPETEER_EXECUTABLE_PATH | No | auto-detect | Alternative name for CHROME_PATH, honored for puppeteer-core compatibility. |
| CHROME_REMOTE_URL | No | - | Attach to an already-running Chrome instead of launching a new one. Accepts either http://host:9222 (browserURL) or ws://... (browserWSEndpoint). Start Chrome with --remote-debugging-port=9222. |
| CHROME_HEADLESS | No | new | Set to 0 or false to run Chrome visible. Useful for sites that aggressively detect headless. |
| CHROME_USER_DATA_DIR | No | - | Path to a Chrome user-data directory. Lets the fallback reuse cookies, extensions, and auth state from a dedicated profile. |


Integration Guides

For detailed setup instructions across all supported environments, see the Integration Guide.

Supported Environments

| Environment | Support Level | Config File |
|-------------|---------------|-------------|
| Claude Code (VS Code) | Native MCP | .mcp.json |
| Claude CLI (Terminal) | Native MCP | .mcp.json |
| Claude Desktop | Native MCP | claude_desktop_config.json |
| Cursor IDE | Native MCP | .cursor/mcp.json |
| Windsurf IDE | Native MCP | .windsurf/mcp.json |
| Continue.dev | Native MCP | ~/.continue/config.json |
| ChatGPT / OpenAI | Via bridge/API | Custom integration |

The integration guide includes:

  • Step-by-step setup for each environment
  • Platform-specific config file locations
  • Usage examples and prompts
  • Verification and testing instructions
  • Troubleshooting tips

Tools Reference

analyze_domain

Run a comprehensive GEO readiness analysis on a domain.

Description: Checks robots.txt, llms.txt, homepage HTML (16 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use render_mode="auto" to transparently fall back to headless Chrome when a site blocks static fetches (Cloudflare, Akamai, etc.). Use output_format="json" to get raw results that can be passed to export_report.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to analyse, e.g. "example.com". Do not include https:// prefix. |
| max_pages | integer | No | Maximum pages to crawl (1-10). Default: 10. |
| render_mode | enum | No | "static" (default) = plain Node fetch, fastest. "auto" = static first, falls back to a local headless Chrome on bot-blocked responses (401/403/407/429/503 or empty 2xx). "chrome" = always render via Chrome. Chrome modes need a local Chrome binary (see Chrome Rendering Fallback). |
| output_format | enum | No | "text" (default) for human-readable report, "json" for raw results to pass to export_report. |

Example:

Analyse GEO readiness for example.com

Example (JSON output for chaining):

analyze_domain domain="example.com" max_pages=5 output_format="json"
# Then pass the result to export_report

Returns:

  • Overall GEO score (0-100) with letter grade
  • Page type detection (article, product, homepage, etc.)
  • 16 category scores with pass/fail/warn checks
  • robots.txt analysis with AI crawler access
  • llms.txt presence and content preview
  • Sitemap discovery status
  • Multi-page aggregated scores (if max_pages > 1)
  • renderMode flag on the result: static, chrome-fallback, or an error code if both paths failed

check_robots_txt

Check a domain's robots.txt specifically for AI crawler access rules.

Description: Reports which AI crawlers (GPTBot, ClaudeBot, etc.) are blocked or allowed.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to check, e.g. "example.com". Do not include https:// prefix. |

Example:

Check which AI crawlers are blocked on example.com

Returns:

  • robots.txt existence and URL
  • Wildcard disallow detection (Disallow: /)
  • Per-crawler access status for:
    • GPTBot
    • Google-Extended
    • CCBot
    • anthropic-ai
    • ClaudeBot
    • Bytespider
    • PerplexityBot
    • ChatGPT-User
    • AmazonBot
    • cohere-ai
  • Sitemap references found in robots.txt
  • Content-Signal directive (search / ai-input / ai-train preferences), when present
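
The per-crawler check can be pictured with a small sketch. This is not the server's actual parser; it is a simplified illustration that assumes a crawler counts as blocked only when its own group (or, failing that, the `*` group) contains `Disallow: /` without a matching `Allow: /`. The function names `parseRobots` and `crawlerAccess` are hypothetical.

```javascript
// Crawler names taken from the list above.
const AI_CRAWLERS = [
  "GPTBot", "Google-Extended", "CCBot", "anthropic-ai", "ClaudeBot",
  "Bytespider", "PerplexityBot", "ChatGPT-User", "AmazonBot", "cohere-ai",
];

// Split robots.txt into groups: consecutive User-agent lines share the rules that follow.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === "allow" || field === "disallow") && current) {
      current.rules.push({ type: field, path: value });
      lastWasAgent = false;
    }
    // Other fields (Sitemap, Content-Signal, ...) are ignored in this sketch.
  }
  return groups;
}

// Assumption: only a root-level "Disallow: /" is treated as blocking.
function crawlerAccess(groups, crawler) {
  const name = crawler.toLowerCase();
  const specific = groups.filter((g) => g.agents.includes(name));
  const applicable = specific.length
    ? specific
    : groups.filter((g) => g.agents.includes("*"));
  const rules = applicable.flatMap((g) => g.rules);
  const blocksRoot = rules.some((r) => r.type === "disallow" && r.path === "/");
  const allowsRoot = rules.some((r) => r.type === "allow" && r.path === "/");
  return blocksRoot && !allowsRoot ? "blocked" : "allowed";
}
```

Mapping `AI_CRAWLERS` through `crawlerAccess` gives one status per crawler. Path-specific rules and wildcard patterns are deliberately left out.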

check_llms_txt

Check if a domain has an llms.txt file.

Description: Checks for the emerging standard file that provides context to LLMs about a site's purpose and content.

Important: llms.txt is an emerging proposal, but it is not currently supported or consumed by major AI models, crawlers, or MCP clients. No mainstream LLM or AI agent reads llms.txt to inform its behaviour. Having an llms.txt file should not be seen as a relevant optimization for your GEO readiness — it will not meaningfully improve how AI systems discover or understand your site today. That said, it cannot hurt to have one: the file is lightweight, easy to create, and if the standard gains adoption in the future you will already be prepared.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to check, e.g. "example.com". Do not include https:// prefix. |

Example:

Does example.com have an llms.txt file?

Returns:

  • llms.txt existence
  • Full file contents if present
  • Link to specification at https://llmstxt.org

get_geo_summary

Get a concise GEO readiness summary for quick assessment.

Description: Returns overall score, grade, top 3 strengths, and top 3 issues to fix. Use this for a quick overview; use analyze_domain for full details.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to check, e.g. "example.com". Do not include https:// prefix. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). See Chrome Rendering Fallback. |

Example:

Give me a quick GEO summary of example.com

Returns:

  • Overall score and grade
  • Page type detected
  • Top 3 strongest categories
  • Top 3 weakest categories with top issue
  • Quick facts (robots.txt, llms.txt, sitemap, blocked crawlers)

compare_domains

Analyse multiple domains in parallel and compare scores.

Description: Returns a comparison table with overall scores, per-category breakdowns, and a ranked summary. Useful for competitive analysis or auditing a portfolio of sites. Use output_format="json" to get raw results that can be passed to export_bulk_report.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domains | array[string] | Yes | List of 2-50 domains to compare, e.g. ["example.com", "competitor.com"]. Do not include https:// prefix. For more than 50 domains, split across multiple runs and merge the results. |
| max_pages | integer | No | Maximum pages to crawl per domain (1-10). Default: 10. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). See Chrome Rendering Fallback. |
| output_format | enum | No | "text" (default) for comparison table, "json" for raw results to pass to export_bulk_report. |

Example:

Compare GEO scores of example.com, competitor1.com, and competitor2.com

Returns:

  • Ranked list of domains by score
  • Category comparison table (all 16 categories)
  • Quick facts comparison (robots.txt, llms.txt, sitemap, blocked crawlers)
  • Error details for any failed analyses
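
For lists longer than 50 domains, the split-and-merge advice in the parameter table could look like the sketch below. It runs on the caller's side and is not part of the server. The helper names and the `{ domain, score }` result shape are assumptions for illustration. Because `compare_domains` needs at least 2 domains per call, the last batch borrows from the previous one when it would otherwise hold a single domain.

```javascript
// Split a domain list into batches of at most `max`, none smaller than `min`.
function chunkDomains(domains, max = 50, min = 2) {
  const batches = [];
  for (let i = 0; i < domains.length; i += max) {
    batches.push(domains.slice(i, i + max));
  }
  const last = batches[batches.length - 1];
  if (batches.length > 1 && last.length < min) {
    // Move just enough domains from the previous batch to reach `min`.
    const prev = batches[batches.length - 2];
    last.unshift(...prev.splice(prev.length - (min - last.length)));
  }
  return batches;
}

// Merge per-batch results into one ranking, highest score first.
// Assumes each result looks like { domain, score } (hypothetical shape).
function mergeRanked(resultBatches) {
  return resultBatches.flat().sort((a, b) => b.score - a.score);
}
```

Each batch would then go to its own `compare_domains` call with `output_format="json"`, and the parsed results would be passed to `mergeRanked`.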

analyze_sitemap

Fetch a sitemap and analyse all discovered pages.

Description: Fetches a sitemap XML (or sitemap index), extracts page URLs, and runs GEO analysis on each page. Returns per-page scores, category averages, and identifies weakest pages. Use output_format="json" to get raw results that can be passed to export_bulk_report.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| sitemap_url | string | Yes | Full URL to sitemap, e.g. "https://example.com/sitemap.xml" |
| max_urls | integer | No | Maximum URLs to analyse (1-50,000). Default: all URLs found. |
| rate_limit | number | No | Max requests/second per domain (0.1-100). Default: 5. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). Applied per URL. See Chrome Rendering Fallback. |
| output_format | enum | No | "text" (default) for report, "json" for raw results to pass to export_bulk_report. |

Example:

Analyse all pages in https://example.com/sitemap.xml

Returns:

  • Total URLs found vs analysed
  • Per-page results table (URL, score, grade, page type)
  • Category averages across all pages
  • Weakest pages with their problem categories

Supports:

  • Regular sitemaps (<urlset>)
  • Sitemap index files (<sitemapindex>) — fetches up to 3 sub-sitemaps
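
As a rough illustration of the two supported shapes, the sketch below tells a `<urlset>` from a `<sitemapindex>` and pulls out the `<loc>` values. It uses a regular expression rather than a real XML parser, so it does not decode entities or handle CDATA. The function names are hypothetical, and the default of 3 simply mirrors the sub-sitemap limit stated above.

```javascript
// Collect the text of every <loc> element (regex-based, sketch quality only).
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)].map((m) => m[1]);
}

// Classify the document and return either page URLs or the sub-sitemaps to fetch.
function readSitemap(xml, maxSubSitemaps = 3) {
  const isIndex = /<sitemapindex[\s>]/i.test(xml);
  const locs = extractLocs(xml);
  return isIndex
    ? { kind: "index", subSitemaps: locs.slice(0, maxSubSitemaps) }
    : { kind: "urlset", urls: locs };
}
```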

analyze_urls

Run GEO analysis on a list of specific URLs.

Description: Fetches each page, scores it across the 16 categories, and returns per-page results with aggregated averages. URLs can span multiple domains. Use output_format="json" to get raw results that can be passed to export_bulk_report.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| urls | array[string] | Yes | List of 1-50,000 full URLs, e.g. ["https://example.com/about", "https://example.com/pricing"]. Include https:// prefix. |
| rate_limit | number | No | Max requests/second per domain (0.1-100). Default: 5. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). Applied per URL. See Chrome Rendering Fallback. |
| output_format | enum | No | "text" (default) for report, "json" for raw results to pass to export_bulk_report. |

Example:

Analyse these specific pages: https://example.com/about, https://example.com/pricing, https://example.com/contact

Returns:

  • Per-page results table (URL, score, grade, page type)
  • Category averages across all pages
  • Weakest pages with their problem categories

export_report

Generate a styled, shareable report file.

Description: Runs GEO analysis and returns results as a self-contained report in Markdown or HTML format — matching the Glippy browser extension's export output. You can optionally pass pre-computed analysis results to avoid re-crawling.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | No* | The domain to analyse, e.g. "example.com". Do not include https:// prefix. |
| format | enum | Yes | Report format: "markdown" (recommendations only), "markdown_full" (all categories and checks), or "html" (standalone styled page). |
| max_pages | integer | No | Maximum pages to crawl (1-10). Default: 10. Ignored if analysis_result is provided. |
| render_mode | enum | No | "static" (default), "auto" (Chrome fallback on bot-block), or "chrome" (always Chrome). Ignored if analysis_result is provided. See Chrome Rendering Fallback. |
| analysis_result | object | No* | Pre-computed analysis result from analyze_domain (with output_format="json"). Skips re-crawling. |

*Either domain or analysis_result must be provided.

Example:

Generate an HTML report for example.com

Example (using pre-computed results):

# First, analyze with JSON output:
analyze_domain domain="example.com" max_pages=5 output_format="json"

# Then export without re-crawling:
export_report format="html" analysis_result=<result from above>

Returns:

  • Complete report content ready to save
  • For HTML: Standalone page with dark/light theme toggle, score ring, category accordion, recommendations table
  • For Markdown: Structured document with priority-sorted recommendations

export_bulk_report

Generate a styled report for bulk analysis.

Description: Creates a comprehensive report for comparing multiple domains, analysing a list of URLs, or crawling a sitemap. Returns a self-contained report with rankings, category breakdowns, and per-domain/page recommendations. You can pass pre-computed results to avoid re-crawling.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| format | enum | Yes | Report format: "markdown" or "html" |
| domains | array[string] | No* | Compare 2-50 domains. Do not include https://. For more than 50, run multiple times. |
| urls | array[string] | No* | Analyse 1-50,000 specific URLs. Include https://. |
| sitemap_url | string | No* | Crawl a sitemap URL. |
| analysis_results | object | No* | Pre-computed results from compare_domains, analyze_urls, or analyze_sitemap (with output_format="json"). |
| max_pages | integer | No | For domain mode: pages per domain (1-10). Default: 10. Ignored if analysis_results provided. |
| max_urls | integer | No | For sitemap mode: max URLs to analyse. Default: all. Ignored if analysis_results provided. |
| rate_limit | number | No | Max requests/second per domain. Default: 5. Ignored if analysis_results provided. |
| render_mode | enum | No | "static" (default), "auto" (Chrome fallback on bot-block), or "chrome" (always Chrome). Ignored if analysis_results provided. See Chrome Rendering Fallback. |

*Provide exactly one of: domains, urls, sitemap_url, or analysis_results.

Example:

Generate an HTML comparison report for example.com and competitor.com

Example (using pre-computed results):

# First, compare with JSON output:
compare_domains domains=["example.com", "competitor.com"] output_format="json"

# Then export without re-crawling:
export_bulk_report format="html" analysis_results=<result from above>

Returns:

  • Domain comparison: Rankings, category comparison table, quick facts, per-domain recommendations
  • URL/Sitemap analysis: Per-page results, category averages, common issues across pages, weakest/strongest pages

GEO Scoring Categories

The analysis evaluates 16 categories, each with a weight reflecting its importance for AI/LLM readiness:

| # | Category | Weight | What It Measures |
|---|----------|--------|------------------|
| 1 | Structured Data & Schema | 1.5x | JSON-LD presence, Schema.org types (FAQPage, Article, Product, etc.), Speakable markup, schema validation |
| 2 | Semantic HTML | 1.2x | Heading hierarchy (H1-H6), semantic elements (<article>, <nav>, <main>), content-to-markup ratio |
| 3 | Accessibility for Agents | 1.0x | Lang attribute, alt text on images, ARIA labels, descriptive link text |
| 4 | Internal Linking | 1.0x | Link density, navigation structure, breadcrumb markup |
| 5 | Meta & Discoverability | 1.0x | Title, meta description, canonical URL, Open Graph tags, hreflang |
| 6 | Machine Readability | 1.5x | SSR detection, bot blocking checks, robots.txt rules, llms.txt presence*, robots.txt Content-Signal directive, llms-full.txt, HTTP Link discovery headers, Markdown source endpoints, RSS/Atom/JSON feed discovery |
| 7 | Entity & Authority | 1.0x | Author info, publication dates, organization schema, E-E-A-T signals, credentials, editorial policy, contact completeness |
| 8 | Citability & Answer-Readiness | 1.3x | FAQ content, data tables, lists, lead paragraph quality |
| 9 | Performance & Crawlability | 0.3x | Image dimensions, lazy loading, resource hints |
| 10 | Agent Interactivity | 0.2x | WebMCP tools, form annotations, agent-callable actions, MCP server card (/.well-known/mcp/server-card.json), A2A agent card, Agent-Skills index, NLWeb endpoint, schemamap |
| 11 | Content Positioning | 1.2x | Brand differentiation, proof points, social proof |
| 12 | Content Freshness | 0.8x | Date signals, content age, temporal language |
| 13 | Information Density | 1.0x | Substantive-to-filler ratio, section depth, claim-evidence pairing |
| 14 | Factual Verifiability | 0.8x | Citations, source attribution, methodology disclosure |
| 15 | Content Comprehensiveness | 0.8x | Word count, heading coverage, definitions, comparisons |
| 16 | Multimodal Content | 0.5x | Image alt text, figures, video/audio, SVG, multimedia schema |

*llms.txt is checked for presence but is not currently supported or consumed by any major AI model or crawler. It has minimal practical impact on GEO readiness today — see the check_llms_txt section for details.

Scoring

  • Each category produces a score from 0-100
  • The overall score is a weighted average using the weights above
  • Scores map to letter grades: A+ (90+), A (80+), B (70+), C (60+), D (40+), F (<40)
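
To make the arithmetic concrete, here is a minimal sketch of a weighted average with the weights and grade thresholds listed above. It is an illustration, not the engine's actual code. The category key names are made up, rounding to the nearest integer is an assumption, and the bonus-style handling of agent-readiness checks is not modelled.

```javascript
// Weights copied from the category table; the key names are hypothetical.
const WEIGHTS = {
  structuredData: 1.5, semanticHtml: 1.2, accessibility: 1.0, internalLinking: 1.0,
  metaDiscoverability: 1.0, machineReadability: 1.5, entityAuthority: 1.0,
  citability: 1.3, performance: 0.3, agentInteractivity: 0.2,
  contentPositioning: 1.2, contentFreshness: 0.8, informationDensity: 1.0,
  factualVerifiability: 0.8, comprehensiveness: 0.8, multimodal: 0.5,
};

// Weighted average of per-category scores (each 0-100).
function overallScore(categoryScores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const score = categoryScores[name];
    if (typeof score !== "number") continue; // assumption: skip unscored categories
    weighted += score * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}

// Thresholds from the list above.
function letterGrade(score) {
  if (score >= 90) return "A+";
  if (score >= 80) return "A";
  if (score >= 70) return "B";
  if (score >= 60) return "C";
  if (score >= 40) return "D";
  return "F";
}
```

With only two categories scored, `structuredData: 100` and `performance: 0`, the result is 150 / 1.8 ≈ 83, which maps to an A. This shows how a heavily weighted category dominates a lightly weighted one.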

Agent-Readiness Discovery

Alongside the established checks, the server probes a set of emerging agent-readiness standards. These surfaces let agents discover and consume a site without scraping HTML.

These checks are bonus-scored: a site gets credit when a surface is present, but absence is reported as informational guidance rather than a penalty. This keeps the long tail of sites that have not adopted these new standards from being unfairly marked down, while still rewarding early adopters.

| Surface | Where it's checked | What it signals |
|---------|--------------------|-----------------|
| Content-Signal | robots.txt directive | Machine-readable AI usage preferences (search / ai-input / ai-train). Only ai-input=no affects AI answer visibility; ai-train=no is treated as a training-only preference with no citation impact. |
| llms-full.txt | /llms-full.txt | Concatenated Markdown corpus of the pages listed in llms.txt, for full-context ingestion. Very large files (>5 MB) are flagged. |
| HTTP Link discovery | response Link header | Resource discovery via headers (rel="describedby", api-catalog, sitemap, mcp, service-desc, nlweb) without parsing HTML. |
| Markdown source endpoint | <link rel="alternate" type="text/markdown"> or content negotiation | A clean .md version of each page for agent ingestion. |
| Feed discovery | <link rel="alternate"> | RSS / Atom / JSON feeds as a machine-readable content stream. |
| MCP server card | /.well-known/mcp/server-card.json | Discoverable MCP server (name, version, transport, endpoint, tools). |
| A2A agent card | /.well-known/agent-card.json | Agent-to-agent discovery with declared skills. |
| Agent-Skills index | /.well-known/agent-skills/index.json | Reusable agent skills exposed with digests. |
| NLWeb endpoint | <link rel="nlweb"> or Link header | Natural-language query endpoint (conventionally /ask). |
| Schemamap | /schemamap.xml or <link rel="schemamap"> | Per-resource JSON-LD (.jsonld) endpoints for agent-friendly structured data. |

Content-Signal, HTTP Link discovery, Markdown source endpoints, llms-full.txt, and feed discovery feed into the Machine Readability category; the MCP/A2A/Agent-Skills cards, NLWeb, and schemamap feed into Agent Interactivity. The raw findings are also returned under an agentReadiness object in output_format="json" results.
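
The Content-Signal rule in the table (only ai-input=no affects answer visibility) can be sketched as follows. The sketch assumes the directive appears in robots.txt as a comma-separated list of key=value pairs such as `Content-Signal: search=yes, ai-train=no`. The function names are hypothetical and the server's own parsing may differ.

```javascript
// Collect Content-Signal preferences from robots.txt into { key: "yes" | "no" }.
function parseContentSignal(robotsTxt) {
  const signals = {};
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const match = raw.match(/^\s*content-signal\s*:\s*(.*)$/i);
    if (!match) continue;
    for (const pair of match[1].replace(/#.*$/, "").split(",")) {
      const [key, value] = pair.split("=").map((s) => s.trim().toLowerCase());
      if (key && (value === "yes" || value === "no")) signals[key] = value;
    }
  }
  return signals;
}

// Per the table above: ai-train=no is a training-only preference; ai-input=no is the
// one that limits visibility in AI answers.
function affectsAnswerVisibility(signals) {
  return signals["ai-input"] === "no";
}
```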


Rate Limiting

To prevent overwhelming target servers during batch operations, the MCP server enforces per-domain rate limiting:

Configuration

  1. Environment variable: Set GLIPPY_RATE_LIMIT=3 for 3 requests/second default
  2. Per-call parameter: Pass rate_limit to analyze_sitemap, analyze_urls, or export_bulk_report

Recommended Values

| Scenario | Rate Limit | Description |
|----------|------------|-------------|
| Polite crawling | 0.5 - 1 | 1 request every 1-2 seconds |
| Default | 5 | 5 requests/second (balanced) |
| Your own server | 10 - 50 | Faster crawling when you control the target |
| Aggressive | 100 | Maximum speed (use with caution) |

How It Works

  • Requests to different domains run in parallel
  • Requests to the same domain are serialized with the configured delay
  • Global concurrency is capped at 10 simultaneous requests
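
One way to picture the per-domain spacing is to compute the earliest start time for each request. The sketch below is a planning aid only, with a hypothetical function name. It spaces same-host requests by 1000 / rate milliseconds and lets different hosts start together. It does not model the global cap of 10 simultaneous requests or actual response times.

```javascript
// Earliest start offset (ms) for each URL under a per-host rate limit.
function planStarts(urls, ratePerSecond = 5) {
  const gapMs = 1000 / ratePerSecond; // delay between requests to the same host
  const nextFree = new Map(); // host -> next allowed start offset
  return urls.map((url) => {
    const host = new URL(url).hostname;
    const startMs = nextFree.get(host) ?? 0;
    nextFree.set(host, startMs + gapMs);
    return { url, host, startMs };
  });
}
```

At the default rate of 5, three URLs on one host start at 0 ms, 200 ms, and 400 ms, while a URL on a second host starts at 0 ms alongside the first.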

Output Formats

Text (Default)

All tools return structured text output by default, suitable for:

  • Inline display in chat
  • Quick analysis and follow-up questions
  • Programmatic parsing

Markdown Reports

Generated by export_report and export_bulk_report:

  • Clean, readable structure
  • Priority-sorted recommendations (High → Medium → Low)
  • Tables for easy comparison
  • Save as .md file

HTML Reports

Generated by export_report and export_bulk_report:

  • Standalone, self-contained page (no external dependencies)
  • Dark/light theme toggle with system preference detection
  • Interactive category accordion
  • Score ring visualization
  • Copy recommendations button
  • Print-friendly styling
  • Save as .html file

Caching & Efficient Workflows

The MCP server includes smart caching and result-passing features to avoid redundant crawling.

Automatic Caching

Analysis results are cached in-memory for 5 minutes with the following behavior:

  • Key: domain + maxPages — cached results are reused when the same domain is analyzed again
  • Smart coverage: If you request max_pages=3 and there's a cached result with max_pages=5, the cache is used
  • Automatic: No configuration needed — just call tools normally and caching happens automatically

Example workflow (automatic):

# First call — crawls the site
analyze_domain domain="example.com" max_pages=5

# Second call within 5 minutes, same max_pages: uses cached result
export_report domain="example.com" format="html" max_pages=5
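
The coverage rule can be sketched as a small in-memory cache. This is an illustration under stated assumptions, not the server's implementation: the class name is hypothetical, entries are grouped by lower-cased domain, and a cached entry satisfies a request when its maxPages is at least the requested value. The clock is injectable so that expiry can be checked without waiting.

```javascript
class AnalysisCache {
  constructor(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // domain -> [{ maxPages, result, storedAt }]
  }

  set(domain, maxPages, result) {
    const key = domain.toLowerCase();
    const list = this.entries.get(key) ?? [];
    list.push({ maxPages, result, storedAt: this.now() });
    this.entries.set(key, list);
  }

  // Returns a cached result that covers the request, or null on a miss.
  get(domain, maxPages) {
    const key = domain.toLowerCase();
    const fresh = (this.entries.get(key) ?? []).filter(
      (e) => this.now() - e.storedAt < this.ttlMs
    );
    this.entries.set(key, fresh); // drop expired entries
    const hit = fresh.find((e) => e.maxPages >= maxPages);
    return hit ? hit.result : null;
  }
}
```

Under these rules a result cached with max_pages=5 serves a later request for 3 pages but not one for 10, which is why the export call should not ask for more pages than the analysis crawled.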

JSON Output Mode

For explicit control, use output_format="json" to get raw analysis results that can be passed to export tools.

Single domain workflow:

# Step 1: Analyze with JSON output
analyze_domain domain="example.com" max_pages=5 output_format="json"
# Returns full analysis object as JSON

# Step 2: Export multiple formats without re-crawling
export_report format="html" analysis_result=<JSON from step 1>
export_report format="markdown_full" analysis_result=<JSON from step 1>

Multi-domain workflow:

# Step 1: Compare with JSON output
compare_domains domains=["site1.com", "site2.com"] output_format="json"
# Returns array of analysis results

# Step 2: Generate report without re-crawling
export_bulk_report format="html" analysis_results=<JSON from step 1>

Sitemap/URL workflow:

# Step 1: Analyze sitemap with JSON output
analyze_sitemap sitemap_url="https://example.com/sitemap.xml" output_format="json"
# Returns { sitemap_url, pageResults, aggregated }

# Step 2: Generate report without re-crawling
export_bulk_report format="html" analysis_results=<JSON from step 1>

When to Use Each Approach

| Scenario | Recommended Approach |
|----------|---------------------|
| Quick analysis + single export | Automatic caching (just call both tools) |
| Generate multiple report formats | JSON output mode (analyze once, export many) |
| Time-sensitive workflow | JSON output mode (guaranteed no re-crawling) |
| Interactive exploration | Automatic caching (ask questions, then export) |


Chrome Rendering Fallback

Sites behind bot-protection services (Cloudflare, Akamai, PerimeterX, DataDome, Incapsula) often refuse static Node fetches with 401/403/407/429/503 responses or an empty 2xx body. The server can drive a real Chrome instance to fetch those pages instead, so they still get scored.

Choosing a render mode

Every analysis tool (analyze_domain, get_geo_summary, compare_domains, analyze_urls, analyze_sitemap, export_report, export_bulk_report) accepts a render_mode parameter:

| Mode | Behavior | Use when |
|------|----------|----------|
| static (default) | Plain Node fetch. Fast. No Chrome required. | You're scoring sites that don't block bots, or you explicitly want to see how a static crawler experiences the page. |
| auto | Static fetch first. If it looks bot-blocked (status 401/403/407/429/503, or 2xx with an empty body), retry that URL via Chrome. | Mixed workloads - most sites fast-path through static; only blocked ones pay the Chrome cost. Recommended for competitive audits across a list of domains. |
| chrome | Every URL fetched via Chrome. Slowest, most resilient. | You know the targets aggressively detect headless and want to front-load the Chrome cost, or you're debugging rendering differences. |

The result object includes a renderMode field so you can tell which path ran: static, chrome, chrome-fallback, chrome-blocked-<code> (Chrome tried but also got blocked), or static-blocked (both paths failed).
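
The bot-block heuristic described for auto mode is simple enough to state in code. The sketch below encodes only what the table says: the listed status codes, or a 2xx response with an empty body. The function names are hypothetical, and treating a whitespace-only body as empty is an assumption.

```javascript
const BLOCK_STATUSES = new Set([401, 403, 407, 429, 503]);

// True when a static fetch result looks like a bot block.
function looksBotBlocked(status, body) {
  if (BLOCK_STATUSES.has(status)) return true;
  const is2xx = status >= 200 && status < 300;
  return is2xx && (body ?? "").trim().length === 0;
}

// In "auto" mode, only blocked-looking responses are retried via Chrome.
function shouldRetryWithChrome(renderMode, status, body) {
  return renderMode === "auto" && looksBotBlocked(status, body);
}
```

A 404 is not retried under this rule, since a missing page is not a sign of bot blocking.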

Setup

Chrome modes need a Chrome or Chromium binary. The server looks in these locations, in order:

  1. CHROME_PATH env var
  2. PUPPETEER_EXECUTABLE_PATH env var
  3. C:/Program Files/Google/Chrome/Application/chrome.exe
  4. C:/Program Files (x86)/Google/Chrome/Application/chrome.exe
  5. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
  6. /usr/bin/google-chrome, /usr/bin/chromium, /usr/bin/chromium-browser

If none exist, render_mode: "static" still works; only the Chrome-backed modes become unavailable.
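
The lookup order above amounts to "first candidate that exists wins". A minimal sketch follows, with the existence check passed in so that it can be exercised without touching the disk. The function name is hypothetical, and whether the server also verifies that env-var paths exist is an assumption.

```javascript
// Built-in locations, in the order listed above.
const DEFAULT_CHROME_PATHS = [
  "C:/Program Files/Google/Chrome/Application/chrome.exe",
  "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  "/usr/bin/google-chrome",
  "/usr/bin/chromium",
  "/usr/bin/chromium-browser",
];

// Returns the first existing candidate, or null when no Chrome is available.
function resolveChromePath(env, exists) {
  const candidates = [
    env.CHROME_PATH,
    env.PUPPETEER_EXECUTABLE_PATH,
    ...DEFAULT_CHROME_PATHS,
  ];
  return candidates.find((p) => p && exists(p)) ?? null;
}
```

In a real Node process this could be called as `resolveChromePath(process.env, fs.existsSync)`. A null result corresponds to the case where only static mode is available.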

Attaching to your own Chrome

For sites that fingerprint headless Chrome, start a Chrome instance with remote debugging and point the server at it. The server will attach to that instance instead of launching its own:

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --user-data-dir=/tmp/glippy-chrome

# Windows (PowerShell)
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
  --remote-debugging-port=9222 --user-data-dir=C:\Temp\glippy-chrome

# Then in your MCP config env:
#   CHROME_REMOTE_URL=http://127.0.0.1:9222

Using a dedicated --user-data-dir keeps this session isolated from your normal browsing. When attached, the fetcher leaves UA/headers/stealth untouched so requests look identical to a human using that browser.
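
The environment variable table notes that CHROME_REMOTE_URL accepts either an http:// address (browserURL) or a ws:// address (browserWSEndpoint). Those are the two option names that puppeteer's `connect()` takes. A hedged sketch of the mapping, with a hypothetical function name, might look like this:

```javascript
// Translate CHROME_REMOTE_URL into options suitable for puppeteer.connect().
function remoteConnectOptions(remoteUrl) {
  if (!remoteUrl) return null; // not set: launch a local Chrome instead
  const protocol = new URL(remoteUrl).protocol;
  if (protocol === "ws:" || protocol === "wss:") {
    return { browserWSEndpoint: remoteUrl };
  }
  if (protocol === "http:" || protocol === "https:") {
    return { browserURL: remoteUrl };
  }
  throw new Error(`Unsupported CHROME_REMOTE_URL scheme: ${protocol}`);
}
```

For example, `http://127.0.0.1:9222` yields `{ browserURL: "http://127.0.0.1:9222" }`.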

Visible mode

For debugging, set CHROME_HEADLESS=0 to watch Chrome drive itself. Purely for development - leave it off in production.


Architecture

research-mcp/
├── src/
│   ├── index.js           # MCP server - tool registration, JSON-RPC handling, license validation
│   ├── geo-checker.js     # GEO analysis engine - fetches & scores domains
│   └── chrome-fetcher.js  # Headless Chrome adapter (puppeteer-core) for WAF-blocked sites
├── package.json
└── README.md

Analysis Flow

  1. Fetch resources in parallel:

    • robots.txt
    • llms.txt
    • Homepage HTML (static fetch first, Chrome fallback if bot-blocked)
    • sitemap.xml
    • UCP profile (/.well-known/ucp)
    • Agent-readiness discovery surfaces: /llms-full.txt, /.well-known/mcp/server-card.json, /.well-known/agent-card.json, /.well-known/agent-skills/index.json, /schemamap.xml
  2. Parse HTML with cheerio (server-side DOM)

  3. Run 16 weighted scoring categories

  4. Return comprehensive analysis with actionable recommendations

Protocol

  • Transport: stdio (JSON-RPC 2.0 over stdin/stdout)
  • SDK: @modelcontextprotocol/sdk (official TypeScript MCP SDK)
  • Logging: All logs go to stderr (stdout reserved for MCP protocol)

Manual Testing

Test the MCP server directly via command line:

# Send MCP init + tool list request via stdin
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}
{"jsonrpc":"2.0","method":"notifications/initialized"}
{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | GLIPPY_LICENSE_KEY=your-key node src/index.js 2>/dev/null
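
To go one step further and invoke a tool, the same newline-delimited session can be generated with a `tools/call` request in place of `tools/list`. The sketch below only builds the input text. The helper name is hypothetical, while the `name` and `arguments` parameters follow the MCP tools/call request shape.

```javascript
// Build newline-delimited JSON-RPC input: initialize, initialized, then one tools/call.
function buildSession(toolName, args) {
  const messages = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26",
        capabilities: {},
        clientInfo: { name: "test", version: "1.0.0" },
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    {
      jsonrpc: "2.0",
      id: 2,
      method: "tools/call",
      params: { name: toolName, arguments: args },
    },
  ];
  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
}
```

Writing `buildSession("check_robots_txt", { domain: "example.com" })` to a file and piping that file into `node src/index.js` (with GLIPPY_LICENSE_KEY set) should yield the tool result as the response with id 2.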

Troubleshooting

"License error: No license key configured"

Cause: The GLIPPY_LICENSE_KEY environment variable is not set.

Fix: Add the key to your MCP configuration:

"env": {
  "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
}

"License validation failed"

Cause: Invalid or expired license key.

Fix: Get a valid key at glippy.dev.

"Could not reach license server"

Cause: Network connectivity issue or firewall blocking.

Fix:

  • Check your internet connection
  • Ensure glippy-mcp-api.info-8cb.workers.dev is accessible
  • If you have a cached valid license, the server will continue working for 24 hours

"Error analysing domain: HTTP 403/404"

Cause: Target site is blocking requests or page doesn't exist.

Fix:

  • Verify the domain is accessible in a browser
  • Some sites block automated requests: retry with render_mode="auto" or render_mode="chrome" (see Chrome Rendering Fallback), or try a different domain
  • Check if the site requires authentication

"No URLs found in sitemap"

Cause: The sitemap doesn't contain <loc> entries or uses an unexpected format.

Fix:

  • Verify the sitemap URL returns valid XML
  • Check that URLs in the sitemap match the expected domain
  • For sitemap indexes, ensure sub-sitemaps are accessible

High memory usage during batch analysis

Cause: Analysing too many URLs at once.

Fix:

  • Use max_urls parameter to limit sitemap crawling
  • Reduce max_pages for domain comparison
  • Process URLs in smaller batches

AI Crawlers Detected

The server checks access rules for these AI crawlers in robots.txt:

| Crawler | Company | Purpose |
|---------|---------|---------|
| GPTBot | OpenAI | Training data for GPT models |
| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT |
| Google-Extended | Google | Training data for Bard/Gemini |
| ClaudeBot | Anthropic | Training data for Claude |
| anthropic-ai | Anthropic | Anthropic's general crawler |
| CCBot | Common Crawl | Open web corpus |
| PerplexityBot | Perplexity AI | Search and answer engine |
| Bytespider | ByteDance | TikTok/Douyin AI features |
| AmazonBot | Amazon | Alexa and shopping AI |
| cohere-ai | Cohere | Enterprise AI models |


License

See LICENSE file for licensing terms. Get your license key at glippy.dev.


Generated by Glippy — GEO Agent-Readiness Checker