glippy-mcp
v0.4.3
MCP server for GEO (Generative Engine Optimization) analysis — check any domain's AI-readiness
Glippy GEO MCP Server
An MCP (Model Context Protocol) server that exposes Glippy's GEO (Generative Engine Optimization) analysis capabilities as tools for AI agents.
Overview
This MCP server enables AI models (Claude, GPT, etc.) to directly analyse any domain's GEO readiness — how well a website is prepared for AI crawlers, LLM-powered search, and agent interaction.
It wraps the Glippy desktop app's server-side analysis engine (geo-checker.js) and exposes it over the standard MCP protocol via stdio transport.
Key features:
- Full 16-category GEO analysis with weighted scoring
- robots.txt AI crawler access detection
- llms.txt file discovery and parsing
- Agent-readiness discovery - detects emerging agent standards (Content-Signal, llms-full.txt, MCP/A2A/Agent-Skills cards, schemamap, NLWeb, feed discovery)
- Sitemap crawling and multi-page analysis
- Domain comparison and competitive analysis
- Export to styled Markdown or HTML reports
- Smart caching - automatic deduplication of repeated analyses
- JSON output mode - pass analysis results between tools to avoid re-crawling
- Headless Chrome fallback - automatically retries via a real browser when a site blocks bot-shaped fetches (Cloudflare, Akamai, DataDome, etc.)
Table of Contents
- Installation
- Configuration
- Integration Guides
- Tools Reference
- GEO Scoring Categories
- Agent-Readiness Discovery
- Rate Limiting
- Output Formats
- Chrome Rendering Fallback
- Architecture
- Manual Testing
- Troubleshooting
- License
Installation
Via npm (recommended)
npm install -g glippy-mcp
Via npx (no install needed)
Use directly via npx in your MCP configuration:
npx -y glippy-mcp
Requirements
- Node.js 18.0.0 or higher
- Valid Glippy MCP license key
- Optional: Google Chrome or Chromium installed locally. Only needed if you want the Chrome-rendered fallback to kick in when a target site blocks static fetches. Without Chrome the server still works; it just cannot recover from WAF-blocked pages.
Configuration
License Key
A valid Glippy MCP license key (GLMCP-XXXX-XXXX-XXXX) is required. Get one at glippy.dev.
The server validates the key against the Glippy API on first use and caches the result for 24 hours. Analysis runs locally on your machine — only the license check calls the server.
Usage with Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"glippy-geo": {
"command": "npx",
"args": ["-y", "glippy-mcp"],
"env": {
"GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
}
}
}
}
Config file locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Usage with Claude Code
Add to your .mcp.json in your project root or ~/.claude/.mcp.json for global access:
{
"mcpServers": {
"glippy-geo": {
"command": "npx",
"args": ["-y", "glippy-mcp"],
"env": {
"GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
}
}
}
}
Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| GLIPPY_LICENSE_KEY | Yes | - | Your MCP license key (GLMCP-XXXX-XXXX-XXXX) |
| GLIPPY_RATE_LIMIT | No | 5 | Default max requests/second per domain for batch tools |
| CHROME_PATH | No | auto-detect | Absolute path to your Chrome/Chromium binary. Overrides the built-in detection list. |
| PUPPETEER_EXECUTABLE_PATH | No | auto-detect | Alternative name for CHROME_PATH, honored for puppeteer-core compatibility. |
| CHROME_REMOTE_URL | No | - | Attach to an already-running Chrome instead of launching a new one. Accepts either http://host:9222 (browserURL) or ws://... (browserWSEndpoint). Start Chrome with --remote-debugging-port=9222. |
| CHROME_HEADLESS | No | new | Set to 0 or false to run Chrome visible. Useful for sites that aggressively detect headless. |
| CHROME_USER_DATA_DIR | No | - | Path to a Chrome user-data directory. Lets the fallback reuse cookies, extensions, and auth state from a dedicated profile. |
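As a rough illustration of how the variables in this table and their defaults fit together, the sketch below reads them into one settings object. The readConfig helper and its property names are hypothetical and are not taken from the server's source; only the variable names and defaults come from the table.

```javascript
// Hypothetical helper: illustrates the documented defaults, not the server's real code.
function readConfig(env) {
  const rate = Number(env.GLIPPY_RATE_LIMIT);
  const headlessRaw = (env.CHROME_HEADLESS ?? "new").toLowerCase();
  return {
    licenseKey: env.GLIPPY_LICENSE_KEY ?? null,
    // Fall back to the documented default of 5 when unset or not a positive number.
    rateLimit: Number.isFinite(rate) && rate > 0 ? rate : 5,
    // CHROME_PATH wins; PUPPETEER_EXECUTABLE_PATH is the documented alternative name.
    chromePath: env.CHROME_PATH ?? env.PUPPETEER_EXECUTABLE_PATH ?? null,
    remoteUrl: env.CHROME_REMOTE_URL ?? null,
    // "0" or "false" means run Chrome visibly; anything else stays headless.
    headless: !(headlessRaw === "0" || headlessRaw === "false"),
    userDataDir: env.CHROME_USER_DATA_DIR ?? null,
  };
}

console.log(readConfig({ GLIPPY_LICENSE_KEY: "GLMCP-XXXX-XXXX-XXXX", GLIPPY_RATE_LIMIT: "3" }));
```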
Integration Guides
For detailed setup instructions across all supported environments, see the Integration Guide.
Supported Environments
| Environment | Support Level | Config File |
|-------------|---------------|-------------|
| Claude Code (VS Code) | Native MCP | .mcp.json |
| Claude CLI (Terminal) | Native MCP | .mcp.json |
| Claude Desktop | Native MCP | claude_desktop_config.json |
| Cursor IDE | Native MCP | .cursor/mcp.json |
| Windsurf IDE | Native MCP | .windsurf/mcp.json |
| Continue.dev | Native MCP | ~/.continue/config.json |
| ChatGPT / OpenAI | Via bridge/API | Custom integration |
The integration guide includes:
- Step-by-step setup for each environment
- Platform-specific config file locations
- Usage examples and prompts
- Verification and testing instructions
- Troubleshooting tips
Tools Reference
analyze_domain
Run a comprehensive GEO readiness analysis on a domain.
Description: Checks robots.txt, llms.txt, homepage HTML (16 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use render_mode="auto" to transparently fall back to headless Chrome when a site blocks static fetches (Cloudflare, Akamai, etc.). Use output_format="json" to get raw results that can be passed to export_report.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to analyse, e.g. "example.com". Do not include https:// prefix. |
| max_pages | integer | No | Maximum pages to crawl (1-10). Default: 10. |
| render_mode | enum | No | "static" (default) = plain Node fetch, fastest. "auto" = static first, falls back to a local headless Chrome on bot-blocked responses (401/403/407/429/503 or empty 2xx). "chrome" = always render via Chrome. Chrome modes need a local Chrome binary (see Chrome Rendering Fallback). |
| output_format | enum | No | "text" (default) for human-readable report, "json" for raw results to pass to export_report. |
Example:
Analyse GEO readiness for example.com
Example (JSON output for chaining):
analyze_domain domain="example.com" max_pages=5 output_format="json"
# Then pass the result to export_report
Returns:
- Overall GEO score (0-100) with letter grade
- Page type detection (article, product, homepage, etc.)
- 16 category scores with pass/fail/warn checks
- robots.txt analysis with AI crawler access
- llms.txt presence and content preview
- Sitemap discovery status
- Multi-page aggregated scores (if max_pages > 1)
- renderMode flag on the result: static, chrome-fallback, or an error code if both paths failed
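For clients that talk to the server directly rather than through a chat prompt, the parameters above map onto a standard MCP tools/call request. The sketch below builds such a message; buildAnalyzeDomainRequest is an illustrative helper name, and the defaults it fills in simply mirror the parameter table.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request for the analyze_domain tool.
// The helper name and option names are illustrative only.
function buildAnalyzeDomainRequest(id, domain, options = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "analyze_domain",
      arguments: {
        domain, // bare domain, no https:// prefix
        max_pages: options.maxPages ?? 10,
        render_mode: options.renderMode ?? "static",
        output_format: options.outputFormat ?? "text",
      },
    },
  };
}

const request = buildAnalyzeDomainRequest(3, "example.com", { maxPages: 5, outputFormat: "json" });
// With the stdio transport, each message is written as one line of JSON on stdin.
console.log(JSON.stringify(request));
```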
check_robots_txt
Check a domain's robots.txt specifically for AI crawler access rules.
Description: Reports which AI crawlers (GPTBot, ClaudeBot, etc.) are blocked or allowed.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to check, e.g. "example.com". Do not include https:// prefix. |
Example:
Check which AI crawlers are blocked on example.com
Returns:
- robots.txt existence and URL
- Wildcard disallow detection (Disallow: /)
- Per-crawler access status for:
- GPTBot
- Google-Extended
- CCBot
- anthropic-ai
- ClaudeBot
- Bytespider
- PerplexityBot
- ChatGPT-User
- AmazonBot
- cohere-ai
- Sitemap references found in robots.txt
- Content-Signal directive (search / ai-input / ai-train preferences), when present
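To make the per-crawler check concrete, here is a deliberately simplified sketch of how a robots.txt file could be grouped by User-agent and tested for a root-level Disallow. It is not the server's implementation: it ignores Allow rules, path patterns, and multiple groups for the same agent, and the function names are hypothetical. The crawler list is the one given above.

```javascript
const AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai", "ClaudeBot",
  "Bytespider", "PerplexityBot", "ChatGPT-User", "AmazonBot", "cohere-ai"];

// Split robots.txt into groups: consecutive User-agent lines share the rules that follow.
function parseGroups(robotsTxt) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && current) current.disallow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

// A crawler counts as blocked when its own group (or, failing that, the "*" group)
// contains "Disallow: /".
function crawlerAccess(robotsTxt) {
  const groups = parseGroups(robotsTxt);
  const result = {};
  for (const crawler of AI_CRAWLERS) {
    const name = crawler.toLowerCase();
    const group = groups.find((g) => g.agents.includes(name))
      ?? groups.find((g) => g.agents.includes("*"));
    result[crawler] = group && group.disallow.includes("/") ? "blocked" : "allowed";
  }
  return result;
}

console.log(crawlerAccess("User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /private\n"));
```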
check_llms_txt
Check if a domain has an llms.txt file.
Description: Checks for the emerging standard file that provides context to LLMs about a site's purpose and content.
Important: llms.txt is an emerging proposal, but it is not currently supported or consumed by major AI models, crawlers, or MCP clients. No mainstream LLM or AI agent reads llms.txt to inform its behaviour. Having an llms.txt file should not be seen as a relevant optimization for your GEO readiness — it will not meaningfully improve how AI systems discover or understand your site today. That said, it cannot hurt to have one: the file is lightweight, easy to create, and if the standard gains adoption in the future you will already be prepared.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to check, e.g. "example.com". Do not include https:// prefix. |
Example:
Does example.com have an llms.txt file?
Returns:
- llms.txt existence
- Full file contents if present
- Link to specification at https://llmstxt.org
get_geo_summary
Get a concise GEO readiness summary for quick assessment.
Description: Returns overall score, grade, top 3 strengths, and top 3 issues to fix. Use this for a quick overview; use analyze_domain for full details.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | Yes | The domain to check, e.g. "example.com". Do not include https:// prefix. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). See Chrome Rendering Fallback. |
Example:
Give me a quick GEO summary of example.com
Returns:
- Overall score and grade
- Page type detected
- Top 3 strongest categories
- Top 3 weakest categories with top issue
- Quick facts (robots.txt, llms.txt, sitemap, blocked crawlers)
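The strongest/weakest selection can be pictured as a simple sort over per-category scores. The sketch below assumes a plain object mapping category names to 0-100 scores; that shape and the summarize name are illustrative and may not match the server's JSON output.

```javascript
// Picks the three highest- and three lowest-scoring categories.
// Input shape (name -> score) is an assumption made for this example.
function summarize(categoryScores) {
  const sorted = Object.entries(categoryScores).sort((a, b) => b[1] - a[1]);
  return {
    strengths: sorted.slice(0, 3).map(([name]) => name),
    // Last three entries are the lowest scores; reverse so the worst comes first.
    weakest: sorted.slice(-3).reverse().map(([name]) => name),
  };
}

console.log(summarize({
  "Semantic HTML": 92, "Meta & Discoverability": 85, "Internal Linking": 78,
  "Content Freshness": 55, "Factual Verifiability": 40, "Agent Interactivity": 10,
}));
```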
compare_domains
Analyse multiple domains in parallel and compare scores.
Description: Returns a comparison table with overall scores, per-category breakdowns, and a ranked summary. Useful for competitive analysis or auditing a portfolio of sites. Use output_format="json" to get raw results that can be passed to export_bulk_report.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domains | array[string] | Yes | List of 2-50 domains to compare, e.g. ["example.com", "competitor.com"]. Do not include https:// prefix. For more than 50 domains, split across multiple runs and merge the results. |
| max_pages | integer | No | Maximum pages to crawl per domain (1-10). Default: 10. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). See Chrome Rendering Fallback. |
| output_format | enum | No | "text" (default) for comparison table, "json" for raw results to pass to export_bulk_report. |
Example:
Compare GEO scores of example.com, competitor1.com, and competitor2.com
Returns:
- Ranked list of domains by score
- Category comparison table (all 16 categories)
- Quick facts comparison (robots.txt, llms.txt, sitemap, blocked crawlers)
- Error details for any failed analyses
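For portfolios above the 50-domain limit, the splitting and merging mentioned in the parameter table could look roughly like this. Both helpers are hypothetical, and the { domain, score } result shape is an assumption made for the example. Because compare_domains needs at least two domains per call, the sketch moves one domain into a trailing batch that would otherwise contain only one.

```javascript
// Split a long domain list into batches that respect the 2-50 limit.
function chunkDomains(domains, size = 50) {
  const batches = [];
  for (let i = 0; i < domains.length; i += size) {
    batches.push(domains.slice(i, i + size));
  }
  // Avoid a final batch with a single domain by borrowing one from the previous batch.
  if (batches.length > 1 && batches[batches.length - 1].length < 2) {
    batches[batches.length - 1].unshift(batches[batches.length - 2].pop());
  }
  return batches;
}

// Merge per-batch results (assumed shape: { domain, score }) into one ranking.
function mergeRanked(resultBatches) {
  return resultBatches.flat().sort((a, b) => b.score - a.score);
}

const domains = Array.from({ length: 101 }, (_, i) => `site${i}.example`);
console.log(chunkDomains(domains).map((batch) => batch.length));
```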
analyze_sitemap
Fetch a sitemap and analyse all discovered pages.
Description: Fetches a sitemap XML (or sitemap index), extracts page URLs, and runs GEO analysis on each page. Returns per-page scores, category averages, and identifies weakest pages. Use output_format="json" to get raw results that can be passed to export_bulk_report.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| sitemap_url | string | Yes | Full URL to sitemap, e.g. "https://example.com/sitemap.xml" |
| max_urls | integer | No | Maximum URLs to analyse (1-50,000). Default: all URLs found. |
| rate_limit | number | No | Max requests/second per domain (0.1-100). Default: 5. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). Applied per URL. See Chrome Rendering Fallback. |
| output_format | enum | No | "text" (default) for report, "json" for raw results to pass to export_bulk_report. |
Example:
Analyse all pages in https://example.com/sitemap.xml
Returns:
- Total URLs found vs analysed
- Per-page results table (URL, score, grade, page type)
- Category averages across all pages
- Weakest pages with their problem categories
Supports:
- Regular sitemaps (<urlset>)
- Sitemap index files (<sitemapindex>), fetching up to 3 sub-sitemaps
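The difference between the two supported sitemap shapes can be sketched with a small regex-based extractor. This is an illustration only: the function names are hypothetical, a production parser would use a real XML library, and the sketch does not handle CDATA sections or decode XML entities.

```javascript
// Collect the text of every <loc> element.
function extractLocs(xml) {
  const locs = [];
  const re = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match;
  while ((match = re.exec(xml)) !== null) locs.push(match[1]);
  return locs;
}

// A <sitemapindex> lists further sitemaps; a <urlset> lists pages directly.
function classifySitemap(xml, maxSubSitemaps = 3) {
  const locs = extractLocs(xml);
  if (/<sitemapindex[\s>]/i.test(xml)) {
    // Keep only the first few sub-sitemaps, mirroring the documented limit of 3.
    return { type: "index", subSitemaps: locs.slice(0, maxSubSitemaps) };
  }
  return { type: "urlset", urls: locs };
}

console.log(classifySitemap(
  "<urlset><url><loc>https://example.com/</loc></url><url><loc>https://example.com/about</loc></url></urlset>"
));
```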
analyze_urls
Run GEO analysis on a list of specific URLs.
Description: Fetches each page, scores it across 10 categories, and returns per-page results with aggregated averages. URLs can span multiple domains. Use output_format="json" to get raw results that can be passed to export_bulk_report.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| urls | array[string] | Yes | List of 1-50,000 full URLs, e.g. ["https://example.com/about", "https://example.com/pricing"]. Include https:// prefix. |
| rate_limit | number | No | Max requests/second per domain (0.1-100). Default: 5. |
| render_mode | enum | No | "static" (default), "auto" (static with Chrome fallback on bot-block), or "chrome" (always Chrome). Applied per URL. See Chrome Rendering Fallback. |
| output_format | enum | No | "text" (default) for report, "json" for raw results to pass to export_bulk_report. |
Example:
Analyse these specific pages: https://example.com/about, https://example.com/pricing, https://example.com/contact
Returns:
- Per-page results table (URL, score, grade, page type)
- Category averages across all pages
- Weakest pages with their problem categories
export_report
Generate a styled, shareable report file.
Description: Runs GEO analysis and returns results as a self-contained report in Markdown or HTML format — matching the Glippy browser extension's export output. You can optionally pass pre-computed analysis results to avoid re-crawling.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| domain | string | No* | The domain to analyse, e.g. "example.com". Do not include https:// prefix. |
| format | enum | Yes | Report format: "markdown" (recommendations only), "markdown_full" (all categories and checks), or "html" (standalone styled page). |
| max_pages | integer | No | Maximum pages to crawl (1-10). Default: 10. Ignored if analysis_result is provided. |
| render_mode | enum | No | "static" (default), "auto" (Chrome fallback on bot-block), or "chrome" (always Chrome). Ignored if analysis_result is provided. See Chrome Rendering Fallback. |
| analysis_result | object | No* | Pre-computed analysis result from analyze_domain (with output_format="json"). Skips re-crawling. |
*Either domain or analysis_result must be provided.
Example:
Generate an HTML report for example.com
Example (using pre-computed results):
# First, analyze with JSON output:
analyze_domain domain="example.com" max_pages=5 output_format="json"
# Then export without re-crawling:
export_report format="html" analysis_result=<result from above>
Returns:
- Complete report content ready to save
- For HTML: Standalone page with dark/light theme toggle, score ring, category accordion, recommendations table
- For Markdown: Structured document with priority-sorted recommendations
export_bulk_report
Generate a styled report for bulk analysis.
Description: Creates a comprehensive report for comparing multiple domains, analysing a list of URLs, or crawling a sitemap. Returns a self-contained report with rankings, category breakdowns, and per-domain/page recommendations. You can pass pre-computed results to avoid re-crawling.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| format | enum | Yes | Report format: "markdown" or "html" |
| domains | array[string] | No* | Compare 2-50 domains. Do not include https://. For more than 50, run multiple times. |
| urls | array[string] | No* | Analyse 1-50,000 specific URLs. Include https://. |
| sitemap_url | string | No* | Crawl a sitemap URL. |
| analysis_results | object | No* | Pre-computed results from compare_domains, analyze_urls, or analyze_sitemap (with output_format="json"). |
| max_pages | integer | No | For domain mode: pages per domain (1-10). Default: 10. Ignored if analysis_results provided. |
| max_urls | integer | No | For sitemap mode: max URLs to analyse. Default: all. Ignored if analysis_results provided. |
| rate_limit | number | No | Max requests/second per domain. Default: 5. Ignored if analysis_results provided. |
| render_mode | enum | No | "static" (default), "auto" (Chrome fallback on bot-block), or "chrome" (always Chrome). Ignored if analysis_results provided. See Chrome Rendering Fallback. |
*Provide exactly one of: domains, urls, sitemap_url, or analysis_results.
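A caller can enforce the exactly-one-source rule before sending a request. The sketch below is a hypothetical client-side check; the server's own validation and error wording may differ.

```javascript
const SOURCES = ["domains", "urls", "sitemap_url", "analysis_results"];

// Returns the single input source present in args, or throws if zero or several are given.
function pickSource(args) {
  const provided = SOURCES.filter((key) => args[key] !== undefined && args[key] !== null);
  if (provided.length !== 1) {
    throw new Error(`Provide exactly one of: ${SOURCES.join(", ")} (got ${provided.length})`);
  }
  return provided[0];
}

console.log(pickSource({ format: "html", domains: ["example.com", "competitor.com"] }));
```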
Example:
Generate an HTML comparison report for example.com and competitor.com
Example (using pre-computed results):
# First, compare with JSON output:
compare_domains domains=["example.com", "competitor.com"] output_format="json"
# Then export without re-crawling:
export_bulk_report format="html" analysis_results=<result from above>
Returns:
- Domain comparison: Rankings, category comparison table, quick facts, per-domain recommendations
- URL/Sitemap analysis: Per-page results, category averages, common issues across pages, weakest/strongest pages
GEO Scoring Categories
The analysis evaluates 16 categories, each with a weight reflecting its importance for AI/LLM readiness:
| # | Category | Weight | What It Measures |
|---|----------|--------|------------------|
| 1 | Structured Data & Schema | 1.5x | JSON-LD presence, Schema.org types (FAQPage, Article, Product, etc.), Speakable markup, schema validation |
| 2 | Semantic HTML | 1.2x | Heading hierarchy (H1-H6), semantic elements (<article>, <nav>, <main>), content-to-markup ratio |
| 3 | Accessibility for Agents | 1.0x | Lang attribute, alt text on images, ARIA labels, descriptive link text |
| 4 | Internal Linking | 1.0x | Link density, navigation structure, breadcrumb markup |
| 5 | Meta & Discoverability | 1.0x | Title, meta description, canonical URL, Open Graph tags, hreflang |
| 6 | Machine Readability | 1.5x | SSR detection, bot blocking checks, robots.txt rules, llms.txt presence*, robots.txt Content-Signal directive, llms-full.txt, HTTP Link discovery headers, Markdown source endpoints, RSS/Atom/JSON feed discovery |
| 7 | Entity & Authority | 1.0x | Author info, publication dates, organization schema, E-E-A-T signals, credentials, editorial policy, contact completeness |
| 8 | Citability & Answer-Readiness | 1.3x | FAQ content, data tables, lists, lead paragraph quality |
| 9 | Performance & Crawlability | 0.3x | Image dimensions, lazy loading, resource hints |
| 10 | Agent Interactivity | 0.2x | WebMCP tools, form annotations, agent-callable actions, MCP server card (/.well-known/mcp/server-card.json), A2A agent card, Agent-Skills index, NLWeb endpoint, schemamap |
| 11 | Content Positioning | 1.2x | Brand differentiation, proof points, social proof |
| 12 | Content Freshness | 0.8x | Date signals, content age, temporal language |
| 13 | Information Density | 1.0x | Substantive-to-filler ratio, section depth, claim-evidence pairing |
| 14 | Factual Verifiability | 0.8x | Citations, source attribution, methodology disclosure |
| 15 | Content Comprehensiveness | 0.8x | Word count, heading coverage, definitions, comparisons |
| 16 | Multimodal Content | 0.5x | Image alt text, figures, video/audio, SVG, multimedia schema |
*llms.txt is checked for presence but is not currently supported or consumed by any major AI model or crawler. It has minimal practical impact on GEO readiness today — see the check_llms_txt section for details.
Scoring
- Each category produces a score from 0-100
- The overall score is a weighted average using the weights above
- Scores map to letter grades: A+ (90+), A (80+), B (70+), C (60+), D (40+), F (<40)
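The weighted average and grade thresholds can be worked through in a few lines. The weights below are copied from the category table; rounding to the nearest integer and skipping categories without a score are assumptions of this sketch, and the function names are illustrative.

```javascript
// Category weights as listed in the table above.
const WEIGHTS = {
  "Structured Data & Schema": 1.5, "Semantic HTML": 1.2, "Accessibility for Agents": 1.0,
  "Internal Linking": 1.0, "Meta & Discoverability": 1.0, "Machine Readability": 1.5,
  "Entity & Authority": 1.0, "Citability & Answer-Readiness": 1.3,
  "Performance & Crawlability": 0.3, "Agent Interactivity": 0.2, "Content Positioning": 1.2,
  "Content Freshness": 0.8, "Information Density": 1.0, "Factual Verifiability": 0.8,
  "Content Comprehensiveness": 0.8, "Multimodal Content": 0.5,
};

// Weighted average of the per-category scores (0-100), rounded to an integer.
function overallScore(scores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    if (typeof scores[name] !== "number") continue; // assumption: skip missing categories
    weighted += scores[name] * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}

// Letter grade thresholds from the Scoring list.
function grade(score) {
  if (score >= 90) return "A+";
  if (score >= 80) return "A";
  if (score >= 70) return "B";
  if (score >= 60) return "C";
  if (score >= 40) return "D";
  return "F";
}

const demo = overallScore({ "Structured Data & Schema": 100, "Agent Interactivity": 0 });
console.log(demo, grade(demo));
```

With only those two categories scored, the heavily weighted Structured Data result dominates: (100 × 1.5 + 0 × 0.2) / 1.7 is roughly 88, which maps to an A.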
Agent-Readiness Discovery
Alongside the established checks, the server probes a set of emerging agent-readiness standards. These surfaces let agents discover and consume a site without scraping HTML.
These checks are bonus-scored: a site gets credit when a surface is present, but absence is reported as informational guidance rather than a penalty. This keeps the long tail of sites that have not adopted these new standards from being unfairly marked down, while still rewarding early adopters.
| Surface | Where it's checked | What it signals |
|---------|--------------------|-----------------|
| Content-Signal | robots.txt directive | Machine-readable AI usage preferences (search / ai-input / ai-train). Only ai-input=no affects AI answer visibility; ai-train=no is treated as a training-only preference with no citation impact. |
| llms-full.txt | /llms-full.txt | Concatenated Markdown corpus of the pages listed in llms.txt, for full-context ingestion. Very large files (>5 MB) are flagged. |
| HTTP Link discovery | response Link header | Resource discovery via headers (rel="describedby", api-catalog, sitemap, mcp, service-desc, nlweb) without parsing HTML. |
| Markdown source endpoint | <link rel="alternate" type="text/markdown"> or content negotiation | A clean .md version of each page for agent ingestion. |
| Feed discovery | <link rel="alternate"> | RSS / Atom / JSON feeds as a machine-readable content stream. |
| MCP server card | /.well-known/mcp/server-card.json | Discoverable MCP server (name, version, transport, endpoint, tools). |
| A2A agent card | /.well-known/agent-card.json | Agent-to-agent discovery with declared skills. |
| Agent-Skills index | /.well-known/agent-skills/index.json | Reusable agent skills exposed with digests. |
| NLWeb endpoint | <link rel="nlweb"> or Link header | Natural-language query endpoint (conventionally /ask). |
| Schemamap | /schemamap.xml or <link rel="schemamap"> | Per-resource JSON-LD (.jsonld) endpoints for agent-friendly structured data. |
Content-Signal, HTTP Link discovery, Markdown source endpoints, llms-full.txt, and feed discovery feed into the Machine Readability category; the MCP/A2A/Agent-Skills cards, NLWeb, and schemamap feed into Agent Interactivity. The raw findings are also returned under an agentReadiness object in output_format="json" results.
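As an illustration of the Content-Signal row, the sketch below extracts the preferences from a robots.txt file and applies the rule that only ai-input=no affects answer visibility. It assumes the directive is written as comma-separated key=value pairs (for example "Content-Signal: search=yes, ai-input=no"); that syntax and the helper names are assumptions of this example, not a description of the server's parser.

```javascript
// Collect Content-Signal preferences (yes/no values only) from robots.txt text.
function parseContentSignal(robotsTxt) {
  const signals = {};
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, ""); // drop comments
    const match = /^\s*content-signal\s*:\s*(.+)$/i.exec(line);
    if (!match) continue;
    for (const pair of match[1].split(",")) {
      const [key, value] = pair.split("=").map((part) => part.trim().toLowerCase());
      if (key && (value === "yes" || value === "no")) signals[key] = value;
    }
  }
  return signals;
}

// Per the table: ai-input=no limits AI answer visibility; ai-train=no does not.
function affectsAnswerVisibility(signals) {
  return signals["ai-input"] === "no";
}

const signals = parseContentSignal("User-agent: *\nContent-Signal: search=yes, ai-input=no, ai-train=no\nAllow: /");
console.log(signals, affectsAnswerVisibility(signals));
```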
Rate Limiting
To prevent overwhelming target servers during batch operations, the MCP server enforces per-domain rate limiting:
Configuration
- Environment variable: Set GLIPPY_RATE_LIMIT=3 for a default of 3 requests/second
- Per-call parameter: Pass rate_limit to analyze_sitemap, analyze_urls, or export_bulk_report
Recommended Values
| Scenario | Rate Limit | Description |
|----------|------------|-------------|
| Polite crawling | 0.5 - 1 | 1 request every 1-2 seconds |
| Default | 5 | 5 requests/second (balanced) |
| Your own server | 10 - 50 | Faster crawling when you control the target |
| Aggressive | 100 | Maximum speed (use with caution) |
How It Works
- Requests to different domains run in parallel
- Requests to the same domain are serialized with the configured delay
- Global concurrency is capped at 10 simultaneous requests
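The per-domain behaviour can be pictured as a schedule in which each hostname has its own queue of start times. The sketch below computes the earliest start offset for each URL under that model; planSchedule is a hypothetical helper, and it intentionally does not model the global cap of 10 simultaneous requests.

```javascript
// For each URL, compute the earliest start offset (ms) so that requests to the
// same hostname are spaced 1000 / requestsPerSecond apart. Different hostnames
// do not wait for each other.
function planSchedule(urls, requestsPerSecond = 5) {
  const intervalMs = 1000 / requestsPerSecond;
  const nextSlot = new Map(); // hostname -> next allowed start offset
  return urls.map((url) => {
    const host = new URL(url).hostname;
    const startMs = nextSlot.get(host) ?? 0;
    nextSlot.set(host, startMs + intervalMs);
    return { url, host, startMs };
  });
}

console.log(planSchedule([
  "https://a.example/1", "https://a.example/2", "https://b.example/1", "https://a.example/3",
], 2));
```

At 2 requests/second the three a.example URLs are spaced 500 ms apart, while the b.example URL can start immediately.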
Output Formats
Text (Default)
All tools return structured text output by default, suitable for:
- Inline display in chat
- Quick analysis and follow-up questions
- Programmatic parsing
Markdown Reports
Generated by export_report and export_bulk_report:
- Clean, readable structure
- Priority-sorted recommendations (High → Medium → Low)
- Tables for easy comparison
- Save as a .md file
HTML Reports
Generated by export_report and export_bulk_report:
- Standalone, self-contained page (no external dependencies)
- Dark/light theme toggle with system preference detection
- Interactive category accordion
- Score ring visualization
- Copy recommendations button
- Print-friendly styling
- Save as a .html file
Caching & Efficient Workflows
The MCP server includes smart caching and result-passing features to avoid redundant crawling.
Automatic Caching
Analysis results are cached in-memory for 5 minutes with the following behavior:
- Key: domain + maxPages. Cached results are reused when the same domain is analyzed again
- Smart coverage: If you request max_pages=3 and there's a cached result with max_pages=5, the cache is used
- Automatic: No configuration needed; just call tools normally and caching happens automatically
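The 5-minute expiry and the coverage rule described above can be sketched as a small in-memory cache. This is a simplified illustration, not the server's code: the AnalysisCache class is hypothetical, it keeps one entry per domain rather than one per domain + maxPages key, and the clock is injectable so the expiry can be demonstrated without waiting.

```javascript
class AnalysisCache {
  constructor(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // domain -> { maxPages, result, storedAt }
  }

  set(domain, maxPages, result) {
    this.entries.set(domain, { maxPages, result, storedAt: this.now() });
  }

  // A cached crawl is reusable if it is still fresh and covered at least as many
  // pages as the new request asks for.
  get(domain, maxPages) {
    const entry = this.entries.get(domain);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(domain);
      return undefined;
    }
    return entry.maxPages >= maxPages ? entry.result : undefined;
  }
}

let fakeTime = 0;
const cache = new AnalysisCache(5 * 60 * 1000, () => fakeTime);
cache.set("example.com", 5, { score: 72 });
console.log(cache.get("example.com", 3));  // covered by the 5-page crawl
console.log(cache.get("example.com", 10)); // needs a deeper crawl than was cached
```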
Example workflow (automatic):
# First call — crawls the site
analyze_domain domain="example.com" max_pages=5
# Second call within 5 minutes — uses cached result
export_report domain="example.com" format="html"
JSON Output Mode
For explicit control, use output_format="json" to get raw analysis results that can be passed to export tools.
Single domain workflow:
# Step 1: Analyze with JSON output
analyze_domain domain="example.com" max_pages=5 output_format="json"
# Returns full analysis object as JSON
# Step 2: Export multiple formats without re-crawling
export_report format="html" analysis_result=<JSON from step 1>
export_report format="markdown_full" analysis_result=<JSON from step 1>
Multi-domain workflow:
# Step 1: Compare with JSON output
compare_domains domains=["site1.com", "site2.com"] output_format="json"
# Returns array of analysis results
# Step 2: Generate report without re-crawling
export_bulk_report format="html" analysis_results=<JSON from step 1>
Sitemap/URL workflow:
# Step 1: Analyze sitemap with JSON output
analyze_sitemap sitemap_url="https://example.com/sitemap.xml" output_format="json"
# Returns { sitemap_url, pageResults, aggregated }
# Step 2: Generate report without re-crawling
export_bulk_report format="html" analysis_results=<JSON from step 1>
When to Use Each Approach
| Scenario | Recommended Approach |
|----------|---------------------|
| Quick analysis + single export | Automatic caching (just call both tools) |
| Generate multiple report formats | JSON output mode (analyze once, export many) |
| Time-sensitive workflow | JSON output mode (guaranteed no re-crawling) |
| Interactive exploration | Automatic caching (ask questions, then export) |
Chrome Rendering Fallback
Some sites (Cloudflare, Akamai, PerimeterX, DataDome, Incapsula) refuse static Node fetches with 401/403/429/503 responses. The server can drive a real Chrome instance to fetch those pages instead, so they still get scored.
Choosing a render mode
Every analysis tool (analyze_domain, get_geo_summary, compare_domains, analyze_urls, analyze_sitemap, export_report, export_bulk_report) accepts a render_mode parameter:
| Mode | Behavior | Use when |
|------|----------|----------|
| static (default) | Plain Node fetch. Fast. No Chrome required. | You're scoring sites that don't block bots, or you explicitly want to see how a static crawler experiences the page. |
| auto | Static fetch first. If it looks bot-blocked (status 401/403/407/429/503, or 2xx with an empty body), retry that URL via Chrome. | Mixed workloads - most sites fast-path through static; only blocked ones pay the Chrome cost. Recommended for competitive audits across a list of domains. |
| chrome | Every URL fetched via Chrome. Slowest, most resilient. | You know the targets aggressively detect headless and want to front-load the Chrome cost, or you're debugging rendering differences. |
The result object includes a renderMode field so you can tell which path ran: static, chrome, chrome-fallback, chrome-blocked-<code> (Chrome tried but also got blocked), or static-blocked (both paths failed).
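The bot-block heuristic used by the auto mode can be written down directly from the table: one of the listed status codes, or a 2xx response with an empty body. The function names below are illustrative, and treating a whitespace-only body as empty is an assumption of this sketch.

```javascript
const BLOCK_STATUSES = new Set([401, 403, 407, 429, 503]);

// True when a static fetch result looks like a bot block.
function looksBotBlocked(status, body) {
  if (BLOCK_STATUSES.has(status)) return true;
  const is2xx = status >= 200 && status < 300;
  return is2xx && (body ?? "").trim().length === 0;
}

// Only "auto" retries through Chrome after a static fetch; "static" never does,
// and "chrome" does not perform a static fetch in the first place.
function shouldRetryWithChrome(renderMode, status, body) {
  return renderMode === "auto" && looksBotBlocked(status, body);
}

console.log(shouldRetryWithChrome("auto", 403, "Access denied"));
console.log(shouldRetryWithChrome("auto", 200, "<html>ok</html>"));
```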
Setup
Chrome modes need a Chrome or Chromium binary. The server looks in these locations, in order:
1. CHROME_PATH env var
2. PUPPETEER_EXECUTABLE_PATH env var
3. C:/Program Files/Google/Chrome/Application/chrome.exe
4. C:/Program Files (x86)/Google/Chrome/Application/chrome.exe
5. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
6. /usr/bin/google-chrome, /usr/bin/chromium, /usr/bin/chromium-browser
If none exist, render_mode: "static" still works; only the Chrome-backed modes become unavailable.
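The lookup order can be sketched as a first-match search over a candidate list. The resolveChromePath helper is hypothetical; the existence check is passed in as a function (in real use this would typically be fs.existsSync), and whether the server also verifies that the env-var paths exist is an assumption of this example.

```javascript
const DEFAULT_CHROME_PATHS = [
  "C:/Program Files/Google/Chrome/Application/chrome.exe",
  "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  "/usr/bin/google-chrome",
  "/usr/bin/chromium",
  "/usr/bin/chromium-browser",
];

// Returns the first candidate that exists, or null when Chrome modes are unavailable.
function resolveChromePath(env, exists) {
  const candidates = [env.CHROME_PATH, env.PUPPETEER_EXECUTABLE_PATH, ...DEFAULT_CHROME_PATHS];
  return candidates.find((path) => path && exists(path)) ?? null;
}

// Example with a fake filesystem where only Chromium is installed.
console.log(resolveChromePath({}, (path) => path === "/usr/bin/chromium"));
```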
Attaching to your own Chrome
For sites that fingerprint headless Chrome, start a Chrome instance with remote debugging and point the server at it. The server will attach to that instance instead of launching its own:
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 --user-data-dir=/tmp/glippy-chrome
# Windows (PowerShell)
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
--remote-debugging-port=9222 --user-data-dir=C:\Temp\glippy-chrome
# Then in your MCP config env:
# CHROME_REMOTE_URL=http://127.0.0.1:9222
Using a dedicated --user-data-dir keeps this session isolated from your normal browsing. When attached, the fetcher leaves UA/headers/stealth untouched so requests look identical to a human using that browser.
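CHROME_REMOTE_URL accepts either an http:// address or a ws:// endpoint, which correspond to the browserURL and browserWSEndpoint options of puppeteer's connect call. A minimal sketch of that mapping follows; remoteConnectOptions is a hypothetical helper, and the server's own handling may differ.

```javascript
// Map a CHROME_REMOTE_URL value onto the option puppeteer.connect() expects.
function remoteConnectOptions(remoteUrl) {
  const { protocol } = new URL(remoteUrl);
  if (protocol === "ws:" || protocol === "wss:") {
    return { browserWSEndpoint: remoteUrl };
  }
  if (protocol === "http:" || protocol === "https:") {
    return { browserURL: remoteUrl };
  }
  throw new Error(`Unsupported CHROME_REMOTE_URL scheme: ${protocol}`);
}

console.log(remoteConnectOptions("http://127.0.0.1:9222"));
```

The returned object could then be passed to puppeteer-core, for example puppeteer.connect(remoteConnectOptions(process.env.CHROME_REMOTE_URL)).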
Visible mode
For debugging, set CHROME_HEADLESS=0 to watch Chrome drive itself. Purely for development - leave it off in production.
Architecture
research-mcp/
├── src/
│ ├── index.js # MCP server - tool registration, JSON-RPC handling, license validation
│ ├── geo-checker.js # GEO analysis engine - fetches & scores domains
│ └── chrome-fetcher.js # Headless Chrome adapter (puppeteer-core) for WAF-blocked sites
├── package.json
└── README.md
Analysis Flow
1. Fetch resources in parallel:
   - robots.txt
   - llms.txt
   - Homepage HTML (static fetch first, Chrome fallback if bot-blocked)
   - sitemap.xml
   - UCP profile (/.well-known/ucp)
   - Agent-readiness discovery surfaces: /llms-full.txt, /.well-known/mcp/server-card.json, /.well-known/agent-card.json, /.well-known/agent-skills/index.json, /schemamap.xml
2. Parse HTML with cheerio (server-side DOM)
3. Run 16 weighted scoring categories
4. Return comprehensive analysis with actionable recommendations
Protocol
- Transport: stdio (JSON-RPC 2.0 over stdin/stdout)
- SDK: @modelcontextprotocol/sdk (official TypeScript MCP SDK)
- Logging: All logs go to stderr (stdout reserved for MCP protocol)
Manual Testing
Test the MCP server directly via command line:
# Send MCP init + tool list request via stdin
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}
{"jsonrpc":"2.0","method":"notifications/initialized"}
{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | GLIPPY_LICENSE_KEY=your-key node src/index.js 2>/dev/null
Troubleshooting
"License error: No license key configured"
Cause: The GLIPPY_LICENSE_KEY environment variable is not set.
Fix: Add the key to your MCP configuration:
"env": {
"GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
}
"License validation failed"
Cause: Invalid or expired license key.
Fix: Get a valid key at glippy.dev.
"Could not reach license server"
Cause: Network connectivity issue or firewall blocking.
Fix:
- Check your internet connection
- Ensure glippy-mcp-api.info-8cb.workers.dev is accessible
- If you have a cached valid license, the server will continue working for 24 hours
"Error analysing domain: HTTP 403/404"
Cause: Target site is blocking requests or page doesn't exist.
Fix:
- Verify the domain is accessible in a browser
- Some sites block automated requests; retry with render_mode="auto" or render_mode="chrome" (see Chrome Rendering Fallback), or try a different domain
- Check if the site requires authentication
"No URLs found in sitemap"
Cause: The sitemap doesn't contain <loc> entries or uses an unexpected format.
Fix:
- Verify the sitemap URL returns valid XML
- Check that URLs in the sitemap match the expected domain
- For sitemap indexes, ensure sub-sitemaps are accessible
High memory usage during batch analysis
Cause: Analysing too many URLs at once.
Fix:
- Use the max_urls parameter to limit sitemap crawling
- Reduce max_pages for domain comparison
- Process URLs in smaller batches
AI Crawlers Detected
The server checks access rules for these AI crawlers in robots.txt:
| Crawler | Company | Purpose |
|---------|---------|---------|
| GPTBot | OpenAI | Training data for GPT models |
| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT |
| Google-Extended | Google | Training data for Bard/Gemini |
| ClaudeBot | Anthropic | Training data for Claude |
| anthropic-ai | Anthropic | Anthropic's general crawler |
| CCBot | Common Crawl | Open web corpus |
| PerplexityBot | Perplexity AI | Search and answer engine |
| Bytespider | ByteDance | TikTok/Douyin AI features |
| AmazonBot | Amazon | Alexa and shopping AI |
| cohere-ai | Cohere | Enterprise AI models |
License
See LICENSE file for licensing terms. Get your license key at glippy.dev.
Support
- Integration Guide: docs/INTEGRATIONS.md
- Online Documentation: glippy.dev/docs
- Issues: github.com/jbobbink/glippy/issues
- Homepage: glippy.dev
Generated by Glippy — GEO Agent-Readiness Checker
