docmunch

v0.3.3

Published

3 months ago

Convert online documentation into AI-ready Markdown context files

0High
0Medium
0Low

charlesgrangerthev

documentation markdown ai context cli llm vibe-coding cursor copilot claude docs scraper turndown readability

docmunch

Convert documentation URLs into clean, AI-ready Markdown files. Drop them into your project so AI coding assistants (Cursor, Claude Code, Copilot, etc.) have accurate, up-to-date context.

Install

# Run directly
npx docmunch <url>

# Or install globally
npm install -g docmunch

Usage

# Fetch a single page to stdout
docmunch https://docs.stripe.com/api/charges

# Write to a file
docmunch https://docs.stripe.com/api/charges -o .ai/stripe.md

# Crawl linked pages (directory output by default)
docmunch https://docs.stripe.com/api/charges --crawl --name stripe

# Crawl with single-file output
docmunch https://docs.stripe.com/api/charges --crawl -o .ai/stripe.md

# Force rewrite even if content unchanged
docmunch https://docs.stripe.com/api/charges --crawl --name stripe --force

# Manage sources in a config file
docmunch add https://docs.stripe.com/api/charges --name stripe --crawl
docmunch update            # refresh all sources
docmunch update --name stripe  # refresh one
docmunch list              # show configured sources

# Browse available docs on the registry
docmunch registry

# Download pre-crawled docs from registry
docmunch pull stripe

Features

Platform detection — auto-detects Mintlify, Docusaurus, GitBook, ReadMe, and falls back to Readability for generic sites
Code block preservation — language tags and indentation survive extraction perfectly
Crawl mode — follows sidebar/nav links with configurable depth, scoped to the documentation path
Smart fetching — static fetch by default, auto-retries with Playwright for blocked sites (403, Cloudflare). Playwright is auto-installed on first need
Token estimation — each page includes an estimated token count in manifests, with source-level totals
Content hashing — SHA-256 hash per page for smart refresh (only re-process changed pages)
Change detection — skips writing files whose content hasn't changed (ignoring timestamps)
Graceful interruption — press Ctrl+C during a crawl to stop and choose whether to save pages collected so far
YAML frontmatter — each output includes source URL, fetch date, platform, and title
Config file — manage multiple doc sources with .docmunch.yaml
MCP server — expose fetched docs to AI tools (Claude Code, Cursor) via Model Context Protocol
Registry pull — download pre-crawled documentation packages from the hosted registry

MCP Server

Once you've crawled documentation, docmunch serve starts an MCP server that lets AI coding tools query your docs directly.

Prerequisite: Install docmunch globally (npm install -g docmunch) or use npx to run it without installing. The setup examples below use npx, which downloads the package automatically if needed.

Quick start

# 1. Crawl some docs
npx docmunch https://docs.stripe.com/api/charges --crawl --name stripe

# 2. Start the MCP server
npx docmunch serve

Claude Code

claude mcp add --scope project docmunch -- npx docmunch serve -d .ai/docs/

That's it. Run /mcp inside Claude Code to verify the server is connected.

Use --scope user instead to make it available across all your projects.

Cursor

Open Cursor Settings (Cmd+, / Ctrl+,) → MCP → + Add new MCP server, then:

Name: docmunch
Type: command
Command: npx docmunch serve -d .ai/docs/

Alternatively, create a .cursor/mcp.json file at your project root:

{
  "mcpServers": {
    "docmunch": {
      "command": "npx",
      "args": ["docmunch", "serve", "-d", ".ai/docs/"]
    }
  }
}

Restart Cursor for the server to be picked up. A green dot next to the server name in Settings → MCP confirms it's running.

VS Code (GitHub Copilot)

Requires the GitHub Copilot extension. Create .vscode/mcp.json at the project root:

{
  "servers": {
    "docmunch": {
      "command": "npx",
      "args": ["docmunch", "serve", "-d", ".ai/docs/"]
    }
  }
}

Windsurf

Open Settings → MCP → Add Server, or create .windsurf/mcp.json:

{
  "mcpServers": {
    "docmunch": {
      "command": "npx",
      "args": ["docmunch", "serve", "-d", ".ai/docs/"]
    }
  }
}

Available tools

Once connected, your AI assistant has access to:

list_sources — see all available documentation sources with metadata
list_pages — list pages within a source
read_page — read the full markdown content of a page, with optional section filtering to save tokens
search_docs — full-text search across all docs with preview excerpts

The read_page tool supports an optional sections parameter — pass an array of heading names to retrieve only those sections instead of the full page. This reduces token usage when you only need specific parts of a doc page.

Options

docmunch serve              # serves .ai/docs/ (default)
docmunch serve -d ./docs/   # custom directory

Registry

Browse available sources

docmunch registry                          # list all sources
docmunch registry --json                   # raw JSON output
docmunch registry --registry-url <url>     # custom registry

Download a source

docmunch pull stripe                          # download to .ai/docs/stripe/
docmunch pull stripe --registry-url <url>     # custom registry
docmunch pull stripe --token <token>          # authenticated access
docmunch pull stripe --force                  # overwrite existing

Environment variables DOCMUNCH_REGISTRY_URL and DOCMUNCH_TOKEN are also supported.

Output Formats

Directory output (crawl mode default)

One .md file per crawled page, with JSON manifests:

.ai/docs/
├── manifest.json              ← root manifest (all sources)
└── stripe/
    ├── _index.json            ← source manifest (pages + metadata)
    ├── charges.md
    └── guides/
        └── authentication.md

Manifests include per-page token_count and content_hash, plus source-level total_tokens.

Single-file output

Used for non-crawl fetches or crawl with -o file.md:

---
source: https://docs.stripe.com/api/charges
fetched_at: 2025-02-08T14:30:00Z
platform: generic
title: Charges
docmunch_version: 0.2.0
---

# Charges

[clean extracted content here]

Config (.docmunch.yaml)

version: 1
output_dir: .ai/docs
sources:
  - name: stripe
    url: https://docs.stripe.com/api/charges
    crawl: true
    max_depth: 2
    output: stripe/
  - name: yousign
    url: https://developers.yousign.com/docs/set-up-your-account
    crawl: false
    output: yousign.md

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

docmunch

Install

Usage

Features

MCP Server

Quick start

Claude Code

Cursor

VS Code (GitHub Copilot)

Windsurf

Available tools

Options

Registry

Browse available sources

Download a source

Output Formats

Directory output (crawl mode default)

Single-file output

Config (.docmunch.yaml)

License