mdrip

v0.1.7

Published

10 days ago

Fetch markdown snapshots of web pages using Cloudflare Markdown for Agents


charlkruger

cli markdown cloudflare agents context

mdrip

Fetch clean markdown snapshots of any web page — optimized for AI agents, RAG pipelines, and context-aware workflows.

Reduces token overhead by ~90% compared to raw HTML while preserving the content structure LLMs need.

Why

AI agents and LLMs work better with markdown than HTML. Feeding raw HTML into a context window wastes tokens on tags, scripts, styles, and boilerplate. mdrip solves this by fetching any URL and returning clean, structured markdown.

~90% fewer tokens than raw HTML
Automatic HTML-to-markdown fallback when native markdown isn't available
Works everywhere — CLI, Node.js, Cloudflare Workers, or via remote MCP
Token-aware — reports estimated token counts so you can manage context budgets

Sites that support Cloudflare's Markdown for Agents return markdown natively at the edge. For all other sites, mdrip's built-in converter handles headings, links, lists, code blocks, tables, blockquotes, and more, while filtering hidden/non-visible content (including hidden attributes, aria-hidden, inline hidden styles, templates/forms, and HTML comments).

Installation

npm install -g mdrip

Or use directly with npx:

npx mdrip <url>

CLI Usage

Fetch pages

# Fetch one page
mdrip https://example.com/docs/getting-started

# Fetch multiple pages
mdrip https://example.com/docs https://example.com/api

# Custom timeout (ms)
mdrip https://example.com --timeout 45000

# Strict mode — only accept native markdown, no HTML fallback
mdrip https://example.com --no-html-fallback

# Raw mode — print markdown to stdout, no file writes
mdrip https://example.com --raw

List fetched pages

mdrip list
mdrip list --json

Remove pages

mdrip remove https://example.com/docs/getting-started

Clean snapshots

# Remove all
mdrip clean

# Remove only one domain
mdrip clean --domain example.com

Raw mode for agent runtimes

--raw prints markdown to stdout and skips all file writes and prompts. Useful for piping content directly into agent loops.

mdrip https://example.com --raw | your-agent-cli

Programmatic API

npm install mdrip

Method reference

| Import path | Method | Returns | Purpose |
|---|---|---|---|
| mdrip | fetchMarkdown(url, options?) | Promise<MarkdownResponse> | Fetch one URL to markdown with metadata |
| mdrip | fetchRawMarkdown(url, options?) | Promise<string> | Fetch one URL to markdown string only |
| mdrip/node | fetchMarkdown(url, options?) | Promise<MarkdownResponse> | Node entrypoint alias for in-memory fetch |
| mdrip/node | fetchRawMarkdown(url, options?) | Promise<string> | Node entrypoint alias for markdown-only fetch |
| mdrip/node | fetchToStore(url, options?) | Promise<FetchResult> | Fetch one URL and persist to mdrip/pages/... |
| mdrip/node | fetchManyToStore(urls, options?) | Promise<FetchResult[]> | Fetch many URLs and persist successful results |
| mdrip/node | listStoredPages(cwd?) | Promise<PageEntry[]> | List tracked snapshots from mdrip/sources.json |

FetchMarkdownOptions supports: timeoutMs, userAgent, htmlFallback, fetchImpl, tokenModel, tokenEncoding. Default token encoding is o200k_base (recommended for GPT-4o/4.1/5-family style counting). StoreFetchOptions extends that with cwd.

Workers / Edge / In-memory

import { fetchMarkdown } from "mdrip";

const page = await fetchMarkdown("https://example.com/docs");

console.log(page.markdown);       // clean markdown content
console.log(page.markdownTokens); // estimated token count
console.log(page.source);         // "cloudflare-markdown" or "html-fallback"
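
Because each result reports markdownTokens, a caller can decide which pages fit in a context window before sending anything to a model. The following is a minimal sketch of that idea, not part of mdrip: selectWithinBudget and PageLike are hypothetical names, and the sample token counts are made up for illustration.

```typescript
// Hypothetical shape: only the two fields this sketch needs.
// `markdown` and `markdownTokens` are the field names used in the example above.
interface PageLike {
  markdown: string;
  markdownTokens: number;
}

// Greedily keep pages, in the order given, while they still fit the budget.
function selectWithinBudget<T extends PageLike>(pages: T[], maxTokens: number): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const page of pages) {
    if (used + page.markdownTokens > maxTokens) continue; // would overflow, skip it
    kept.push(page);
    used += page.markdownTokens;
  }
  return kept;
}

// Illustrative values only; real counts come from fetchMarkdown results.
const sample: PageLike[] = [
  { markdown: "# A", markdownTokens: 1200 },
  { markdown: "# B", markdownTokens: 900 },
  { markdown: "# C", markdownTokens: 300 },
];

const chosen = selectWithinBudget(sample, 1600);
console.log(chosen.map((p) => p.markdown)); // A and C fit; B would exceed the budget
```

A greedy pass keeps the caller's ordering, which matters when pages are already ranked by relevance.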

Node.js (fetch and store to disk)

import { fetchToStore, listStoredPages } from "mdrip/node";

const result = await fetchToStore("https://example.com/docs", {
  cwd: process.cwd(),
});

if (result.success) {
  console.log(`Saved to ${result.path}`);
}

const pages = await listStoredPages(process.cwd());

Remote MCP + HTTP API

mdrip is available as a remote service at mdrip.createmcp.dev with MCP transports and a direct JSON API.

| Endpoint | Transport | Use case |
|---|---|---|
| /mcp | Streamable HTTP MCP | Recommended for MCP clients |
| /sse | SSE MCP | Legacy MCP client compatibility |
| /api | JSON over HTTP | Direct non-MCP integration |

MCP tools

fetch_markdown:

Inputs: url (required), timeout_ms (optional), html_fallback (optional)
Output: markdown + metadata (resolvedUrl, status, contentType, source, markdownTokens, contentSignal)

batch_fetch_markdown:

Inputs: urls (required array, 1-10), timeout_ms (optional), html_fallback (optional)
Output: one result per URL, with success/error details
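
Since batch_fetch_markdown accepts between 1 and 10 URLs per call, longer lists have to be split by the caller. A minimal sketch of that split follows; chunkUrls and MAX_BATCH_SIZE are hypothetical helper names, not part of mdrip.

```typescript
// Upper bound taken from the batch_fetch_markdown input description above.
const MAX_BATCH_SIZE = 10;

// Split a URL list into consecutive batches of at most `size` entries.
function chunkUrls(urls: string[], size: number = MAX_BATCH_SIZE): string[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}

// 23 placeholder URLs split into batches of 10, 10, and 3.
const manyUrls = Array.from({ length: 23 }, (_, i) => `https://example.com/page-${i + 1}`);
const batches = chunkUrls(manyUrls);
console.log(batches.map((b) => b.length));
```

Each batch can then be passed as the urls input of one tool call.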

HTTP API (`/api`)

GET /api expects query params:

url (required)
timeout (optional ms)
html_fallback (optional true/false)

curl "https://mdrip.createmcp.dev/api?url=https://example.com&timeout=30000&html_fallback=true"
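
When the target URL has its own query string, it needs to be percent-encoded so its parameters are not read as parameters of /api. The sketch below builds the request URL with the standard URLSearchParams class; buildApiUrl and ApiQuery are hypothetical names introduced for illustration, and only the three query parameters listed above are used.

```typescript
// Hypothetical input shape mirroring the documented query parameters.
interface ApiQuery {
  url: string;
  timeout?: number; // milliseconds, sent as `timeout`
  htmlFallback?: boolean; // sent as `html_fallback`
}

// Build a GET /api URL, percent-encoding the target URL and its own query string.
function buildApiUrl(query: ApiQuery, base = "https://mdrip.createmcp.dev/api"): string {
  const params = new URLSearchParams({ url: query.url });
  if (query.timeout !== undefined) params.set("timeout", String(query.timeout));
  if (query.htmlFallback !== undefined) params.set("html_fallback", String(query.htmlFallback));
  return `${base}?${params.toString()}`;
}

const requestUrl = buildApiUrl({
  url: "https://example.com/docs?page=2&lang=en",
  timeout: 30000,
  htmlFallback: true,
});
console.log(requestUrl);
```

The resulting string can be passed to curl or fetch as-is.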

POST /api supports both single and batch bodies:

{ "url": "https://example.com", "timeout_ms": 30000, "html_fallback": true }

{
  "urls": ["https://example.com", "https://example.com/docs"],
  "timeout_ms": 30000,
  "html_fallback": true
}

Single responses return one fetch result object. Batch responses return { "results": [...] } with success: true|false per URL.
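
A caller handling a batch response will usually want to separate successes from failures. The following sketch assumes only what is stated above: a results array whose items carry a boolean success field. The resolvedUrl and error fields in the sample data are illustrative assumptions about the item shape, and partitionResults is a hypothetical helper.

```typescript
// Only `success` is relied on; any other fields are passed through untouched.
interface BatchItem {
  success: boolean;
  [key: string]: unknown;
}

interface BatchResponse {
  results: BatchItem[];
}

// Split a batch response into succeeded and failed items.
function partitionResults(response: BatchResponse): { succeeded: BatchItem[]; failed: BatchItem[] } {
  const succeeded = response.results.filter((item) => item.success);
  const failed = response.results.filter((item) => !item.success);
  return { succeeded, failed };
}

// Made-up sample; field names other than `success` are assumptions.
const sampleResponse: BatchResponse = {
  results: [
    { success: true, resolvedUrl: "https://example.com", markdown: "# Example" },
    { success: false, error: "timeout" },
  ],
};

const parts = partitionResults(sampleResponse);
console.log(`${parts.succeeded.length} succeeded, ${parts.failed.length} failed`);
```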

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "mdrip": {
      "command": "npx",
      "args": ["mcp-remote", "https://mdrip.createmcp.dev/mcp"]
    }
  }
}

Claude Code

claude mcp add mdrip-remote --transport sse https://mdrip.createmcp.dev/sse

Cloudflare AI Playground

Enter mdrip.createmcp.dev/sse at playground.ai.cloudflare.com.

OpenClaw Integration

Option 1: Dedicated OpenClaw skill (recommended)

Install skills from this repo in your OpenClaw workspace:

cd ~/.openclaw/workspace
npx skills add charl-kruger/mdrip

Enable the OpenClaw-focused skill in ~/.openclaw/openclaw.json:

{
  "skills": {
    "entries": {
      "mdrip-openclaw": {
        "enabled": true
      }
    }
  }
}

If you want OpenClaw to load skills directly from this local repository:

{
  "skills": {
    "load": {
      "extraDirs": ["<absolute-path-to-fetchmd>/skills"]
    }
  }
}

Option 2: Direct CLI usage in OpenClaw workflows

# In-memory markdown only (best for tool pipelines)
mdrip https://example.com/docs --raw

# Persist reusable snapshots in workspace
mdrip https://example.com/docs https://example.com/api
mdrip list --json

File modifications

On first run, mdrip can optionally update:

.gitignore — adds mdrip/
tsconfig.json — excludes mdrip/
AGENTS.md — adds a section pointing agents to your snapshots

Your choice is stored in mdrip/settings.json. Pass --modify or --modify=false to set it up front and skip the prompt.

--raw mode bypasses this entirely.

Output structure

mdrip/
├── settings.json
├── sources.json
└── pages/
    └── example.com/
        └── docs/
            └── getting-started/
                └── index.md
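
The tree above suggests that a snapshot path mirrors the URL's hostname and path segments. The sketch below reproduces that mapping for the one example shown; it is an inference from the tree, not mdrip's actual implementation, and how mdrip treats query strings, trailing slashes, or the site root is not stated here. snapshotPath is a hypothetical name.

```typescript
// Map a page URL to the snapshot location implied by the tree above.
// Assumption: hostname, then each non-empty path segment, then index.md.
function snapshotPath(pageUrl: string, root = "mdrip"): string {
  const { hostname, pathname } = new URL(pageUrl);
  const segments = pathname.split("/").filter((segment) => segment.length > 0);
  return [root, "pages", hostname, ...segments, "index.md"].join("/");
}

console.log(snapshotPath("https://example.com/docs/getting-started"));
// mdrip/pages/example.com/docs/getting-started/index.md
```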

Benchmark

Measured on February 13, 2026 (values vary as pages change):

| Page | Mode | Chars saved | Tokens saved |
|------|------|------------:|-------------:|
| blog.cloudflare.com/markdown-for-agents | cloudflare-markdown | 94.3% | 96.2% |
| developers.cloudflare.com/.../markdown-for-agents | cloudflare-markdown | 95.5% | 97.3% |
| en.wikipedia.org/wiki/Markdown | html-fallback | 73.8% | 76.4% |
| github.com/cloudflare/skills | html-fallback | 96.7% | 98.0% |
| Average | | 90.1% | 92.0% |

pnpm build && pnpm benchmark

Token counts use mdrip's tokenizer-based estimator (default encoding: o200k_base).
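
The Average row in the benchmark table is the unweighted mean of the four page rows. As a quick check, the sketch below recomputes it from the published percentages; the variable and function names are illustrative only.

```typescript
// Percentages copied from the benchmark table.
const benchmarkRows = [
  { page: "blog.cloudflare.com/markdown-for-agents", charsSaved: 94.3, tokensSaved: 96.2 },
  { page: "developers.cloudflare.com/.../markdown-for-agents", charsSaved: 95.5, tokensSaved: 97.3 },
  { page: "en.wikipedia.org/wiki/Markdown", charsSaved: 73.8, tokensSaved: 76.4 },
  { page: "github.com/cloudflare/skills", charsSaved: 96.7, tokensSaved: 98.0 },
];

// Arithmetic mean of a list of numbers.
function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Round to one decimal place, matching the table's precision.
function roundToTenth(value: number): number {
  return Math.round(value * 10) / 10;
}

const avgChars = roundToTenth(mean(benchmarkRows.map((row) => row.charsSaved)));
const avgTokens = roundToTenth(mean(benchmarkRows.map((row) => row.tokensSaved)));
console.log(avgChars, avgTokens); // 90.1 92
```

Because the mean is unweighted, the Wikipedia row pulls the average down regardless of page size.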

AI Skills

This repo includes an AI-consumable skills catalog in skills/, following the agentskills format.

mdrip: general-purpose mdrip skill (CLI, package APIs, remote MCP/API)
mdrip-openclaw: OpenClaw-focused skill and config/workflow reference

npx skills add charl-kruger/mdrip

Requirements

Node.js 18+

Author

Charl Kruger

License

Apache-2.0
