@wave-rf/cloudflare-md-router

v0.2.1

Published

11 days ago

Cloudflare Worker that serves a `.md` twin of any static page when the request is from a known LLM crawler or explicitly asks for `text/markdown`. Falls back to the HTML response otherwise.

Downloads

102

0High
0Medium
0Low

ericandrechek

cloudflare cloudflare-workers llm llms-txt markdown content-negotiation static-site

cloudflare-md-router

A tiny Cloudflare Worker that serves the .md twin of any static page when the request is from a known LLM crawler or explicitly asks for text/markdown. Falls back to the HTML response when the .md twin doesn't exist.

If you're building a docs site that already emits a per-page raw-markdown twin (e.g. /foo/bar and /foo/bar.md), this lets every page do content negotiation transparently — Claude, ChatGPT, Perplexity, etc. fetch the model-friendly version automatically; humans keep getting the styled HTML page.

Behavior

| Request | Worker serves | | ---------------------------------------------- | ------------------- | | Anything with a file extension (.css, .png, .md, …) | Pass-through to ASSETS | | Non-GET | Pass-through to ASSETS | | Accept: text/markdown | <path>.md (HTML fallback on 404) | | User-Agent matches a known LLM bot | <path>.md (HTML fallback on 404) | | Everything else | HTML page — plus a Link header advertising its .md twin |

On the normal HTML page response, the worker also adds an RFC 8288 Link header so an agent can discover the markdown twin from a plain GET, without sniffing the User-Agent or guessing the right Accept:

Link: </foo/bar.md>; rel="alternate"; type="text/markdown"

This is on by default (only for a 200 text/html reply to an extension-less GET); disable it with advertiseTwin: false.

Because the worker negotiates on the Accept header, two clients can get different representations of the same URL. If you put a shared cache (a CDN, Cloudflare's own cache) in front of it, set vary: true to add a Vary: Accept header to the negotiated responses so the cache keys on Accept and doesn't serve the HTML page to a client that asked for markdown — or vice-versa. It's off by default (verbatim pass-through stays byte-for-byte; Vary: User-Agent is intentionally not added, as it would defeat shared caching).

The included bot list covers the common ones: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, CCBot, Applebot-Extended, Google-Extended, cohere-ai, Bytespider, Diffbot. See src/bots.ts.

Install

pnpm add @wave-rf/cloudflare-md-router

This package ships raw TypeScript with no build step — bundle it with Wrangler/esbuild (the Cloudflare Workers default), which resolve the .ts entry points directly.

Use

The simplest setup — re-export the default handler from your worker entrypoint:

// worker/index.ts
export { default } from "@wave-rf/cloudflare-md-router/worker";

Configure your wrangler.jsonc with an ASSETS binding pointing at your built static site:

{
  "name": "my-docs",
  "main": "worker/index.ts",
  "compatibility_date": "2025-01-01",
  "assets": {
    "directory": "./dist",
    "binding": "ASSETS",
    "not_found_handling": "404-page",
    "html_handling": "drop-trailing-slash",
    "run_worker_first": true
  }
}

run_worker_first is required so the worker sees the request before Cloudflare's static-asset matcher does — otherwise the worker only ever runs on 404s.

Customizing

Use createMdRouter() if you need to extend the bot list, change the .md path mapping, or add other Accept tokens:

// worker/index.ts
import { createMdRouter, LLM_BOT_UA } from "@wave-rf/cloudflare-md-router";

export default createMdRouter({
  // Add your own bots:
  botUserAgents: new RegExp(LLM_BOT_UA.source + "|mybot", "i"),

  // Treat `Accept: text/x-markdown` as markdown too:
  acceptMarkdown: ["text/x-markdown"],

  // Custom .md path strategy. Default: `/foo/` → `/foo.md`, `/` → `/index.md`.
  mdPathFor: (pathname) => `/markdown${pathname.replace(/\/$/, "")}.md`,

  // Don't advertise the `.md` twin via a `Link` header (default: true).
  advertiseTwin: false,

  // Add `Vary: Accept` to negotiated responses so a shared cache doesn't
  // cross-serve the HTML and markdown representations (default: false).
  vary: true,
});

Why content-negotiate?

Most LLMs do better with raw markdown than with rendered HTML — less DOM noise, no Starlight nav chrome, no script tags. Serving the same content at one URL with two representations means:

One canonical URL per page (good for citations and link-sharing).
Crawlers and human readers stay aligned automatically.
Your llms.txt can advertise <page>.md for explicit fetches; the worker covers the case where the LLM hits the HTML URL anyway.

License

MIT — see LICENSE.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme