@wave-rf/cloudflare-md-router
v0.2.1
Cloudflare Worker that serves a `.md` twin of any static page when the request is from a known LLM crawler or explicitly asks for `text/markdown`. Falls back to the HTML response otherwise.
cloudflare-md-router
A tiny Cloudflare Worker that serves the .md twin of any static page when the request is from a known LLM crawler or explicitly asks for text/markdown. Falls back to the HTML response when the .md twin doesn't exist.
If you're building a docs site that already emits a per-page raw-markdown twin (e.g. /foo/bar and /foo/bar.md), this lets every page do content negotiation transparently — Claude, ChatGPT, Perplexity, etc. fetch the model-friendly version automatically; humans keep getting the styled HTML page.
Behavior
| Request | Worker serves |
| ---------------------------------------------- | ------------------- |
| Anything with a file extension (.css, .png, .md, …) | Pass-through to ASSETS |
| Non-GET | Pass-through to ASSETS |
| Accept: text/markdown | <path>.md (HTML fallback on 404) |
| User-Agent matches a known LLM bot | <path>.md (HTML fallback on 404) |
| Everything else | HTML page — plus a Link header advertising its .md twin |
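The table above can be read as a small decision function. The sketch below is one way to express it and is not the package's actual source: the names `route`, `IncomingInfo`, `hasExtension`, and `acceptsMarkdown` are hypothetical, the extension check is a guess at what "has a file extension" means, and the order of the two pass-through checks is an assumption.

```typescript
// Hypothetical sketch of the routing table; not the package's real implementation.
type Route = "passthrough" | "markdown-then-html" | "html-with-link";

interface IncomingInfo {
  method: string;
  pathname: string;
  accept: string | null;
  isKnownBot: boolean; // result of the User-Agent check, computed elsewhere
}

// True when the last path segment ends in something like ".css" or ".md".
function hasExtension(pathname: string): boolean {
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  return /\.[A-Za-z0-9]+$/.test(lastSegment);
}

// True when the Accept header lists text/markdown.
// Parameters such as q= are ignored here (an assumption of this sketch).
function acceptsMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  return accept
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase())
    .includes("text/markdown");
}

function route(req: IncomingInfo): Route {
  if (req.method !== "GET") return "passthrough";
  if (hasExtension(req.pathname)) return "passthrough";
  if (acceptsMarkdown(req.accept) || req.isKnownBot) return "markdown-then-html";
  return "html-with-link";
}
```

Under these assumptions, a browser-style GET for /foo/bar with `Accept: text/html` maps to "html-with-link", while the same path with `Accept: text/markdown` maps to "markdown-then-html".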
On the normal HTML page response, the worker also adds an RFC 8288 Link header so an agent can discover the markdown twin from a plain GET, without sniffing the User-Agent or guessing the right Accept:
Link: </foo/bar.md>; rel="alternate"; type="text/markdown"
This is on by default (only for a 200 text/html reply to an extension-less GET); disable it with advertiseTwin: false.
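To make the header format concrete, here is a minimal sketch of how such a value could be built from a request path. The helper names `defaultMdPath` and `twinLinkHeader` are hypothetical and not part of the package's API; the path mapping follows the default described in this README (`/foo/` → `/foo.md`, `/` → `/index.md`).

```typescript
// Hypothetical helper: maps a page path to its markdown twin,
// following the documented default ("/foo/" -> "/foo.md", "/" -> "/index.md").
function defaultMdPath(pathname: string): string {
  const trimmed = pathname.replace(/\/+$/, "");
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}

// Hypothetical helper: builds the Link header value advertising the twin.
function twinLinkHeader(pathname: string): string {
  return `<${defaultMdPath(pathname)}>; rel="alternate"; type="text/markdown"`;
}

console.log(twinLinkHeader("/foo/bar"));
// prints: </foo/bar.md>; rel="alternate"; type="text/markdown"
```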
Because the worker negotiates on the Accept header, two clients can get different representations of the same URL. If you put a shared cache (a CDN, or Cloudflare's own cache) in front of it, set vary: true to add a Vary: Accept header to the negotiated responses, so the cache keys on Accept and doesn't serve the HTML page to a client that asked for markdown, or vice versa. The option is off by default so that pass-through responses stay byte-for-byte unchanged. Vary: User-Agent is intentionally never added, because it would defeat shared caching.
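As an illustration of what adding Vary: Accept involves, the sketch below merges a token into an existing Vary value without duplicating it. The README does not say how the package treats a Vary header that the asset response already carries, so the merge behavior and the helper name `withVary` are assumptions.

```typescript
// Hypothetical helper: returns a Vary header value that includes `token`.
// Keeps existing tokens, avoids duplicates (case-insensitive), and leaves "*" alone.
function withVary(existing: string | null, token: string): string {
  const tokens = (existing ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  if (tokens.includes("*")) return "*";
  const present = tokens.some((t) => t.toLowerCase() === token.toLowerCase());
  return present ? tokens.join(", ") : [...tokens, token].join(", ");
}

console.log(withVary(null, "Accept")); // prints: Accept
console.log(withVary("Accept-Encoding", "Accept")); // prints: Accept-Encoding, Accept
```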
The included bot list covers the common ones: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, CCBot, Applebot-Extended, Google-Extended, cohere-ai, Bytespider, Diffbot. See src/bots.ts.
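For a sense of how a User-Agent check against this list might work, here is a minimal sketch. `BOT_NAMES`, `BOT_PATTERN`, and `isKnownBot` are hypothetical stand-ins assembled from the names listed above; the real pattern lives in src/bots.ts and may differ.

```typescript
// Hypothetical stand-in for the pattern in src/bots.ts, built from the README's list.
const BOT_NAMES = [
  "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
  "anthropic-ai", "PerplexityBot", "CCBot", "Applebot-Extended",
  "Google-Extended", "cohere-ai", "Bytespider", "Diffbot",
];

// Escape regex metacharacters so each name is matched literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Case-insensitive substring match on any listed name (no "g" flag, so test() is stateless).
const BOT_PATTERN = new RegExp(BOT_NAMES.map(escapeRegExp).join("|"), "i");

function isKnownBot(userAgent: string | null): boolean {
  return userAgent !== null && BOT_PATTERN.test(userAgent);
}

// Extending the pattern, in the same style as the botUserAgents option:
const EXTENDED_PATTERN = new RegExp(BOT_PATTERN.source + "|mybot", "i");
```

A substring match like this is deliberately loose: any User-Agent that contains one of the names counts as a bot, whatever else the string says.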
Install
pnpm add @wave-rf/cloudflare-md-router
This package ships raw TypeScript with no build step. Bundle it with Wrangler/esbuild (the Cloudflare Workers default), which resolve the .ts entry points directly.
Use
The simplest setup — re-export the default handler from your worker entrypoint:
// worker/index.ts
export { default } from "@wave-rf/cloudflare-md-router/worker";
Configure your wrangler.jsonc with an ASSETS binding pointing at your built static site:
{
  "name": "my-docs",
  "main": "worker/index.ts",
  "compatibility_date": "2025-01-01",
  "assets": {
    "directory": "./dist",
    "binding": "ASSETS",
    "not_found_handling": "404-page",
    "html_handling": "drop-trailing-slash",
    "run_worker_first": true
  }
}
run_worker_first is required so the worker sees the request before Cloudflare's static-asset matcher does; otherwise the worker only runs for requests that match no static asset (that is, on 404s).
Customizing
Use createMdRouter() if you need to extend the bot list, change the .md path mapping, or add other Accept tokens:
// worker/index.ts
import { createMdRouter, LLM_BOT_UA } from "@wave-rf/cloudflare-md-router";
export default createMdRouter({
  // Add your own bots:
  botUserAgents: new RegExp(LLM_BOT_UA.source + "|mybot", "i"),
  // Treat `Accept: text/x-markdown` as markdown too:
  acceptMarkdown: ["text/x-markdown"],
  // Custom .md path strategy. Default: `/foo/` → `/foo.md`, `/` → `/index.md`.
  mdPathFor: (pathname) => `/markdown${pathname.replace(/\/$/, "")}.md`,
  // Don't advertise the `.md` twin via a `Link` header (default: true).
  advertiseTwin: false,
  // Add `Vary: Accept` to negotiated responses so a shared cache doesn't
  // cross-serve the HTML and markdown representations (default: false).
  vary: true,
});
Why content-negotiate?
Most LLMs do better with raw markdown than with rendered HTML — less DOM noise, no Starlight nav chrome, no script tags. Serving the same content at one URL with two representations means:
- One canonical URL per page (good for citations and link-sharing).
- Crawlers and human readers stay aligned automatically.
- Your llms.txt can advertise <page>.md for explicit fetches; the worker covers the case where the LLM hits the HTML URL anyway.
License
MIT — see LICENSE.
