@arvoretech/docusaurus-to-md
v0.1.0
Scrape Docusaurus sites and convert to clean Markdown files
docusaurus-to-md
Scrape Docusaurus sites and convert all pages to clean Markdown files. Built for generating LLM-ready documentation.
Quick start
npx docusaurus-to-md https://docs.example.com
Install
npm install -g docusaurus-to-md
CLI
docusaurus-to-md https://docs.example.com
docusaurus-to-md https://docs.example.com -p /docs/api/ -o ./api-docs
docusaurus-to-md https://docs.example.com -w 16
docusaurus-to-md https://docs.example.com --no-single-file
API
import { scrape } from "docusaurus-to-md";
const result = await scrape({
  baseUrl: "https://docs.example.com",
  pathPrefix: "/docs/",
  outputDir: "./output",
  workers: 8,
});
console.log(`${result.pages.length} pages scraped`);
How it works
- Fetches sitemap.xml from the Docusaurus site
- Filters URLs by path prefix
- Scrapes pages in parallel batches
- Extracts main content (strips nav, footer, sidebar, TOC)
- Converts HTML to Markdown via Turndown
- Saves individual .md files + optional combined_all.md
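
The first three steps in the list above (read the sitemap, filter by path prefix, scrape in parallel batches) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual source: all helper names are hypothetical, and sitemap entries are read with a simple regex rather than a full XML parser.

```typescript
// Hypothetical helpers for illustration only; not the package's internals.

/** Collect every <loc> URL from a sitemap XML string (regex-based, assumes simple sitemaps). */
function extractSitemapUrls(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

/** Keep only URLs whose path starts with the given prefix, e.g. "/docs/". */
function filterByPathPrefix(urls: string[], pathPrefix: string): string[] {
  return urls.filter((url) => new URL(url).pathname.startsWith(pathPrefix));
}

/** Split a list into consecutive batches of at most `size` items. */
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Run `task` over all items, one batch at a time, with each batch running in parallel. */
async function runInBatches<T, R>(
  items: T[],
  size: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, size)) {
    const batchResults = await Promise.all(batch.map(task));
    results.push(...batchResults);
  }
  return results;
}
```

In this sketch, the batch size plays the role that the `workers` option appears to play in the API example: `toBatches([1, 2, 3, 4, 5], 2)` yields `[[1, 2], [3, 4], [5]]`, so no more than two tasks are in flight at once. Waiting for a whole batch before starting the next is simple but lets one slow page stall the batch; a worker pool would avoid that at the cost of more code.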
