# @ebowwa/markdown-docs-scraper

v1.0.0

Scrape and mirror markdown-based documentation sites.
## Features
- 📥 Download full markdown documentation
- 🔄 Organize into directory structure
- 📊 Track downloads and failures
- 🚀 Fast concurrent downloads
- 🎯 CLI and programmatic API
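The concurrent-download feature caps in-flight requests at a configured limit. A minimal sketch of that pattern with plain promises (illustrative only, not the package's internals; `mapWithConcurrency` is a hypothetical helper):

```typescript
// Runs `fn` over `items` with at most `limit` calls in flight at once,
// preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; JS is single-threaded, so this is safe
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  // Spawn up to `limit` workers that drain the shared queue.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```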
## Installation

```bash
bun add @ebowwa/markdown-docs-scraper
# or
npm install @ebowwa/markdown-docs-scraper
```

## CLI Usage
### Quick Start - Anthropic Docs

```bash
markdown-docs-scraper anthropic -o ./docs
```

### Scrape Any Site

```bash
markdown-docs-scraper scrape -u https://docs.example.com -o ./docs
```

### Discover Available Pages

```bash
markdown-docs-scraper discover -u https://code.claude.com
```

### Options
```text
Commands:
  scrape      Scrape documentation from a URL
  discover    Discover all available documentation pages
  anthropic   Quick scrape of Anthropic Claude Code docs
  help        Display help for command

Options:
  -u, --url <url>          Base URL of the documentation site
  -o, --output <dir>       Output directory (default: "./docs")
  --docs-path <path>       Docs path (default: "/docs/en")
  -c, --concurrency <num>  Concurrency level (default: "5")
```

## Programmatic Usage
```typescript
import { MarkdownDocsScraper } from "@ebowwa/markdown-docs-scraper";

const scraper = new MarkdownDocsScraper({
  baseUrl: "https://code.claude.com",
  docsPath: "/docs/en",
  categories: {
    "getting-started": ["introduction", "installation", "quick-start"],
    features: ["inline-edits", "tool-use", "file-operations"],
  },
  outputDir: "./docs",
  concurrency: 5,
});
```
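The `categories` option groups page slugs under category names. A sketch of one plausible mapping from that option to on-disk paths, assuming an `outputDir/category/page.md` layout (`plannedPaths` is hypothetical; the package's actual layout may differ):

```typescript
// Expands a categories map into the markdown file paths it implies.
// Illustrative only: the package decides the real directory structure.
function plannedPaths(
  outputDir: string,
  categories: Record<string, string[]>,
): string[] {
  return Object.entries(categories).flatMap(([category, pages]) =>
    pages.map((page) => `${outputDir}/${category}/${page}.md`),
  );
}
```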
```typescript
const result = await scraper.scrape();
console.log(`Downloaded: ${result.downloaded.length}`);
console.log(`Failed: ${result.failed.length}`);

// Save pages to disk
await scraper.savePages(result.downloaded);
```

### Convenience Function
```typescript
import { scrapeMarkdownDocs } from "@ebowwa/markdown-docs-scraper";

const result = await scrapeMarkdownDocs({
  baseUrl: "https://docs.example.com",
  outputDir: "./docs",
});
```

## API
### MarkdownDocsScraper

#### Constructor Options
```typescript
interface ScraperOptions {
  baseUrl: string;                        // Base URL of the documentation site
  docsPath?: string;                      // Docs path (default: "/docs/en")
  categories?: Record<string, string[]>;  // Categories and pages
  outputDir?: string;                     // Output directory (default: "./docs")
  concurrency?: number;                   // Concurrent downloads (default: 5)
  onProgress?: (current: number, total: number) => void;
}
```

#### Methods

- `scrape()` - Scrape all configured pages
- `fetchMarkdown(url)` - Fetch markdown from a URL
- `downloadPage(category, page)` - Download a single page
- `savePages(pages)` - Save pages to disk
- `discoverPages()` - Discover available pages
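The `onProgress` option in `ScraperOptions` receives the running count and the total. A minimal sketch of a reporter you could pass in (`formatProgress` and the percentage format are illustrative, not part of the package):

```typescript
// Formats a "current/total (pct%)" progress string.
function formatProgress(current: number, total: number): string {
  const pct = total > 0 ? Math.round((current / total) * 100) : 0;
  return `${current}/${total} (${pct}%)`;
}

// Matches the ScraperOptions.onProgress signature.
const onProgress = (current: number, total: number): void => {
  console.log(formatProgress(current, total));
};
```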
#### Result

```typescript
interface ScraperResult {
  downloaded: DocPage[];                         // Successfully downloaded pages
  failed: Array<{ url: string; error: string }>; // Pages that could not be fetched
  duration: number;                              // Duration in milliseconds
}
```

## Output Format
Each downloaded file includes a header comment:
```markdown
<!--
Source: https://code.claude.com/docs/en/introduction.md
Downloaded: 2026-02-06T00:00:00.000Z
-->

# Introduction

Original markdown content...
```

## License

MIT
## Contributing
This package is part of the codespaces monorepo.
