npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

docusaurus-scraper

v1.2.0

Published

A command-line tool to extract documentation from Docusaurus sites and alternatives like Mintlify, converting them to Markdown format

Readme

Documentation Scraper

A command-line tool to extract documentation from Docusaurus sites and alternatives like Mintlify, converting them to Markdown format.

Motivation

When working with AI-powered code completion and generation tools, you often need to provide context about third-party dependencies and their APIs. These LLM agents perform better when they have access to comprehensive documentation in a structured format.

This tool solves that problem by:

  • Extracting complete documentation from multiple documentation platforms
  • Converting to Markdown format that's optimal for LLM consumption
  • Preserving structure and formatting including code blocks, links, and navigation
  • Providing context-rich output that helps AI agents understand how to use libraries and APIs
  • Supporting multiple platforms including Docusaurus, Mintlify, and others

Perfect for feeding documentation to AI coding assistants, building knowledge bases, or creating offline documentation archives.

Installation

Global Installation

npm install -g docusaurus-scraper

Local Installation

npm install docusaurus-scraper

Usage

Command Line Interface

# Basic usage with auto-detection
docusaurus-scraper https://docs.example.com

# Specify documentation platform
docusaurus-scraper https://docs.example.com --platform docusaurus
docusaurus-scraper https://docs.managexr.com --platform mintlify

# Specify output file
docusaurus-scraper https://docs.example.com -o my-docs.md

# Run with visible browser (useful for debugging)
docusaurus-scraper https://docs.example.com --no-headless

# Custom timeout and delay
docusaurus-scraper https://docs.example.com -t 15000 -d 1000

# Skip metadata in output
docusaurus-scraper https://docs.example.com --no-metadata

# Combine multiple options
docusaurus-scraper https://docs.example.com -o output.md -t 20000 --platform mintlify --no-headless

Supported Platforms

  • Docusaurus (--platform docusaurus): The original React-based documentation platform
  • Mintlify (--platform mintlify): Modern documentation platform with beautiful UX
  • Auto-detection (--platform auto, default): Automatically detects the platform type

Programmatic Usage

The package supports both CommonJS and ES Modules, and can be used with TypeScript:

CommonJS

const { DocumentationScraper } = require('docusaurus-scraper');

const scraper = new DocumentationScraper({
  headless: true,
  timeout: 10000,
  delay: 500,
  includeMetadata: true,
  platform: 'auto', // or 'docusaurus', 'mintlify'
});

const markdown = await scraper.scrape('https://docs.example.com', 'output.md');
console.log('Documentation extracted successfully!');

ES Modules

import { DocumentationScraper } from 'docusaurus-scraper';

const scraper = new DocumentationScraper({
  headless: true,
  timeout: 10000,
  delay: 500,
  includeMetadata: true,
  platform: 'mintlify', // Specify platform explicitly
});

const markdown = await scraper.scrape('https://docs.example.com', 'output.md');
console.log('Documentation extracted successfully!');

TypeScript

import { DocumentationScraper, ScraperOptions } from 'docusaurus-scraper';

const options: ScraperOptions = {
  headless: true,
  timeout: 10000,
  delay: 500,
  includeMetadata: true,
  platform: 'auto', // Platform can be specified here
};

const scraper = new DocumentationScraper(options);

try {
  await scraper.scrape('https://docs.example.com', 'output.md');
  console.log('Documentation extracted successfully!');
} catch (error) {
  console.error('Extraction failed:', error);
}

CLI Options

| Option | Description | Default | | ----------------------- | ------------------------------------------ | --------------------- | | -o, --output <file> | Output markdown file | docs-{timestamp}.md | | --no-headless | Run browser in visible mode | false (headless) | | -t, --timeout <ms> | Page timeout in milliseconds | 10000 | | -d, --delay <ms> | Delay between requests | 500 | | --no-metadata | Skip metadata in output | false (include) | | -p, --platform <type> | Platform type (docusaurus, mintlify, auto) | auto |

Configuration Options

When using programmatically, you can pass these options to the constructor:

const scraper = new DocumentationScraper({
  headless: true, // Run browser in headless mode
  timeout: 10000, // Page load timeout
  delay: 500, // Delay between page requests
  includeMetadata: true, // Include metadata in output
  customSelectors: [], // Additional CSS selectors for content discovery
  platform: 'auto', // Platform type: 'docusaurus', 'mintlify', 'auto'
});

Output Format

The tool generates clean Markdown with:

  • Hierarchical structure with proper heading levels
  • Preserved code blocks with syntax highlighting
  • Working links and references
  • Metadata section (optional) with source URL and extraction date
  • Page-by-page organization with clear separators

Example output structure:

# Documentation from: https://docs.example.com

Platform: docusaurus
Date: 2025-01-19T10:30:00.000Z

---

## Getting Started | Example Docs

**URL:** [https://docs.example.com/getting-started](https://docs.example.com/getting-started)

Content here...

---

## API Reference | Example Docs

**URL:** [https://docs.example.com/api](https://docs.example.com/api)

More content...

Features

  • Multi-Platform Support - Works with Docusaurus, Mintlify, and other documentation platforms
  • Auto-Detection - Automatically detects the documentation platform type
  • Automatic URL Discovery - Finds documentation pages via sitemap or navigation crawling
  • Content Extraction - Intelligently identifies main content areas using platform-specific selectors
  • Markdown Conversion - Clean conversion preserving formatting
  • Code Block Preservation - Maintains syntax highlighting information
  • Link Resolution - Preserves internal and external links
  • Configurable Output - Customizable formatting and metadata options
  • Error Handling - Graceful handling of failed pages or network issues

Requirements

  • Node.js >= 14.0.0
  • Internet connection for accessing target documentation sites

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development

  1. Clone the repository
  2. Install dependencies: npm install
  3. Run linting: npm run lint
  4. Format code: npm run format

Publishing

This project uses release-please for automated versioning and publishing. When you merge changes to the main branch, a release PR will be automatically created. Merging that PR will trigger a new release.

Dependencies

License

This project is licensed under the MIT License - see the LICENSE file for details.