doc-fetch-cli

v2.5.0

Dynamic documentation fetching CLI that converts entire documentation sites to single markdown files for AI/LLM consumption

Readme

DocFetch - Dynamic Documentation Fetcher 📚

Transform entire documentation sites into AI-ready, single-file markdown with intelligent LLM.txt indexing

🌐 Website: docfetch.dev

Most AI models can't navigate documentation the way people do: they can't scroll through sections, click sidebar links, or explore related pages. DocFetch addresses this by converting an entire documentation site into a single, clean markdown file that contains every section in a format LLMs can consume directly.

🚀 Why DocFetch is Essential for AI Development

🤖 AI/LLM Optimization

  • Single-file consumption: No more fragmented context across multiple pages
  • Clean, structured markdown: Less markup overhead, so more of the LLM context window goes to actual content
  • Intelligent LLM.txt generation: AI-friendly index with semantic categorization
  • Noise removal: Automatically strips navigation, headers, footers, ads, and buttons
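To make the noise-removal idea concrete, here is a minimal Python sketch that keeps only text outside page chrome. It is not DocFetch's implementation: the `NOISE_TAGS` set and the `strip_noise` helper are hypothetical names chosen for illustration, and a real cleaner would likely also look at class names and ARIA roles.

```python
from html.parser import HTMLParser

# Tags treated as page chrome in this sketch (an assumption, not DocFetch's actual list).
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style", "button"}


class NoiseStripper(HTMLParser):
    """Collects text that is not nested inside any NOISE_TAGS element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def strip_noise(html: str) -> str:
    """Return the visible text of `html` with chrome elements removed."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Calling `strip_noise` on a page whose body holds a `<nav>`, a `<main>`, and a `<footer>` returns only the text inside `<main>`.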

Developer Productivity

  • One command automation: Replace hours of manual copy-pasting with a single CLI command
  • Complete documentation access: Give your AI agents full access to official documentation
  • Consistent formatting: Uniform structure across different documentation sites
  • Version control friendly: Markdown files work perfectly with Git

🎯 Smart Content Intelligence

  • Automatic page classification: Identifies APIs, guides, references, and examples
  • Semantic descriptions: Generates concise, relevant descriptions for each section
  • URL preservation: Maintains original source links for verification
  • Adaptive content extraction: Works with diverse documentation site structures

🔧 Production Ready

  • Concurrent fetching: Fast downloads with configurable concurrency
  • Respectful crawling: Honors robots.txt and includes rate limiting
  • Cross-platform: Works on Windows, macOS, and Linux
  • Multiple installation options: PyPI, NPM, Go install, or direct binary download

📦 Installation

PyPI (Recommended for Python developers) ✨ NEW

pip install doc-fetch

NPM (Recommended for JavaScript/Node.js developers)

npm install -g doc-fetch-cli

Go (For Go developers)

go install github.com/AlphaTechini/doc-fetch/cmd/docfetch@latest

Direct Binary Download

Visit Releases and download your platform's binary.

🎯 Usage

Basic Usage

# Fetch entire documentation site to single markdown file
doc-fetch --url https://golang.org/doc/ --output ./docs/golang-full.md

# With LLM.txt generation for AI optimization
doc-fetch --url https://react.dev/learn --output docs.md --llm-txt

Advanced Usage

# Comprehensive documentation fetch with all features
doc-fetch \
  --url https://docs.example.com \
  --output ./internal/docs.md \
  --depth 4 \
  --concurrent 10 \
  --llm-txt \
  --user-agent "MyBot/1.0"

Command Options

| Flag | Description | Default |
|------|-------------|---------|
| --url | Base URL to fetch documentation from | Required |
| --output | Output file path | docs.md |
| --depth | Maximum crawl depth | 2 |
| --concurrent | Number of concurrent workers | 5 |
| --llm-txt | Generate AI-friendly llm.txt index | false |
| --user-agent | Custom user agent string | DocFetch/1.0 |

Note: Short flags (e.g., -c, -d) have been removed for clarity. Use full flag names only.
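If you drive DocFetch from a script, it can help to assemble the argument list in one place. The sketch below is a hypothetical Python helper (`build_command` is not part of DocFetch) that uses only the long flags and defaults listed in the options table, and assumes the `doc-fetch` executable is on your PATH.

```python
import shlex


def build_command(url, output="docs.md", depth=2, concurrent=5,
                  llm_txt=False, user_agent=None):
    """Build a doc-fetch argument list from the documented long flags."""
    if not url:
        raise ValueError("--url is required")
    cmd = [
        "doc-fetch",
        "--url", url,
        "--output", output,
        "--depth", str(depth),
        "--concurrent", str(concurrent),
    ]
    if llm_txt:
        cmd.append("--llm-txt")  # boolean flag, takes no value
    if user_agent is not None:
        cmd += ["--user-agent", user_agent]
    return cmd


if __name__ == "__main__":
    # Print the command as a copy-pasteable shell line.
    print(shlex.join(build_command("https://react.dev/learn", llm_txt=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to run the fetch, which avoids shell quoting problems with URLs and user agent strings.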

⚡ Advanced Tips for Large Documentation Sites

Faster Scraping for Large Sites (1000+ pages)

# Increase concurrency for faster crawling
doc-fetch --url https://docs.example.com --output docs.md --concurrent 15

# Reduce depth if you only need top-level pages
doc-fetch --url https://docs.example.com --output docs.md --depth 2 --concurrent 20

# For massive sites, use multiple passes with different starting URLs
doc-fetch --url https://docs.example.com/guide --output guide.md --depth 3 --concurrent 10
doc-fetch --url https://docs.example.com/api --output api.md --depth 3 --concurrent 10

Recommended Settings by Site Size

| Site Size | Pages | Concurrency | Depth | Time Estimate |
|-----------|-------|-------------|-------|---------------|
| Small | <100 | 5 | 3 | ~30 seconds |
| Medium | 100-500 | 10 | 3 | ~2 minutes |
| Large | 500-2000 | 15 | 4 | ~5-10 minutes |
| Very Large | 2000+ | 20 | 4 | ~15-30 minutes |
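The table above can be expressed as a small lookup if you want to pick settings programmatically. This is a sketch only: `recommended_settings` is a hypothetical helper, and because the table's ranges overlap at 500 and 2000 pages, it assumes those boundary values belong to the smaller tier.

```python
def recommended_settings(pages: int) -> dict:
    """Map an estimated page count to the --concurrent and --depth values in the table."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if pages < 100:        # Small
        return {"concurrent": 5, "depth": 3}
    if pages <= 500:       # Medium (500 assigned here by assumption)
        return {"concurrent": 10, "depth": 3}
    if pages <= 2000:      # Large (2000 assigned here by assumption)
        return {"concurrent": 15, "depth": 4}
    return {"concurrent": 20, "depth": 4}  # Very Large
```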

Troubleshooting

"Queue full" warnings: Increase buffer size by using higher concurrency (--concurrent 15)

Slow initial crawl: This is normal. Throughput increases as workers discover more pages to fetch.

Missing pages: Increase depth (--depth 4) or start from multiple entry points

Rate limiting: Add delay between requests or reduce concurrency
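To see why a low --depth can miss pages, the following Python sketch runs a depth-limited breadth-first crawl over an in-memory link graph. It assumes that depth counts link hops from the starting URL (the start page is depth 0); this README does not spell out DocFetch's exact definition, and the `crawl` function and `SITE` graph are illustrative only.

```python
from collections import deque


def crawl(links, start, max_depth):
    """Return pages reachable from `start` within `max_depth` link hops, in BFS order."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: the Linux install page sits three hops from /docs.
SITE = {
    "/docs": ["/docs/guide", "/docs/api"],
    "/docs/guide": ["/docs/guide/install"],
    "/docs/guide/install": ["/docs/guide/install/linux"],
    "/docs/api": [],
}
```

With this graph, `crawl(SITE, "/docs", 2)` never reaches `/docs/guide/install/linux`, while a depth of 3, or a second pass starting at `/docs/guide`, does.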

Best Practices

  1. Start with conservative settings (--concurrent 5, --depth 2)
  2. Monitor output for missing sections
  3. Adjust based on site structure (some sites have deeper nav trees)
  4. Use --llm-txt for AI agent consumption (generates link index)
  5. Respect robots.txt - DocFetch honors it automatically
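If you want to check in advance which URLs a site's robots.txt permits, Python's standard library can evaluate the rules. This sketch is independent of DocFetch's own robots.txt handling; `make_checker` is a hypothetical helper, and the default user agent is taken from the options table.

```python
from urllib.robotparser import RobotFileParser


def make_checker(robots_txt: str, user_agent: str = "DocFetch/1.0"):
    """Return a function that reports whether `user_agent` may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical robots.txt that blocks one directory for every crawler.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
allowed = make_checker(ROBOTS)
```

Here `allowed("https://docs.example.com/guide/intro")` is true and any URL under `/private/` is rejected.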

📁 Output Files

When using --llm-txt, DocFetch generates two files:

docs.md - Complete Documentation

# Documentation

This file contains documentation fetched by DocFetch.

---

## Getting Started

This guide covers installation, setup, and first program...

---

## Language Specification

Complete Go language specification and syntax...
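Because every page becomes a second-level heading separated by `---` lines, the combined file is easy to split back into sections. The Python sketch below assumes exactly the layout shown in the sample above; `split_sections` is a hypothetical helper, and it would need extra care if fetched pages contain their own `## ` headings or `---` rules.

```python
def split_sections(markdown: str) -> dict:
    """Split a combined docs.md into {section title: body text}."""
    sections = {}
    title = None
    body = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if title is not None:
                sections[title] = "\n".join(body).strip()
            title = line[3:].strip()
            body = []
        elif title is not None and line.strip() != "---":
            body.append(line)  # keep content lines, drop separator rules
    if title is not None:
        sections[title] = "\n".join(body).strip()
    return sections
```

Text before the first `## ` heading, such as the file's introduction, is ignored by this sketch.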

docs.llm.txt - Link Index (v2.0.7+ Format)

# llm.txt
# Link index with descriptions

Getting Started: https://golang.org/doc/install
Language Specification: https://golang.org/ref/spec
net/http Package: https://pkg.go.dev/net/http
Installing Go: https://golang.org/doc/install
Writing Your First Program: https://golang.org/doc/tutorial/create-module

NEW in v2.0.7: The simplified format extracts link text and URL for easy AI parsing. There are no verbose descriptions, only one "Link text: URL" entry per line.
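The one-entry-per-line format is straightforward to load into an agent or script. Here is a minimal Python sketch, assuming the layout shown in the sample above (comment lines start with `#`, and each entry ends with an http or https URL); `parse_llm_txt` is a hypothetical helper, not something DocFetch ships.

```python
def parse_llm_txt(text: str) -> list:
    """Parse "Link text: URL" lines into (title, url) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        # Split on the last ": " so titles may themselves contain colons.
        title, sep, url = line.rpartition(": ")
        if sep and url.startswith(("http://", "https://")):
            entries.append((title.strip(), url.strip()))
    return entries
```

Lines that do not match the expected shape are skipped rather than raising, which keeps the parser tolerant of future header changes.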

🌟 Real-World Examples

Fetch Go Documentation

doc-fetch --url https://golang.org/doc/ --output ./docs/go-documentation.md --depth 4 --llm-txt

Fetch React Documentation

doc-fetch --url https://react.dev/learn --output ./docs/react-learn.md --concurrent 10 --llm-txt

Fetch Your Own Project Docs

doc-fetch --url https://your-project.com/docs/ --output ./internal/docs.md --llm-txt

🤖 How LLM.txt Supercharges Your AI

The generated llm.txt file acts as a semantic roadmap for your AI agents:

  1. Precise Navigation: Agents can query specific sections without scanning entire documents
  2. Context Awareness: Agents know whether they're looking at an API reference or a tutorial
  3. Efficient Retrieval: Jump directly to relevant content based on query intent
  4. Source Verification: Always maintain links back to original documentation

Example AI Prompt Enhancement:

Instead of: "What does the net/http package do?"
Your AI can now: "Find the net/http Package entry in llm.txt and follow its link for HTTP client/server implementation details"

🏗️ How It Works

  1. Link Discovery: Parses the base URL to find all internal documentation links
  2. Content Fetching: Downloads pages concurrently while honoring robots.txt
  3. HTML Cleaning: Removes non-content elements (navigation, headers, footers, etc.)
  4. Markdown Conversion: Converts cleaned HTML to structured markdown
  5. Intelligent Classification: Categorizes pages as API, GUIDE, REFERENCE, or EXAMPLE
  6. Description Generation: Creates concise, relevant descriptions for each section
  7. Single File Output: Combines all documentation into one comprehensive file
  8. LLM.txt Generation: Creates AI-friendly index with semantic categorization
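As an illustration of the first step, link discovery, here is a minimal Python sketch that extracts same-host links from a page. It is not DocFetch's code: `internal_links` and `LinkCollector` are hypothetical names, and treating "internal" as "same host as the base URL" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url: str, html: str) -> list:
    """Return unique absolute links on the same host as `base_url`, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so anchors are not counted twice.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if (parsed.scheme in ("http", "https")
                and parsed.netloc == base_host
                and absolute not in found):
            found.append(absolute)
    return found
```

Each discovered link would then feed the fetch queue for the next crawl depth.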

🚀 Future Features

  • Incremental updates: Only fetch changed pages on subsequent runs
  • Custom selectors: Allow users to specify content areas for different sites
  • Multiple formats: Support PDF, JSON, and other output formats
  • Token counting: Estimate token usage for LLM context planning
  • Advanced classification: Machine learning-based page type detection

💡 Why This Exists

Traditional documentation sites are designed for human navigation, not AI consumption. When working with LLMs, you often need to manually copy-paste multiple sections or provide incomplete context. DocFetch automates this process, giving your AI agents complete access to documentation without the manual overhead.

Stop wasting time copying documentation. Start building AI agents with complete knowledge.

🤝 Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

📄 License

MIT License


Built with ❤️ for AI developers who deserve better documentation access