© 2026 – Pkg Stats / Ryan Hefner

docpup

v0.1.10

Published

CLI for indexing documentation from GitHub repositories.

docpup

CLI tool to clone GitHub documentation and source code, generating AGENTS.md indexes for AI coding agents.

What it does

Docpup fetches documentation or source code from GitHub repositories using sparse checkout, copies files to a local directory, and generates compact index files in the AGENTS.md format. These indexes provide persistent context to AI coding agents.

Supports:

  • Documentation files (.md, .mdx)
  • Source code with custom extensions (.ts, .js, .py, etc.)
  • Selective directory fetching (e.g., only src and samples)
  • Single file fetching (e.g., just README.md)
  • Fetching docs directly from URLs (live HTML pages converted to Markdown)

Paths in the config are resolved from the current working directory where you run the CLI.

For git repos, docpup keeps a root-level docpup-lock.json file. Each run resolves the current remote commit first and skips re-downloading repos whose commit, generation inputs, and outputs are unchanged.
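The skip check works by comparing the stored lock entry with the current run. The TypeScript sketch below is illustrative only. The `LockEntry` fields, the SHA-256 fingerprint of the generation inputs, and the `shouldSkip` helper are assumptions, not docpup's actual lock-file schema or code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one repo's entry in docpup-lock.json; real field names may differ.
interface LockEntry {
  commit: string; // remote commit SHA recorded on the last successful run
  inputsHash: string; // fingerprint of the config values that affect generated output
}

// Serialize with keys sorted at every level so key order never changes the hash.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

function hashInputs(inputs: Record<string, unknown>): string {
  return createHash("sha256").update(canonical(inputs)).digest("hex");
}

// Skip only when commit and inputs are unchanged and the generated outputs still exist.
function shouldSkip(
  previous: LockEntry | undefined,
  remoteCommit: string,
  inputs: Record<string, unknown>,
  outputsExist: boolean,
  refresh: boolean, // mirrors `docpup generate --refresh`
): boolean {
  if (refresh || previous === undefined || !outputsExist) return false;
  return previous.commit === remoteCommit && previous.inputsHash === hashInputs(inputs);
}
```

In this sketch, hashing a key-sorted serialization means that reordering keys in docpup.config.yaml does not force a rebuild on its own.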

Installation

npm install -g docpup

Or run directly with npx:

npx docpup generate

Quick Start

  1. Create a docpup.config.yaml in your project root:

docsDir: documentation
indicesDir: documentation/indices
repos:
  - name: nextjs
    repo: https://github.com/vercel/next.js
    sourcePath: docs
    ref: canary

  2. Run docpup:

docpup generate

  3. Find your docs in documentation/nextjs/, the index in documentation/indices/nextjs-index.md, and the git freshness metadata in docpup-lock.json.

Configuration

Full Configuration Example

docsDir: documentation
indicesDir: documentation/indices

gitignore:
  addDocsDir: true
  addDocsSubDirs: false
  addIndexFiles: false
  sectionHeader: "Docpup generated docs"

scan:
  includeMd: true
  includeMdx: true
  includeHiddenDirs: false
  excludeDirs:
    - .git
    - node_modules
    - images
    - assets

concurrency: 2

repos:
  # Traditional documentation indexing
  - name: nextjs
    repo: https://github.com/vercel/next.js
    sourcePath: docs
    ref: canary

  - name: auth0-docs
    repo: https://github.com/auth0/docs-v2
    sourcePath: main/docs

  # Source code indexing with multiple directories
  - name: codex-sdk
    repo: https://github.com/openai/codex
    contentType: source
    sourcePaths:
      - sdk/typescript/src
      - sdk/typescript/samples
    scan:
      extensions: [".ts", ".tsx"]

  # Single file indexing
  - name: codex-readme
    repo: https://github.com/openai/codex
    sourcePaths:
      - sdk/typescript/README.md

  # URL-based documentation fetching
  - name: claude-docs
    urls:
      - https://docs.anthropic.com/en/docs/overview
      - https://docs.anthropic.com/en/docs/quickstart
    selector: main
    password: ${DOCS_PASSWORD}

  # Sitemap-based documentation discovery
  - name: anthropic-api-docs
    sitemap: https://platform.claude.com/sitemap.xml
    paths:
      - prefix: docs/en/api
        subs:
          - sdks
    selector: main

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| docsDir | string | "documentation" | Output directory for copied docs |
| indicesDir | string | "documentation/indices" | Output directory for index files |
| gitignore.addDocsDir | boolean | true | Add docs directory to .gitignore |
| gitignore.addDocsSubDirs | boolean | false | Add per-repo subdirectories to .gitignore (e.g., docs/nextjs/) instead of whole docs dir |
| gitignore.addIndexFiles | boolean | false | Add indices directory to .gitignore |
| gitignore.sectionHeader | string | "Docpup generated docs" | Header for .gitignore section |
| scan.includeMd | boolean | true | Include .md files (ignored if extensions is set) |
| scan.includeMdx | boolean | true | Include .mdx files (ignored if extensions is set) |
| scan.includeHiddenDirs | boolean | false | Scan hidden directories (dotfolders) |
| scan.excludeDirs | string[] | [...] | Directories to exclude |
| scan.extensions | string[] | - | Custom file extensions to include (e.g., [".ts", ".js"]). Overrides includeMd/includeMdx |
| concurrency | number | 2 | Number of repos to process in parallel |
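The interaction between scan.extensions, includeMd/includeMdx, and per-repo overrides can be sketched as follows. The shallow merge and the helper names (`mergeScan`, `effectiveExtensions`, `shouldIncludeFile`, `shouldSkipDir`) are hypothetical. excludeDirs has no default in this sketch because the table above does not spell its default out.

```typescript
// Subset of the scan options from the table above.
interface ScanConfig {
  includeMd?: boolean;
  includeMdx?: boolean;
  includeHiddenDirs?: boolean;
  excludeDirs?: string[];
  extensions?: string[];
}

// Documented defaults; excludeDirs is omitted because its default list is elided.
const DEFAULT_SCAN: ScanConfig = { includeMd: true, includeMdx: true, includeHiddenDirs: false };

// Assumed merge semantics: a key set per repo replaces the global value (no deep merge of arrays).
function mergeScan(global: ScanConfig = {}, perRepo: ScanConfig = {}): ScanConfig {
  return { ...DEFAULT_SCAN, ...global, ...perRepo };
}

// When extensions is present it overrides includeMd/includeMdx entirely.
function effectiveExtensions(scan: ScanConfig): string[] {
  if (scan.extensions && scan.extensions.length > 0) return [...scan.extensions];
  const exts: string[] = [];
  if (scan.includeMd) exts.push(".md");
  if (scan.includeMdx) exts.push(".mdx");
  return exts;
}

function shouldIncludeFile(fileName: string, scan: ScanConfig): boolean {
  return effectiveExtensions(scan).some((ext) => fileName.endsWith(ext));
}

// Hidden directories are skipped unless includeHiddenDirs is true; excludeDirs match by name.
function shouldSkipDir(dirName: string, scan: ScanConfig): boolean {
  if (!scan.includeHiddenDirs && dirName.startsWith(".")) return true;
  return (scan.excludeDirs ?? []).includes(dirName);
}
```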

Repo Configuration

| Option | Type | Required | Description |
|--------|------|----------|-------------|
| name | string | Yes | Unique identifier for this repo |
| repo | string | No | GitHub repository URL. Exactly one of repo, urls, or sitemap must be provided |
| urls | string[] | No | List of URLs to fetch docs from. Exactly one of repo, urls, or sitemap must be provided |
| sitemap | string | No | Sitemap URL to discover doc pages. Exactly one of repo, urls, or sitemap must be provided |
| paths | object[] | No | Path prefix rules for filtering sitemap URLs (see Sitemap Sources). sitemap sources only |
| selector | string | No | CSS selector to extract content from HTML pages (e.g., main, article, #content). Used with urls and sitemap |
| password | string | No | Password for protected doc sites (e.g., Readme.com). Supports ${ENV_VAR} interpolation. urls or sitemap sources only |
| sourcePath | string | No | Single path to fetch (use . for root). repo sources require either sourcePath or sourcePaths |
| sourcePaths | string[] | No | Multiple paths to fetch (directories or single files). repo sources require either sourcePath or sourcePaths |
| ref | string | No | Branch, tag, or commit. repo sources only (auto-detects default branch if not specified) |
| contentType | string | No | "docs" (default) or "source"; affects the index title and warning message |
| preprocess | object | No | Optional preprocess step (sphinx or html, single path only). repo sources only |
| scan | object | No | Per-repo scan overrides (merged with global scan config) |
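The source rules in the table (exactly one of repo, urls, or sitemap, with some options tied to a source kind) can be expressed as a small validator. This is a hypothetical sketch: `validateRepo` and its error messages are illustrative, and it does not claim to cover every check docpup performs.

```typescript
// Fields from the repo table that take part in source validation.
interface RepoConfig {
  name: string;
  repo?: string;
  urls?: string[];
  sitemap?: string;
  paths?: { prefix: string; subs?: string[] }[];
  password?: string;
  sourcePath?: string;
  sourcePaths?: string[];
  ref?: string;
  preprocess?: unknown;
}

// Returns human-readable problems; an empty array means these particular checks pass.
function validateRepo(r: RepoConfig): string[] {
  const errors: string[] = [];
  const sourceCount = [r.repo, r.urls, r.sitemap].filter((v) => v !== undefined).length;
  if (sourceCount !== 1) errors.push(`${r.name}: exactly one of repo, urls, or sitemap is required`);

  if (r.repo !== undefined) {
    if (r.sourcePath === undefined && r.sourcePaths === undefined) {
      errors.push(`${r.name}: repo sources need sourcePath or sourcePaths`);
    }
    if (r.password !== undefined) errors.push(`${r.name}: password is only valid with urls or sitemap`);
  } else {
    // Git-only options are rejected for urls and sitemap sources.
    for (const key of ["sourcePath", "sourcePaths", "ref", "preprocess"] as const) {
      if (r[key] !== undefined) errors.push(`${r.name}: ${key} is only valid with repo`);
    }
  }
  if (r.paths !== undefined && r.sitemap === undefined) {
    errors.push(`${r.name}: paths is only valid with sitemap`);
  }
  return errors;
}
```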

Preprocess

Preprocess currently supports Sphinx and HTML, and can be extended to support other preprocessors.

The Sphinx preprocessor uses Sphinx to build docs before scanning. This is useful for projects like Django that rely on reStructuredText includes, substitutions, and directives.

repos:
  - name: django-docs
    repo: https://github.com/django/django
    sourcePath: docs
    preprocess:
      type: sphinx
      workDir: docs
      builder: markdown
      outputDir: docpup-build

Prerequisites:

  • Python 3 on PATH (python)
  • Sphinx + Markdown builder: python -m pip install sphinx sphinx-markdown-builder

Notes:

  • sourcePath (or a single entry in sourcePaths) must exist in the repo (used for sparse checkout).
  • If workDir is omitted, it defaults to sourcePath or the single sourcePaths entry.
  • builder must be markdown (requires sphinx-markdown-builder).
  • outputDir must be a non-hidden directory unless scan.includeHiddenDirs is true.
  • Preprocess is not supported with multiple sourcePaths.
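The workDir fallback and the single-path restriction from these notes can be sketched as a function that builds a Sphinx command line. `python -m sphinx -b <builder> <sourcedir> <outdir>` is standard Sphinx usage, but the exact command docpup runs is not documented here. Defaulting the builder to markdown and placing outputDir inside workDir are both assumptions, and `sphinxArgv` is a hypothetical helper.

```typescript
interface SphinxPreprocess {
  type: "sphinx";
  workDir?: string;
  builder?: string;
  outputDir: string;
}

// Resolve workDir as the notes describe, then build an argv for `python -m sphinx`.
function sphinxArgv(pre: SphinxPreprocess, sourcePath?: string, sourcePaths?: string[]): string[] {
  if (sourcePaths && sourcePaths.length > 1) {
    throw new Error("preprocess is not supported with multiple sourcePaths");
  }
  // workDir falls back to sourcePath, then to the single sourcePaths entry.
  const workDir = pre.workDir ?? sourcePath ?? sourcePaths?.[0];
  if (workDir === undefined) throw new Error("cannot resolve workDir");
  const builder = pre.builder ?? "markdown"; // defaulting to markdown is an assumption
  if (builder !== "markdown") throw new Error("builder must be markdown");
  // -b selects the builder, followed by the source dir and the output dir.
  // Assumption: outputDir is resolved inside workDir.
  return ["python", "-m", "sphinx", "-b", builder, workDir, `${workDir}/${pre.outputDir}`];
}
```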

HTML Preprocess

The HTML preprocessor converts HTML pages into Markdown before scanning and indexing.

repos:
  - name: python-genai
    repo: https://github.com/googleapis/python-genai
    sourcePath: docs
    preprocess:
      type: html
      workDir: docs
      outputDir: docpup-build
      rewriteLinks: true

Notes:

  • workDir defaults to sourcePath.
  • rewriteLinks defaults to true, converting .html/.htm links to .md.
  • selector can be used to target a specific content node (e.g., main, article, or #content).
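The effect of rewriteLinks on generated Markdown can be sketched with a regular expression over link targets. This is a simplified, hypothetical `rewriteHtmlLinks` helper. Leaving absolute URLs untouched and preserving #fragment or ?query suffixes are assumptions about the intended behavior.

```typescript
// Rewrites Markdown link targets ending in .html/.htm (optionally followed by #fragment or ?query) to .md.
function rewriteHtmlLinks(markdown: string): string {
  return markdown.replace(
    /\]\(([^)\s]+?)\.html?((?:[#?][^)\s]*)?)\)/g,
    (match: string, base: string, rest: string) => {
      // Leave absolute URLs (http:, https:, mailto:, ...) pointing at the live site.
      if (/^[a-z][a-z0-9+.-]*:/i.test(base)) return match;
      return `](${base}.md${rest})`;
    },
  );
}
```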

Source Code Indexing

Docpup can index source code in addition to documentation. Use contentType: source and custom extensions to fetch specific file types:

repos:
  - name: my-sdk
    repo: https://github.com/example/sdk
    contentType: source
    sourcePaths:
      - src
      - samples
    scan:
      extensions: [".ts", ".tsx", ".js"]
      excludeDirs: [node_modules, dist, __tests__]

This generates an index with a "Source Index" title and a source-code-specific warning:

<!-- MY-SDK-AGENTS-MD-START -->[my-sdk Source Index]|root: documentation/my-sdk|STOP. This is source code from my-sdk. Search and read files before making changes.|src:{index.ts,client.ts}|samples:{basic.ts}<!-- MY-SDK-AGENTS-MD-END -->

Single File Fetching

You can fetch individual files by specifying file paths in sourcePaths:

repos:
  - name: project-readme
    repo: https://github.com/example/project
    sourcePaths:
      - README.md
      - docs/CONTRIBUTING.md

URL Sources

Docpup can fetch documentation directly from live HTML pages using the urls option, as an alternative to cloning a Git repository.

repos:
  - name: claude-docs
    urls:
      - https://docs.anthropic.com/en/docs/overview
      - https://docs.anthropic.com/en/docs/quickstart
    selector: main

For each URL, docpup uses a three-tier fetching strategy:

  1. Requests the URL with an Accept: text/markdown header
  2. Tries a .md URL variant (e.g., /overview/overview.md)
  3. Falls back to fetching the HTML and converting it to Markdown
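The three tiers can be sketched as a fallback chain. The `Fetcher` type lets a fake stand in for Node's built-in fetch. Trusting tier 1 only when the response content type is text/markdown, the `mdVariant` rule (modeled on the example above), and the injected `htmlToMarkdown` converter are all assumptions rather than docpup's documented internals.

```typescript
// Minimal response/fetch shapes so a fake can replace Node's built-in fetch in tests.
interface PageResponse {
  ok: boolean;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
type Fetcher = (url: string, init?: { headers?: Record<string, string> }) => Promise<PageResponse>;

// Tier 2 URL, modeled on the README example (/overview -> /overview/overview.md); an inference.
function mdVariant(url: string): string {
  const u = new URL(url);
  const path = u.pathname.replace(/\/+$/, "");
  const last = path.split("/").pop() || "index";
  u.pathname = `${path}/${last}.md`;
  return u.toString();
}

async function fetchPage(
  url: string,
  fetcher: Fetcher,
  htmlToMarkdown: (html: string) => string, // stand-in for a real HTML-to-Markdown converter
): Promise<{ tier: 1 | 2 | 3; markdown: string }> {
  // Tier 1: content negotiation; only trust the body if the server labels it Markdown.
  const first = await fetcher(url, { headers: { Accept: "text/markdown" } });
  if (first.ok && (first.headers.get("content-type") ?? "").includes("text/markdown")) {
    return { tier: 1, markdown: await first.text() };
  }
  // Tier 2: a .md variant of the URL.
  const second = await fetcher(mdVariant(url));
  if (second.ok) return { tier: 2, markdown: await second.text() };
  // Tier 3: plain HTML, converted locally.
  const third = await fetcher(url);
  if (!third.ok) throw new Error(`failed to fetch ${url}`);
  return { tier: 3, markdown: htmlToMarkdown(await third.text()) };
}
```

Passing Node 20's global fetch as the fetcher works because its signature accepts these narrower argument types.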

Filenames are automatically derived from page titles, with common prefixes/suffixes stripped and collisions resolved by appending a numeric suffix.
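A minimal sketch of title-based naming follows. It assumes site suffixes look like "Title | Site" or "Title - Site" and that collisions get -2, -3, and so on appended. The README does not specify the exact stripping rules or suffix format, and this sketch handles only suffixes, not prefixes. `titleToSlug` and `assignFilenames` are hypothetical names.

```typescript
// Assumed suffix pattern: "Title | Site" or "Title - Site"; only the first segment is kept.
function titleToSlug(title: string): string {
  const core = title.split(/\s+[|-]\s+/)[0];
  const slug = core
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug || "page";
}

// Collisions get a numeric suffix; checking a set of used names also avoids
// clashing with a page whose own title already ends in "-2".
function assignFilenames(titles: string[]): string[] {
  const used = new Set<string>();
  return titles.map((title) => {
    const base = titleToSlug(title);
    let name = `${base}.md`;
    for (let i = 2; used.has(name); i++) name = `${base}-${i}.md`;
    used.add(name);
    return name;
  });
}
```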

Notes:

  • selector is optional. When omitted, docpup falls back through common content elements (main, article, #content, .content, body).
  • password can be used to authenticate with password-protected doc sites. Supports ${ENV_VAR} interpolation for secrets.
  • sourcePath, sourcePaths, ref, and preprocess are not valid with urls.
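${ENV_VAR} interpolation for password can be sketched as a single substitution pass. The hypothetical `interpolateEnv` below assumes that a reference to an unset variable is an error rather than an empty string. The README does not say which behavior docpup chooses.

```typescript
// Replaces each ${NAME} with env[NAME]; throws if a referenced variable is unset (assumed behavior).
function interpolateEnv(
  value: string,
  env: Record<string, string | undefined> = process.env,
): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
    return resolved;
  });
}
```

Keeping the secret in the environment means docpup.config.yaml can be committed without the password.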

Sitemap Sources

Docpup can automatically discover documentation URLs from a sitemap.xml, with path prefix filtering to control which pages to include.

repos:
  - name: anthropic-api-docs
    sitemap: https://platform.claude.com/sitemap.xml
    paths:
      - prefix: docs/en/api
        subs:
          - sdks
          - skills
    selector: main

The paths array controls which URLs from the sitemap are included:

  • First-level children of each prefix are included automatically (e.g., docs/en/api/overview, docs/en/api/errors)
  • Nested paths are excluded by default (e.g., docs/en/api/sdks/python)
  • subs opts in specific sub-directories at full depth (e.g., subs: [sdks] includes docs/en/api/sdks/python, docs/en/api/sdks/typescript, etc.)
  • The prefix page itself (e.g., docs/en/api) is also included if it exists in the sitemap

When paths is omitted, all URLs from the sitemap are included without filtering.
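The filtering rules above can be sketched by comparing path segments. Segment-wise matching means a prefix like docs/en/api does not accidentally match docs/en/apiary. `filterSitemapUrls` and `matchesRule` are hypothetical helpers that follow the listed rules, not docpup's source.

```typescript
interface PathRule {
  prefix: string;
  subs?: string[];
}

// Split a path into non-empty segments, ignoring leading and trailing slashes.
function segments(p: string): string[] {
  return p.split("/").filter(Boolean);
}

function matchesRule(pathname: string, rule: PathRule): boolean {
  const path = segments(pathname);
  const prefix = segments(rule.prefix);
  if (path.length < prefix.length || prefix.some((s, i) => path[i] !== s)) return false;
  const rest = path.slice(prefix.length);
  if (rest.length <= 1) return true; // the prefix page itself or a first-level child
  return (rule.subs ?? []).includes(rest[0]); // deeper pages only under opted-in subs
}

// Without paths, every sitemap URL is kept.
function filterSitemapUrls(urls: string[], paths?: PathRule[]): string[] {
  if (paths === undefined) return urls;
  return urls.filter((u) => paths.some((rule) => matchesRule(new URL(u).pathname, rule)));
}
```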

Sitemap index files (sitemaps that reference other sitemaps) are handled automatically.
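Handling sitemap index files amounts to recursing into each referenced sitemap until plain urlset files are reached. The sketch below uses a regular expression instead of an XML parser, which is a simplification that ignores CDATA and most entities. The `fetchText` callback and the `collectSitemapUrls` name are hypothetical.

```typescript
// Extract <loc> values; a regex is a simplification that only decodes &amp;.
function extractLocs(xml: string): string[] {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1].replace(/&amp;/g, "&"));
}

async function collectSitemapUrls(
  sitemapUrl: string,
  fetchText: (url: string) => Promise<string>,
  seen: Set<string> = new Set(),
): Promise<string[]> {
  if (seen.has(sitemapUrl)) return []; // guard against cycles between index files
  seen.add(sitemapUrl);
  const xml = await fetchText(sitemapUrl);
  const locs = extractLocs(xml);
  // A <urlset> lists pages directly; a <sitemapindex> lists further sitemaps.
  if (!/<sitemapindex[\s>]/.test(xml)) return locs;
  const nested = await Promise.all(locs.map((loc) => collectSitemapUrls(loc, fetchText, seen)));
  return nested.flat();
}
```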

Notes:

  • sourcePath, sourcePaths, ref, and preprocess are not valid with sitemap.
  • password can be used to authenticate with password-protected doc sites.
  • paths is only valid with sitemap.

CLI Usage

# Run with default config
docpup generate

# Specify config file
docpup generate --config ./custom-config.yaml

# Process only specific repos
docpup generate --only nextjs,temporal

# Override concurrency
docpup generate --concurrency 4

# Force git repos to rebuild even if unchanged
docpup generate --refresh

# Show help
docpup --help

# Show version
docpup --version
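--only narrows the repo list, and --concurrency (or the concurrency config key) caps how many repos are processed at once. A worker-pool sketch of those two behaviors follows. `selectRepos` and `mapWithConcurrency` are hypothetical helpers, not docpup's code.

```typescript
// Keep only repos named in a comma-separated --only value; no value means all repos.
function selectRepos<T extends { name: string }>(repos: T[], only?: string): T[] {
  if (!only) return repos;
  const wanted = new Set(only.split(",").map((s) => s.trim()).filter(Boolean));
  return repos.filter((r) => wanted.has(r.name));
}

// Run `task` over `items` with at most `limit` tasks in flight, preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```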

Index File Format

Docpup generates index files in the AGENTS.md format:

Documentation Index:

<!-- NEXTJS-AGENTS-MD-START -->[nextjs Docs Index]|root: documentation/nextjs|STOP. What you remember about nextjs may be WRONG for this project. Always search docs and read before any task.|(root):{index.mdx}|guides:{setup.md,intro.md}<!-- NEXTJS-AGENTS-MD-END -->

Source Code Index:

<!-- CODEX-SDK-AGENTS-MD-START -->[codex-sdk Source Index]|root: documentation/codex-sdk|STOP. This is source code from codex-sdk. Search and read files before making changes.|sdk/typescript/src:{index.ts,client.ts}|sdk/typescript/samples:{basic.ts}<!-- CODEX-SDK-AGENTS-MD-END -->

This compact format provides:

  • Start/end markers for easy parsing
  • Root path for the files
  • Context-aware warning (docs vs source code)
  • Directory-to-file mapping with preserved path structure
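Both example lines above can be reproduced by a small formatter. The hypothetical `buildIndex` below takes an ordered map from directory (relative to root, with "" for the root) to file names. How docpup orders files and whether it escapes special characters are not documented, so the sketch assumes insertion order and no escaping.

```typescript
type ContentType = "docs" | "source";

// Build one compact AGENTS.md index line from directory -> file-list groups.
function buildIndex(
  name: string,
  root: string,
  contentType: ContentType,
  groups: Map<string, string[]>,
): string {
  const tag = name.toUpperCase();
  const title = contentType === "source" ? `[${name} Source Index]` : `[${name} Docs Index]`;
  const warning =
    contentType === "source"
      ? `STOP. This is source code from ${name}. Search and read files before making changes.`
      : `STOP. What you remember about ${name} may be WRONG for this project. Always search docs and read before any task.`;
  // Each group renders as dir:{file1,file2}; the root directory renders as (root).
  const entries = Array.from(groups).map(([dir, files]) => `${dir === "" ? "(root)" : dir}:{${files.join(",")}}`);
  const body = [title, `root: ${root}`, warning, ...entries].join("|");
  return `<!-- ${tag}-AGENTS-MD-START -->${body}<!-- ${tag}-AGENTS-MD-END -->`;
}
```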

Authentication

Docpup uses your existing git credentials (SSH keys, credential helpers, or stored tokens). No additional authentication configuration is required.

For private repositories, ensure you have access configured in your git environment.

Error Handling

  • If a repository fails to clone, docpup logs a warning and continues with other repos
  • If a git repo is unchanged and its outputs still exist, docpup skips downloading it and reuses the existing generated files
  • If docpup-lock.json matches but the docs or index files are missing, docpup downloads the repo again to rebuild them
  • If only non-fatal errors occur (docpup can keep running), the CLI exits with status 0
  • Invalid configuration or unexpected errors result in a non-zero exit status
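The exit-status policy above can be sketched as follows. Per-repo failures are logged and swallowed, while configuration errors return a non-zero code. `main`, `loadRepos`, and `processRepo` are hypothetical names. For brevity, this sketch runs repos without the concurrency limit.

```typescript
// Returns the process exit code: 1 for invalid configuration, 0 otherwise.
async function main(
  loadRepos: () => string[],
  processRepo: (name: string) => Promise<void>,
  log: (message: string) => void = console.warn,
): Promise<number> {
  let repos: string[];
  try {
    repos = loadRepos();
  } catch (err) {
    log(`error: invalid configuration: ${String(err)}`);
    return 1; // fatal: nothing can run
  }
  // allSettled lets every repo run even if some fail (e.g., a clone error).
  const results = await Promise.allSettled(repos.map((name) => processRepo(name)));
  results.forEach((result, i) => {
    if (result.status === "rejected") log(`warning: ${repos[i]} failed: ${String(result.reason)}`);
  });
  return 0; // per-repo failures are non-fatal
}
```

A caller would pass the result to process.exit so that CI pipelines see a failure only for fatal problems.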

Requirements

  • Node.js 20 or later
  • Git 2.25 or later (for sparse-checkout support)

License

MIT