SlurpAI

SlurpAI is a CLI tool for scraping and compiling documentation from websites and NPM packages into markdown format. It's designed to be used as a starting point for AI agents to consume documentation via MCP (Model Context Protocol).

Features

  • Direct URL Scraping: Fetches content directly from a starting URL.
  • NPM Package Documentation: Retrieves documentation for specific NPM packages and versions.
  • Markdown Conversion: Transforms HTML documentation into clean, structured markdown using Turndown.
  • Content Cleanup: Removes common navigation elements and other non-content sections.
  • Compilation: Combines content from scraped pages into a single output file.
  • Configurable: Options can be set via the config.js file.
  • Asynchronous: Uses async/await for better performance and scalability.

Prerequisites

  • Node.js v20 or later

Installation

# Install globally from npm
npm install -g slurpai

Or locally in your project:

npm install slurpai
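
With a local install, you can run the CLI through npx or an npm script instead of installing it globally. This is a minimal sketch, assuming the package exposes the same slurp binary used in the Usage section below:

# Run the locally installed CLI via npx (illustrative; assumes a local install)
npx slurp https://expressjs.com/en/4.18/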

Usage

Scraping from a URL

# Scrape and compile documentation from a URL in one step
slurp https://expressjs.com/en/4.18/

# With base path option (for filtering links)
slurp https://example.com/docs/introduction --base-path https://example.com/docs/

How It Works

When you run the URL scraping command, SlurpAI will:

  1. Start scraping from the provided URL (e.g., https://example.com/docs/v1/).
  2. Follow internal links found on the pages. If SLURP_ENFORCE_BASE_PATH is set to true (the default is false), it will only follow links that start with the specified --base-path (or the starting URL if --base-path is omitted).
  3. Convert the HTML content of scraped pages to Markdown, removing common navigation elements and other non-content sections.
  4. Save intermediate Markdown files to a temporary directory (default: slurp_partials/).
  5. Compile these partial files into a single Markdown file in the output directory (default: slurps/). The filename will be based on the domain name (e.g., example_docs.md); see the example below.
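
For example, a run with the default settings (a sketch based on the steps above) produces files in the default locations:

# Scrape and compile a docs site using the defaults
slurp https://example.com/docs/v1/

# Intermediate Markdown pages land in the partials directory (default: slurp_partials/)
# The compiled result is written to the output directory, e.g. slurps/example_docs.md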

Configuration (Optional)

You can customize SlurpAI's behavior by modifying the config.js file in the project root. Configuration is organized into logical sections:

File System Paths

| Property  | Default         | Description                                        |
| --------- | --------------- | -------------------------------------------------- |
| inputDir  | slurps_partials | Directory for intermediate scraped markdown files  |
| outputDir | slurps          | Directory for the final compiled markdown file     |
| basePath  |                 | Base path used for link filtering (if specified)   |

Web Scraping Settings

| Property        | Default | Description                                         |
| --------------- | ------- | ---------------------------------------------------- |
| maxPagesPerSite | 100     | Maximum pages to scrape per site (0 for unlimited)   |
| concurrency     | 25      | Number of pages to process concurrently              |
| retryCount      | 3       | Number of times to retry failed requests             |
| retryDelay      | 1000    | Delay between retries in milliseconds                |
| useHeadless     | false   | Whether to use a headless browser for JS rendering   |
| timeout         | 60000   | Request timeout in milliseconds                      |

URL Filtering

| Property            | Default                      | Description                                             |
| ------------------- | ---------------------------- | -------------------------------------------------------- |
| enforceBasePath     | true                         | Only follow links starting with the effective basePath   |
| preserveQueryParams | ['version', 'lang', 'theme'] | Query parameters to preserve when normalizing URLs       |
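
As an illustration of preserveQueryParams (the exact normalization behavior is an assumption based on the description above), unlisted query parameters are dropped while listed ones are kept:

# URL normalization with the default preserveQueryParams (illustrative)
# Before: https://example.com/docs/page?version=2&utm_source=newsletter
# After:  https://example.com/docs/page?version=2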

Markdown Compilation

| Property            | Default | Description                                            |
| ------------------- | ------- | ------------------------------------------------------- |
| preserveMetadata    | true    | Preserve metadata blocks in markdown                    |
| removeNavigation    | true    | Remove navigation elements from content                 |
| removeDuplicates    | true    | Attempt to remove duplicate content sections            |
| similarityThreshold | 0.9     | Threshold for considering content sections duplicates   |
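
As a rough sketch, overriding a few of these options in config.js might look like the following, shown in the same comment style as the Base Path example below. The urlFiltering section name comes from that example; the other section names and the exact nesting are assumptions, so check the shipped config.js before copying this:

# In config.js (section names other than urlFiltering are assumed):
# scraping: {
#   maxPagesPerSite: 0,       // 0 = unlimited pages per site
#   useHeadless: true,        // use a headless browser for JS-rendered sites
# },
# urlFiltering: {
#   enforceBasePath: true,    // only follow links under the effective basePath
# },
# compilation: {
#   similarityThreshold: 0.9, // how similar two sections must be to count as duplicates
# }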

Base Path Explanation

The main URL argument provided to slurp is the starting point for scraping. The optional --base-path flag defines a URL prefix used for filtering which discovered links are added to the scrape queue.

  • If --base-path is not provided, it defaults to the starting URL, and Slurp will follow any link whose URL contains the starting URL. Sometimes you may want to use a different starting page and base path, as in the example below.

Example: To scrape only the /docs/ section of a site, but starting from the introduction page:

# Modify config.js to set enforceBasePath to true
# In config.js:
# urlFiltering: {
#   enforceBasePath: true,
#   ...
# }

# Then run the command
slurp https://example.com/docs/introduction --base-path https://example.com/docs/

In this case, links like https://example.com/docs/advanced would be followed, but https://example.com/blog/post would be ignored. This is useful when the base path itself returns a 404 if you try to load it directly.

MCP Server Integration

SlurpAI's MCP server integration is still in testing, but it is included in this release.

Contributing

Issues and pull requests are welcome! Please feel free to contribute to this project.

License

ISC