
site-to-md

v0.1.1


site-to-md 🤖

Make any website AI-agent-readable. Generates /llms.txt + clean markdown for every page.

robots.txt told search engines what to crawl. llms.txt tells AI agents what to read. This tool generates both automatically.

Quick Start

npx site-to-md https://mysite.com

That's it. Zero config. You'll get:

site-to-md-output/
├── llms.txt           # Index file per llmstxt.org spec
├── llms-ctx.txt       # All content inline (for single-prompt ingestion)
├── index.html.md      # Homepage as markdown
├── docs/
│   ├── getting-started.html.md
│   └── api-reference.html.md
└── blog/
    ├── hello-world.html.md
    └── release-notes.html.md

What It Does

  1. Crawls your site — follows links or uses sitemap.xml if available
  2. Extracts content — strips nav, footer, ads, scripts using Mozilla Readability (same as Firefox Reader View)
  3. Converts to markdown — clean, structured markdown via Turndown
  4. Generates /llms.txt — per the llmstxt.org spec
  5. Generates per-page .html.md files — per the spec convention
  6. Generates /llms-ctx.txt — all content inline for single-prompt ingestion
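To make step 1 concrete, here is a minimal sketch of pulling page URLs out of a sitemap.xml string. This is illustrative only, not site-to-md's actual crawler (which also follows links and handles malformed sitemaps); the `extractSitemapUrls` helper is hypothetical.

```javascript
// Hypothetical sketch of step 1: extract page URLs from a sitemap.xml
// string. A regex is enough for well-formed <loc> entries; the real
// crawler is presumably more robust than this.
function extractSitemapUrls(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match;
  // Collect every <loc>…</loc> value in document order
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

const sitemap = `<?xml version="1.0"?>
<urlset>
  <url><loc>https://mysite.com/</loc></url>
  <url><loc>https://mysite.com/docs/getting-started</loc></url>
</urlset>`;

console.log(extractSitemapUrls(sitemap));
// [ 'https://mysite.com/', 'https://mysite.com/docs/getting-started' ]
```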

Install

# Use directly with npx (no install needed)
npx site-to-md https://mysite.com

# Or install globally
npm install -g site-to-md

# Or as a project dependency
npm install site-to-md

CLI Usage

# Crawl a live website
site-to-md https://docs.mysite.com

# Process local build output
site-to-md ./dist

# Customize output
site-to-md https://mysite.com \
  --out ./public \
  --title "My Product" \
  --desc "Developer documentation for My Product"

# Filter pages
site-to-md https://mysite.com \
  --include "/docs/**" \
  --include "/blog/**" \
  --exclude "/admin/**"

# Skip context file
site-to-md https://mysite.com --no-ctx

Options

| Flag | Description | Default |
|------|-------------|---------|
| --out <dir> | Output directory | ./site-to-md-output |
| --title <name> | Site title for llms.txt | Auto-detected |
| --desc <text> | Site description | Auto-detected |
| --include <glob> | Include only matching paths (repeatable) | All |
| --exclude <glob> | Exclude matching paths (repeatable) | None |
| --no-ctx | Skip generating llms-ctx.txt | — |
| --no-sitemap | Don't use sitemap.xml for crawling | — |
| --max-depth <n> | Max crawl depth | 3 |
| --concurrency <n> | Parallel requests | 5 |
| --strip <selector> | CSS selectors to strip (repeatable) | — |
| --config <path> | Config file path | Auto-detect |
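A sketch of how --include/--exclude filtering behaves: a path is kept when it matches at least one include pattern (or no includes are given) and no exclude pattern. The glob-to-regex conversion and both helper names here are assumptions for illustration; site-to-md's real matcher may support a richer glob syntax.

```javascript
// Hypothetical sketch of include/exclude path filtering.
// '**' matches across path segments, '*' within a single segment.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const pattern = escaped
    .replace(/\*\*/g, '\u0000')  // placeholder so '*' rules don't clobber '**'
    .replace(/\*/g, '[^/]*')
    .replace(/\u0000/g, '.*');
  return new RegExp(`^${pattern}$`);
}

function shouldKeep(path, include, exclude) {
  const included =
    include.length === 0 || include.some((g) => globToRegExp(g).test(path));
  const excluded = exclude.some((g) => globToRegExp(g).test(path));
  return included && !excluded;
}

console.log(shouldKeep('/docs/api', ['/docs/**'], ['/admin/**']));    // true
console.log(shouldKeep('/admin/users', ['/docs/**'], ['/admin/**'])); // false
```

Repeating --include thus widens the kept set, while each --exclude narrows it.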

Programmatic API

import { agentReady } from 'site-to-md';

const result = await agentReady({
  url: 'https://mysite.com',
  outDir: './public',
  title: 'My Product',
  description: 'Developer documentation',
  include: ['/docs/**'],
});

console.log(`Generated ${result.pages.length} pages`);
console.log(result.llmsTxt); // Contents of llms.txt

Config File

Create site-to-md.config.js in your project root:

export default {
  url: 'https://mysite.com',
  outDir: './public',
  title: 'My Product',
  description: 'A brief description for agents',

  include: ['/docs/**', '/blog/**'],
  exclude: ['/admin/**'],

  sections: {
    'Documentation': '/docs/**',
    'Blog': '/blog/**',
    'API Reference': '/api-docs/**',
  },

  maxDepth: 3,
  concurrency: 5,
  stripSelectors: ['.cookie-banner', '.ad-wrapper'],
};

Build Pipeline

{
  "scripts": {
    "build": "next build && site-to-md ./out --out ./out"
  }
}

Output Format

/llms.txt

Per the llmstxt.org spec:

# My Product

> Developer documentation for building with My Product

## Documentation

- [Getting Started](/docs/getting-started.html.md): Quick start guide
- [API Reference](/docs/api-reference.html.md): Complete API docs

## Blog

- [Hello World](/blog/hello-world.html.md): Our launch announcement

Per-page .html.md

Clean markdown extracted from each page — no nav, footer, ads, or scripts.

/llms-ctx.txt

All page content concatenated in a single file for one-shot ingestion by AI agents.

What is llms.txt?

llms.txt is a proposed standard (by Jeremy Howard) for making websites readable by AI agents. Think of it like robots.txt but for LLMs:

  • /llms.txt — A markdown index file listing your site's key pages with descriptions. AI agents read this first to understand what's on your site.
  • *.html.md — Clean markdown versions of each page (same URL + .md). No nav, no footer, no JavaScript — just the content.
  • /llms-ctx.txt — All content concatenated in one file for single-prompt ingestion.
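The index format above is plain markdown, so assembling one is mostly string building. The sketch below produces a minimal llms.txt matching the layout shown in the Output Format section; the `buildLlmsTxt` function and its input shape (title, description, sections) are illustrative, not site-to-md's API.

```javascript
// Hypothetical sketch: assemble a minimal llms.txt index from page
// metadata, following the llmstxt.org layout (H1 title, blockquote
// description, one H2 per section with a link list).
function buildLlmsTxt(site) {
  const lines = [`# ${site.title}`, '', `> ${site.description}`];
  for (const [section, pages] of Object.entries(site.sections)) {
    lines.push('', `## ${section}`, '');
    for (const page of pages) {
      lines.push(`- [${page.title}](${page.path}): ${page.summary}`);
    }
  }
  return lines.join('\n') + '\n';
}

const txt = buildLlmsTxt({
  title: 'My Product',
  description: 'Developer documentation for building with My Product',
  sections: {
    Documentation: [
      {
        title: 'Getting Started',
        path: '/docs/getting-started.html.md',
        summary: 'Quick start guide',
      },
    ],
  },
});
console.log(txt);
```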

Sites like Anthropic, Cloudflare, and Stripe already have /llms.txt files. site-to-md generates yours automatically.

License

MIT © Stratus Labs