llms-txt-crawl

v1.0.0

Probe LLMS-friendly documents and recursively crawl linked markdown/text documents.

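Fetches a site's LLMS documents, along with the same-domain .md / .txt documents they link to, and saves the successfully fetched results to a local directory.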

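Use cases

  • You have a site entry URL and want to quickly check whether it provides llms.txt
  • You already have llms.txt / llms-full.txt / llms-small.txt and want to recursively crawl the documents they link to
  • You need the crawl results written to disk for later indexing, cleaning, or offline processing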
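
The first two use cases differ only in whether the URL already points at an LLMS document. A minimal sketch of that distinction follows. The three file names come from this README; `isLlmsDocumentUrl`, `candidateLlmsUrls`, and the probing order are hypothetical and may not match what the tool actually does.

```typescript
// File names this README lists as LLMS documents.
const LLMS_FILE_NAMES = ["llms.txt", "llms-full.txt", "llms-small.txt"];

// True when the last path segment of the URL is one of the LLMS file names.
function isLlmsDocumentUrl(input: string): boolean {
  const { pathname } = new URL(input);
  const lastSegment = pathname.split("/").pop() ?? "";
  return LLMS_FILE_NAMES.indexOf(lastSegment) !== -1;
}

// Assumed probing order (not confirmed by the README): look for llms.txt
// next to the entry path first, then at the site root.
function candidateLlmsUrls(input: string): string[] {
  if (isLlmsDocumentUrl(input)) return [input];
  const url = new URL(input);
  const base = url.pathname.replace(/\/+$/, ""); // drop trailing slashes
  const candidates = [`${url.origin}${base}/llms.txt`, `${url.origin}/llms.txt`];
  return Array.from(new Set(candidates)); // de-duplicate when the entry is the root
}
```

With this sketch, an entry URL such as https://example.com/docs yields two candidates, while a URL that already ends in llms.txt is returned unchanged.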

Requirements

  • Node.js 20+
  • npm 11+ or a compatible version

30-second quick start

Install dependencies and build:

npm install
npm run build

Run directly:

npm start -- "https://example.com/docs"

To use the llms-txt-crawl command directly on your machine:

npm link
llms-txt-crawl "https://example.com/docs"

Command syntax

llms-txt-crawl <url> [--output-dir DIR] [--max-retries N] [--base-delay-ms N] [--timeout-ms N]

Common usage

Probe and crawl starting from a regular page:

npm start -- "https://example.com/docs"

Start directly from an existing LLMS document:

npm start -- "https://example.com/llms.txt"

Specify the output directory:

npm start -- "https://example.com/docs" --output-dir "./saved-docs"

Adjust retries and timeout:

npm start -- "https://example.com/docs" --max-retries 5 --base-delay-ms 500 --timeout-ms 15000

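Options

  • --output-dir: Output directory, default ./output/<host>
  • --max-retries: Maximum number of retries for 403 / 429 / 503 responses, default 3
  • --base-delay-ms: Base wait time for exponential backoff, in milliseconds, default 500
  • --timeout-ms: Timeout for a single request, in milliseconds, default 10000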
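
To show how --max-retries and --base-delay-ms might interact, here is a minimal sketch of an exponential backoff schedule. The defaults and the retried status codes come from the list above. The formula (base delay doubled on each attempt, no jitter) and the names `shouldRetry` and `backoffDelayMs` are assumptions, since this README does not spell out the exact schedule.

```typescript
// Status codes this README says are retried.
const RETRYABLE_STATUS = [403, 429, 503];

interface RetryOptions {
  maxRetries: number;  // mirrors --max-retries (default 3)
  baseDelayMs: number; // mirrors --base-delay-ms (default 500)
}

const DEFAULT_RETRY: RetryOptions = { maxRetries: 3, baseDelayMs: 500 };

// `attempt` counts retries already made, starting at 0.
function shouldRetry(status: number, attempt: number, opts: RetryOptions = DEFAULT_RETRY): boolean {
  return RETRYABLE_STATUS.indexOf(status) !== -1 && attempt < opts.maxRetries;
}

// Assumed schedule: baseDelayMs * 2^attempt. The real tool may add jitter
// or a cap; neither is documented here.
function backoffDelayMs(attempt: number, opts: RetryOptions = DEFAULT_RETRY): number {
  return opts.baseDelayMs * Math.pow(2, attempt);
}
```

Under these assumptions and the default options, the waits before the three retries would be 500 ms, 1000 ms, and 2000 ms.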

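Output

The program writes two kinds of information:

  • stderr: crawl progress, retries, and completion status
  • stdout: the final summary and the list of failed pages

Default output directory, for example:

Input URL: https://example.com/docs
Output directory: ./output/example.com

Files on disk, for example: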

output/example.com/docs/llms.txt
output/example.com/docs/files/intro.md
output/example.com/files/overview.txt
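
The paths above suggest that each saved file mirrors the URL path under the output directory. A minimal sketch of that mapping follows. It is inferred from the examples only; `defaultOutputDir` and `localPathFor` are hypothetical names, and how the tool handles query strings or unusual characters is not covered.

```typescript
// Default output directory, following the documented ./output/<host> pattern.
function defaultOutputDir(inputUrl: string): string {
  return `./output/${new URL(inputUrl).host}`;
}

// Assumed mapping: the URL pathname is appended to the output directory
// as-is (still percent-encoded); query string and fragment are ignored.
function localPathFor(docUrl: string, outputDir: string): string {
  const { pathname } = new URL(docUrl);
  const dir = outputDir.replace(/\/+$/, ""); // avoid a double slash
  return `${dir}${pathname}`;
}
```

For instance, under this assumption https://example.com/docs/files/intro.md with the output directory output/example.com maps to output/example.com/docs/files/intro.md, matching the listing above.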

Summary example:

success=3 failed=1 output="/abs/path/to/output/example.com"
failed-pages:
https://example.com/docs/missing.md
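
If a later step needs to consume this summary, it can be parsed from stdout. The sketch below assumes the format is exactly as in the example above (a summary line first, then an optional failed-pages: section with one URL per line). That format is inferred from the single sample, and `parseSummary` is a hypothetical helper, not part of the package.

```typescript
interface CrawlSummary {
  success: number;
  failed: number;
  output: string;
  failedPages: string[];
}

// Parses stdout shaped like the summary example; returns null when the
// first non-empty line does not match the expected pattern.
function parseSummary(stdout: string): CrawlSummary | null {
  const lines = stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return null;
  const head = /^success=(\d+) failed=(\d+) output="(.*)"$/.exec(lines[0]);
  if (!head) return null;
  const marker = lines.indexOf("failed-pages:");
  const failedPages = marker === -1 ? [] : lines.slice(marker + 1);
  return {
    success: Number(head[1]),
    failed: Number(head[2]),
    output: head[3],
    failedPages,
  };
}
```

Because progress goes to stderr and the summary to stdout, a wrapper script can capture stdout alone and pass it to a parser like this one.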

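Behavior limits

  • Only http / https are supported
  • Only same-domain .md / .txt documents are crawled recursively
  • Only successfully fetched documents are saved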
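
The first two limits can be read as a filter applied to every link found in a document. A minimal sketch follows. It treats "same domain" as an identical host (including port) and matches the extension case-insensitively; both are assumptions, and `isCrawlable` is a hypothetical name.

```typescript
// Decides whether a link found on `pageUrl` would be followed.
function isCrawlable(link: string, pageUrl: string): boolean {
  let target: URL;
  try {
    target = new URL(link, pageUrl); // resolves relative links against the page
  } catch {
    return false; // not parseable as a URL
  }
  const base = new URL(pageUrl);
  const isHttp = target.protocol === "http:" || target.protocol === "https:";
  const sameHost = target.host === base.host; // assumed meaning of "same domain"
  const isDoc = /\.(md|txt)$/i.test(target.pathname); // extension check ignores query and fragment
  return isHttp && sameHost && isDoc;
}
```

Relative links such as files/intro.md are resolved against the page they appear on before the checks run, so they pass as long as the resolved URL stays on the same host.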

Development

npm test