ts-afternoon

v1.0.0

Published

22 days ago

A TypeScript Node.js CLI to fetch and export structured web content.


applexie

Web Grab CLI (TypeScript)

A command-line tool built on Node.js + TypeScript: give it a web page URL, and it fetches the page and exports structured results to a local directory.

English version: README.en.md

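Features

Fetches the page HTML (redirects are followed automatically)
Parses the title and the heading hierarchy (h1-h6)
Extracts the list of links on the page (converted to absolute URLs)
Exports three files:
- content.md
- meta.json
- links.json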
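
The conversion of links to absolute URLs can be done with the WHATWG `URL` constructor that ships with Node.js. The following is a minimal sketch, not the project's actual code: the helper name `toAbsoluteUrl` is hypothetical, and dropping non-http(s) links such as `mailto:` is an assumption.

```typescript
// Hypothetical helper: resolve an href found in a page against the page URL.
// Returns null when the href cannot be resolved or is not an http(s) link.
function toAbsoluteUrl(href: string, baseUrl: string): string | null {
  try {
    const resolved = new URL(href.trim(), baseUrl);
    // Assumption: only http(s) links are kept; mailto:, tel:, etc. are skipped.
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
      return null;
    }
    return resolved.href;
  } catch {
    // new URL() throws a TypeError when the href or the base is invalid.
    return null;
  }
}
```

Resolving against the final URL (after redirects) rather than the URL typed on the command line keeps relative links correct when the page has moved.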

Tech stack

TypeScript
Commander (CLI argument parsing)
Cheerio (HTML parsing)
fs/promises (asynchronous file writes)

Project structure

src/
  types.ts
  fetcher.ts
  parser.ts
  writer.ts
  index.ts

Install and build

npm install
npm run build

Global use on your own machine (recommended)

From the project root, run:

npm install
npm run build
npm link

You can then run the tool directly from any directory:

web-grab https://example.com -o ./output

To remove the global link:

npm unlink -g ts-afternoon

Global install (after publishing)

If the package is published to npm, it can be installed globally:

npm install -g ts-afternoon
web-grab https://example.com -o ./output

Usage

node dist/index.js <url> -o <outputDir>

Example:

node dist/index.js https://example.com -o output

You can also use npm start:

npm start -- https://example.com -o output

CLI arguments

<url>: required, the address of the target web page
-o, --output <dir>: output directory, defaults to output
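
The argument contract above (one required positional, one option with a default) can be sketched as follows. The project itself uses Commander; to stay dependency-free this sketch swaps in Node's built-in `util.parseArgs` instead, so it illustrates the contract rather than the project's real parsing code. The names `CliOptions` and `parseCliArgs` are hypothetical.

```typescript
import { parseArgs } from "node:util";

interface CliOptions {
  url: string;
  outputDir: string;
}

// Hypothetical helper. `argv` is the argument list without the node binary
// and script path, e.g. process.argv.slice(2).
function parseCliArgs(argv: string[]): CliOptions {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      // -o, --output <dir>, defaulting to "output" as documented above.
      output: { type: "string", short: "o", default: "output" },
    },
    allowPositionals: true,
  });

  const [url] = positionals;
  if (url === undefined) {
    throw new Error("Missing required argument: <url>");
  }
  return { url, outputDir: values.output ?? "output" };
}
```

For instance, `parseCliArgs(process.argv.slice(2))` would be the call made at the top of an entry point under these assumptions.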

Output files

`content.md`

Contains the text that makes up the page body (headings, paragraphs, list items), written as Markdown.
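
One way to turn extracted headings, paragraphs, and list items into Markdown is sketched below. The `ContentBlock` type and `renderMarkdown` function are hypothetical, and the exact Markdown layout the project produces is not documented here, so the spacing rules are assumptions.

```typescript
// Hypothetical shape for the text blocks pulled out of the page, in order.
type ContentBlock =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string }
  | { kind: "listItem"; text: string };

// Render blocks as Markdown: "#" prefixes for headings, "- " for list items,
// and a blank line between block-level elements.
function renderMarkdown(blocks: ContentBlock[]): string {
  const lines: string[] = [];
  let prevWasListItem = false;

  for (const block of blocks) {
    const text = block.text.replace(/\s+/g, " ").trim(); // collapse whitespace
    if (text === "") continue; // skip empty nodes

    // Close a run of list items with a blank line before other content.
    if (prevWasListItem && block.kind !== "listItem") lines.push("");

    switch (block.kind) {
      case "heading": {
        const level = Math.min(Math.max(block.level, 1), 6); // clamp to h1-h6
        lines.push(`${"#".repeat(level)} ${text}`, "");
        break;
      }
      case "paragraph":
        lines.push(text, "");
        break;
      case "listItem":
        lines.push(`- ${text}`);
        break;
    }
    prevWasListItem = block.kind === "listItem";
  }

  // End the file with exactly one trailing newline.
  return lines.join("\n").trimEnd() + "\n";
}
```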

`meta.json`

Structure:

{
  "title": "Page title",
  "url": "Final fetched URL",
  "headings": [
    { "level": 1, "text": "H1 text" },
    { "level": 2, "text": "H2 text" }
  ]
}
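
The `meta.json` structure above maps naturally onto a pair of TypeScript interfaces. This is a sketch under assumptions: the names `Heading`, `PageMeta`, `headingLevel`, and `serializeMeta` are hypothetical and may not match what `src/types.ts` actually declares.

```typescript
// Hypothetical types mirroring the meta.json structure.
interface Heading {
  level: 1 | 2 | 3 | 4 | 5 | 6;
  text: string;
}

interface PageMeta {
  title: string;
  url: string; // final URL after redirects
  headings: Heading[];
}

// Hypothetical helper: map a tag name such as "h2" to its heading level,
// or null for any tag that is not h1-h6.
function headingLevel(tagName: string): Heading["level"] | null {
  const match = /^h([1-6])$/i.exec(tagName);
  return match ? (Number(match[1]) as Heading["level"]) : null;
}

// Serialize with two-space indentation and a trailing newline.
function serializeMeta(meta: PageMeta): string {
  return JSON.stringify(meta, null, 2) + "\n";
}
```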

`links.json`

Structure:

[
  {
    "href": "https://example.com/path",
    "text": "Link text"
  }
]
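
Writing the three output files with `fs/promises` might look like the sketch below. It assumes the output directory is created if missing and that the JSON files are pretty-printed; `LinkItem`, `GrabResult`, and `writeOutputs` are hypothetical names, not necessarily those used in `src/writer.ts`.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical type mirroring one entry of links.json.
interface LinkItem {
  href: string;
  text: string;
}

// Hypothetical bundle of everything that gets exported for one page.
interface GrabResult {
  markdown: string;
  meta: { title: string; url: string; headings: { level: number; text: string }[] };
  links: LinkItem[];
}

// Write content.md, meta.json, and links.json; returns the written paths.
async function writeOutputs(outputDir: string, result: GrabResult): Promise<string[]> {
  // Assumption: create the directory (and any parents) when it does not exist.
  await mkdir(outputDir, { recursive: true });

  const files: Array<[string, string]> = [
    ["content.md", result.markdown],
    ["meta.json", JSON.stringify(result.meta, null, 2) + "\n"],
    ["links.json", JSON.stringify(result.links, null, 2) + "\n"],
  ];

  // The files are independent, so they can be written concurrently.
  await Promise.all(
    files.map(([name, body]) => writeFile(join(outputDir, name), body, "utf8")),
  );
  return files.map(([name]) => join(outputDir, name));
}
```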

Sample output

Command:

node dist/index.js https://example.com -o output

Sample meta.json:

{
  "title": "Example Domain",
  "url": "https://example.com/",
  "headings": [
    { "level": 1, "text": "Example Domain" }
  ]
}

Sample links.json:

[
  {
    "href": "https://iana.org/domains/example",
    "text": "Learn more"
  }
]

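Error handling

The project consistently catches errors as unknown and extracts the error message through type narrowing. The cases covered are:

Invalid URL
Failed network request
HTML parsing failure
Failure writing local files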
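
The pattern of catching as `unknown` and narrowing before reading the message can be sketched as follows, using the invalid-URL case as the example. Both helper names (`errorMessage`, `parseHttpUrl`) are hypothetical, and rejecting non-http(s) protocols is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: turn anything caught as `unknown` into a message.
function errorMessage(err: unknown): string {
  if (err instanceof Error) return err.message; // narrowed to Error
  if (typeof err === "string") return err; // someone threw a plain string
  return String(err); // last resort for other thrown values
}

// Hypothetical helper: validate the <url> argument before any request is made.
function parseHttpUrl(input: string): URL {
  let url: URL;
  try {
    url = new URL(input); // throws a TypeError on malformed input
  } catch (err: unknown) {
    throw new Error(`Invalid URL "${input}": ${errorMessage(err)}`);
  }
  // Assumption: only http(s) pages are supported.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol "${url.protocol}" in "${input}"`);
  }
  return url;
}
```

The same `errorMessage` helper could wrap the fetch, parse, and write steps so that each failure is reported with a consistent message at the top level.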

Development scripts

npm run build
npm start -- <url> -o <outputDir>
