curl-fetcher

v0.0.3

huaguang-npm

A tool that builds concurrency-limited requests from a cURL command and organizes the output files by JSON paths in the responses.

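Features

🔄 Converts cURL commands exported from a browser or API debugging tool into executable requests
⚡️ Limits concurrency with p-limit to avoid overloading the server
📁 Organizes output files automatically by JSON paths in the response
🔐 Refreshes authorization automatically (custom logic supported)
📊 Tracks progress and can resume an interrupted run
🌐 Works with Node.js and Bun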

Installation

bun install curl-fetcher
# or
npm install curl-fetcher

Quick Start

1. Prepare the cURL command file

Copy a cURL command from a browser or an API debugging tool (such as Reqable) and save it to request.bash:

curl -X GET 'https://example.com/api/registry/item-name' \
  -H 'User-Agent: node-fetch' \
  -H 'Connection: close' \
  -H 'x-license-key: YOUR_LICENSE_KEY_HERE'

2. Prepare the URL mapping list

Create a names.json file that contains the last path segment of every URL to request:

[
  "cta-section-1",
  "cta-section-2",
  "bento-grid-1",
  "bento-grid-2",
  "bento-grid-3"
]

3. Run the example script

The project root includes an example.ts script that can be run directly:

bun run example.ts
# or
bun example.ts

Contents of the example script:

import { curlFetcher } from "./src/index.js";
// Import attributes (`with`) replace the deprecated `assert` syntax.
import names from "./names.json" with { type: "json" };

await curlFetcher({
  curlPath: "./request.bash",
  outputDir: "./output",
  concurrency: 10,
  limit: 20, // Test mode: process only the first 20 items
  urlMapper: names,
  outputPathExtractor: "files[0].path",
  contentExtractor: "files[0].content",
});

4. Full run

Once the test run passes, set limit: 0 for a full run:

await curlFetcher({
  curlPath: "./request.bash",
  outputDir: "./output",
  urlMapper: names,
  outputPathExtractor: "files[0].path",
  contentExtractor: "files[0].content",
  limit: 0, // Full run; completed items are skipped automatically
});

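API Reference

`curlFetcher(options: CurlFetcherOptions): Promise<void>`

The main function. It runs the concurrent requests and organizes the output files.

Options

Required parameters

urlMapper (string[] | (originalUrl: string, item: string) => string)

The URL mapper. It can be an array of strings (each one replaces the last path segment of the URL) or a custom transform function.

// Array form: replaces the last URL segment automatically
urlMapper: ["item-1", "item-2", "item-3"];

// Function form: custom URL transform logic
urlMapper: (originalUrl, item) => {
  return originalUrl.replace(/\/[^/]+$/, `/${item}`);
};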
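
To make the array form concrete, here is a minimal sketch of replacing the last path segment. The helper name replaceLastSegment is hypothetical and is not part of curl-fetcher's documented API; the library's own replacement logic may differ, for example in how it treats query strings.

```typescript
// Hypothetical helper (an assumption, not curl-fetcher's API):
// replaces the last path segment of `originalUrl` with `item`.
function replaceLastSegment(originalUrl: string, item: string): string {
  const url = new URL(originalUrl);
  const segments = url.pathname.split("/");
  // Overwrite the final segment and keep the rest of the path untouched.
  segments[segments.length - 1] = encodeURIComponent(item);
  url.pathname = segments.join("/");
  // Query string and hash, if any, are preserved by the URL object.
  return url.toString();
}

const template = "https://example.com/api/registry/item-name";
const urls = ["cta-section-1", "bento-grid-1"].map((item) =>
  replaceLastSegment(template, item),
);
console.log(urls);
```

Unlike the regex in the function form above, this version goes through the URL class, so a query string on the template URL is left in place.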

contentExtractor (string)
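The path used to extract the content from the response JSON. Nested properties are accessed with dots and square brackets, for example files[0].content.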
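
As an illustration of how such a path could be resolved, the sketch below walks an object along a dot-and-bracket path. The function getByPath is a hypothetical stand-in; the README does not show the library's own resolver, which may handle edge cases differently.

```typescript
// Hypothetical resolver for paths such as "files[0].content".
// This is an assumption for illustration, not curl-fetcher's implementation.
function getByPath(data: unknown, path: string): unknown {
  // Normalize "files[0].content" into the keys ["files", "0", "content"].
  const keys = path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter(Boolean);

  let current: unknown = data;
  for (const key of keys) {
    // Stop early when the path runs into a non-object value.
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const response = { files: [{ path: "ui/cta.tsx", content: "export {}" }] };
console.log(getByPath(response, "files[0].content"));
console.log(getByPath(response, "files[1].content"));
```

A missing segment yields undefined rather than throwing, which lets a caller treat an unexpected response shape as a failed item.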

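Optional parameters

curlPath (string, default: "./request.bash")
Path to the cURL command file (absolute or relative).
outputDir (string, default: "./output")
Directory for the output files.
concurrency (number, default: 10)
Number of concurrent requests.
limit (number, default: 20)
Maximum number of items to process. The default of 20 is a test mode that processes only the first 20 items; 0 is a full run that processes everything and automatically skips completed items.
outputPathExtractor (string | (data: unknown, item: string) => string, optional)
Extractor for the output file path. It can be a JSON path string (such as files[0].path) or a custom function.
If it is omitted, the item from urlMapper is used as the file name.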
```
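// JSON path form
outputPathExtractor: "files[0].path";

// Custom function form
outputPathExtractor: (data, item) => {
  // getValueByPath is used as in the original example; its import is not shown there
  const path = getValueByPath(data, "files[0].path");
  return `custom/${path}`;
};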
```
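progressFile (string, optional)
Path of the progress file. Defaults to {outputDir}/.progress.json.

authRefresh ("skip" | "first" | () => Promise<string[]>, default: "skip")

Authorization refresh mode:

"skip": skip the authorization refresh (default)
"first": refresh authorization using the first request
Custom function: returns an array of Set-Cookie response header values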

// Skip the authorization refresh (default)
authRefresh: "skip";

// Refresh authorization using the first request
authRefresh: "first";

// Custom authorization refresh function
authRefresh: async () => {
  const response = await fetch("https://api.example.com/auth");
  return response.headers.getSetCookie();
};
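
The README does not describe what happens to the returned Set-Cookie values. One plausible use, shown here purely as an assumption, is to fold them into a Cookie header for the subsequent requests. The helper toCookieHeader is hypothetical.

```typescript
// Hypothetical helper (an assumption about how Set-Cookie values might be reused):
// builds a Cookie request header from an array of Set-Cookie header values.
function toCookieHeader(setCookies: string[]): string {
  return setCookies
    // Keep only the leading "name=value" pair and drop attributes such as Path or HttpOnly.
    .map((value) => value.split(";")[0].trim())
    // Ignore malformed entries that carry no "name=value" pair.
    .filter((pair) => pair.includes("="))
    .join("; ");
}

const cookieHeader = toCookieHeader([
  "sid=abc; Path=/; HttpOnly",
  "theme=dark; Max-Age=3600",
]);
console.log(cookieHeader);
```

This sketch ignores cookie scoping rules (domain, path, expiry), which is usually acceptable for a short-lived batch run against a single host.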

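firstItem (string, optional)
The item used for the first request when refreshing authorization (only used with authRefresh: "first"). Defaults to urlMapper[0].

skipErrors (boolean, default: false)

Whether to skip failed items. With the default of false, failed items are retried on the next run. When set to true, failed items are skipped as well and are not retried.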

// Default: failed items are retried
await curlFetcher({
  urlMapper: names,
  contentExtractor: "files[0].content",
  // skipErrors: false (default)
});

// Skip failed items without retrying them
await curlFetcher({
  urlMapper: names,
  contentExtractor: "files[0].content",
  skipErrors: true,
});
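
The effect of skipErrors on a resumed run can be sketched as a filter over the item list. The names Progress and selectPending are hypothetical, and the shape of the record is reduced to the two lists that matter here; the library's internal logic may differ.

```typescript
// Minimal view of a progress record (assumed shape, reduced for illustration).
interface Progress {
  success: string[];
  error: string[];
}

// Hypothetical helper: returns the items that still need a request.
function selectPending(
  items: string[],
  progress: Progress,
  skipErrors: boolean,
): string[] {
  // Successful items are always skipped on a resumed run.
  const done = new Set(progress.success);
  // With skipErrors enabled, previously failed items are skipped too.
  if (skipErrors) {
    for (const item of progress.error) done.add(item);
  }
  return items.filter((item) => !done.has(item));
}

const previous: Progress = { success: ["cta-section-1"], error: ["bento-grid-1"] };
const all = ["cta-section-1", "bento-grid-1", "bento-grid-2"];
console.log(selectPending(all, previous, false));
console.log(selectPending(all, previous, true));
```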

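How it works

Parse the cURL command: reads the cURL command file and uses curlconverter to convert it into an executable request configuration.
Refresh authorization (optional): decides whether to refresh authorization based on authRefresh. It can be skipped (default), done with the first request, or done with a custom function.
Build the request list: generates every URL to request from urlMapper.
Run concurrently: sends the requests with p-limit controlling the concurrency.
Extract and organize:
- Uses contentExtractor to extract the content from the response JSON
- Uses outputPathExtractor to determine the output file path
- Writes the file to the target directory
Track progress:
- Records completed requests so that a run can be resumed (with limit: 0, completed items are skipped automatically)
- By default only successful items are skipped; failed items are retried
- If a failed item succeeds on retry, it is removed from the error list and added to the success list
- Progress file format: { success: string[], error: string[], successCount: number, errorCount: number, total: number }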
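
The concurrent-execution step above can be pictured with a small worker-pool sketch. The library itself uses p-limit; the function mapWithLimit below is a dependency-free stand-in written for illustration, not p-limit's implementation and not part of curl-fetcher.

```typescript
// Illustrative stand-in for a concurrency limiter (an assumption, not p-limit itself):
// runs `worker` over `items` with at most `concurrency` calls in flight.
async function mapWithLimit<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}

// Usage: simulate five requests with at most two running at once.
const simulated = mapWithLimit([1, 2, 3, 4, 5], 2, async (n) => {
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  return n * 2;
});
simulated.then((values) => console.log(values));
```

Results keep the order of the input array regardless of which request finishes first. In this sketch a rejected worker rejects the whole call; a real batch run would more likely catch the error per item and record it as a failure.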

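Example scenarios

Scenario 1: Download component code in bulk

import { curlFetcher } from "curl-fetcher";
// Import attributes (`with`) replace the deprecated `assert` syntax.
import componentNames from "./components.json" with { type: "json" };

await curlFetcher({
  curlPath: "./request.bash",
  outputDir: "./components",
  urlMapper: componentNames,
  outputPathExtractor: "files[0].path", // Use the path from the response
  contentExtractor: "files[0].content", // Extract the code content
  concurrency: 5, // Lower the concurrency to avoid rate limiting
  limit: 0, // Full run
  authRefresh: "first", // Refresh authorization using the first request
});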

Scenario 2: Custom output path

await curlFetcher({
  curlPath: "./request.bash",
  outputDir: "./output",
  urlMapper: names,
  outputPathExtractor: (data, item) => {
    // Custom path organization logic
    const category = item.split("-")[0]; // For example "bento" or "cta"
    const path = getValueByPath(data, "files[0].path");
    return `${category}/${path}`;
  },
  contentExtractor: "files[0].content",
});

Scenario 3: Authorization refresh configuration

// Skip the authorization refresh (default)
await curlFetcher({
  curlPath: "./request.bash",
  urlMapper: names,
  contentExtractor: "files[0].content",
  authRefresh: "skip", // Or omit this option
});

// Refresh authorization using the first request
await curlFetcher({
  curlPath: "./request.bash",
  urlMapper: names,
  contentExtractor: "files[0].content",
  authRefresh: "first",
});

// Custom authorization refresh function
await curlFetcher({
  curlPath: "./request.bash",
  urlMapper: names,
  contentExtractor: "files[0].content",
  authRefresh: async () => {
    // Custom authorization refresh logic
    const response = await fetch("https://api.example.com/auth");
    return response.headers.getSetCookie();
  },
});

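Notes

Use limit: 20 for the first run as a test
Set limit: 0 for a full run; completed items are skipped automatically
Make sure the last path segment of the URL in the cURL file can be replaced by the items in urlMapper
Responses must be JSON
The progress file is saved to {outputDir}/.progress.json and is overwritten on every run (with limit: 0 it is read first so that completed items can be skipped)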
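
The progress bookkeeping described in this README can be sketched as a pure update function over the documented file format. The names ProgressFile and recordResult are hypothetical, and the handling of an item that fails after an earlier success is an assumption.

```typescript
// Mirrors the documented progress file format.
interface ProgressFile {
  success: string[];
  error: string[];
  successCount: number;
  errorCount: number;
  total: number;
}

// Hypothetical helper: returns an updated progress record for one finished item.
function recordResult(progress: ProgressFile, item: string, ok: boolean): ProgressFile {
  const success = new Set(progress.success);
  const error = new Set(progress.error);

  if (ok) {
    // A retried item that now succeeds moves from the error list to the success list.
    success.add(item);
    error.delete(item);
  } else if (!success.has(item)) {
    // Assumption: an item already recorded as successful is not demoted on a later failure.
    error.add(item);
  }

  return {
    success: Array.from(success),
    error: Array.from(error),
    successCount: success.size,
    errorCount: error.size,
    total: progress.total,
  };
}

let progress: ProgressFile = { success: [], error: [], successCount: 0, errorCount: 0, total: 2 };
progress = recordResult(progress, "bento-grid-1", false); // first attempt fails
progress = recordResult(progress, "bento-grid-1", true); // retry succeeds
console.log(progress);
```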

License

MIT
