crawler-scanner-sdk

v2.1.0

Published a month ago

A Node.js SDK for integrating crawlergo, xray, and req probe tools

Vulnerabilities: 0 High, 0 Medium, 0 Low

xyq777

crawlergo xray security scanner web-crawler

Crawler Scanner SDK

An easy-to-use, easy-to-modify Node.js SDK for driving the security scanning tools in the bin directory.

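Features

🎯 Easy to use: a clear API design that is simple to understand and call
🔧 Easy to modify: a clear code structure, with each tool wrapped independently
📦 Complete wrappers: wraps every bin tool (crawlergo, clean, probe, getSample)
📊 Type definitions: complete TypeScript type definitions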

Installation

npm install crawler-scanner-sdk

Quick Start

Option 1: Use SecurityScanner (recommended)

const { SecurityScanner } = require("crawler-scanner-sdk");

// Create a scanner instance
const scanner = new SecurityScanner({
  binPath: "/path/to/bin", // path to the bin directory (required)
  chromePath: "/path/to/chrome", // path to Chrome (optional)
  xrayPath: "/path/to/xray", // path to the xray executable (optional, used for vulnerability scanning)
  storageType: "file", // storage type: 'file'
  storagePath: "./results", // storage path (optional, default: './scanner-results')
});

// CommonJS modules have no top-level await, so the steps run inside an async function
async function main() {
  // 1. Run the crawler
  const crawlResult = await scanner.runCrawlergo("https://example.com", {
    maxCrawlCount: 100,
  });

  // 2. Run the probe (replay mode)
  const probeResult = await scanner.runProbe(crawlResult.outputFile, {
    enableReplay: true,
    timeout: 5.0,
    concurrency: 10,
  });

  // 3. Clean the results
  const cleanResult = await scanner.runClean(probeResult.outputFile, {
    keywords: ["error", "not found"],
  });

  // 4. Extract samples
  const sampleResult = await scanner.runGetSample(probeResult.outputFile);
  console.log("Samples:", sampleResult.result.resp_data_sample);

  // 5. Run the xray scan (requires xrayPath to be configured first)
  const xrayResult = await scanner.runXray(["http://example.com/api/users"], {
    outputFile: "/path/to/xray.json",
    plugins: ["xss", "sqldet", "cmd-injection"],
  });
  console.log("Found vulnerabilities:", xrayResult.result.length);
}

main().catch(console.error);
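
The steps above can be chained into a single function. The following is a minimal sketch that relies only on the method names and return fields documented in this README; `runPipeline`, its `keywords` parameter, and the shape of its return value are hypothetical and not part of the SDK. The scanner is passed in as an argument so the function can be exercised with a stub.

```javascript
// Hypothetical helper (not part of the SDK): chains crawl -> probe -> sample -> clean
// using only the SecurityScanner methods and return fields documented in this README.
async function runPipeline(scanner, targetUrl, keywords = []) {
  const crawl = await scanner.runCrawlergo(targetUrl, { maxCrawlCount: 100 });
  const probe = await scanner.runProbe(crawl.outputFile, { enableReplay: true });
  const sample = await scanner.runGetSample(probe.outputFile);

  // Pass keywords only when some were given, so the no-keyword cleanup applies otherwise.
  const cleanOptions = keywords.length > 0 ? { keywords } : {};
  const clean = await scanner.runClean(probe.outputFile, cleanOptions);

  return {
    crawledRequests: crawl.result.req_list.length,
    probed: probe.results.length,
    samples: sample.result.resp_data_sample,
    kept: clean.results,
  };
}
```

Calling `runPipeline(scanner, "https://example.com", ["error", "not found"])` with a configured scanner would run the four steps in order; the xray step is left out because it needs its own URL list and output path.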

Option 2: Use the tool classes directly

const {
  CrawlergoTool,
  ProbeTool,
  CleanTool,
  GetSampleTool,
} = require("crawler-scanner-sdk");

// CommonJS modules have no top-level await, so the calls run inside an async function
async function main() {
  // Call the tools directly
  const crawlResult = await CrawlergoTool.run("https://example.com", {
    crawlergoPath: "/path/to/bin/crawlergo",
    outputFile: "./crawler.json",
  });

  const probeResult = await ProbeTool.run("./crawler.json", {
    probePath: "/path/to/bin/probe",
    enableReplay: true,
  });
}

main().catch(console.error);

Data Models

Crawler output model (CrawlergoOutput)

interface CrawlergoOutput {
  req_list: CrawlergoRequest[];
  sub_domain_list: string[];
}

interface CrawlergoRequest {
  url: string;
  method: string;
  req_headers: Record<string, any> | null;
  req_data: string;
  has_param: boolean;
  source: string;
  is_relative: boolean;
}
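
As an illustration of how these fields can be consumed, here is a small sketch that tallies a `CrawlergoOutput` object. `summarizeCrawl` is a hypothetical helper, and the sample data (including the `source` values) is made up for the example rather than taken from a real crawl.

```javascript
// Sample data shaped like CrawlergoOutput; all values are invented for illustration.
const sampleOutput = {
  req_list: [
    { url: "https://example.com/", method: "GET", req_headers: null, req_data: "", has_param: false, source: "Target", is_relative: false },
    { url: "https://example.com/login", method: "POST", req_headers: null, req_data: "user=a", has_param: true, source: "XHR", is_relative: false },
    { url: "/static/app.js", method: "GET", req_headers: null, req_data: "", has_param: false, source: "DOM", is_relative: true },
  ],
  sub_domain_list: ["api.example.com"],
};

// Hypothetical helper: counts requests by method and by the boolean flags in the model.
function summarizeCrawl(output) {
  const byMethod = {};
  for (const req of output.req_list) {
    const method = req.method.toUpperCase();
    byMethod[method] = (byMethod[method] || 0) + 1;
  }
  return {
    total: output.req_list.length,
    byMethod,
    withParams: output.req_list.filter((req) => req.has_param).length,
    relative: output.req_list.filter((req) => req.is_relative).length,
    subDomains: output.sub_domain_list.length,
  };
}

console.log(summarizeCrawl(sampleOutput));
```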

Endpoint probe model (ProbeResult)

interface ProbeResult {
  url: string;
  method: string;
  req_headers: Record<string, any> | null;
  req_data: string;
  status_code: number;
  resp_headers: Record<string, string>;
  resp_data: string;
  error?: string;
}
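
The API section of this README describes the probe output file as JSONL, which suggests one `ProbeResult` object per line. Under that assumption, a file's contents could be parsed as sketched below; `parseProbeJsonl` is a hypothetical helper, not an SDK export.

```javascript
// Hypothetical helper: parses JSONL text, assuming one ProbeResult object per line.
// Blank lines are skipped; lines that are not valid JSON are reported by 1-based line number.
function parseProbeJsonl(text) {
  const results = [];
  const badLines = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      results.push(JSON.parse(line));
    } catch (err) {
      badLines.push(index + 1);
    }
  });
  return { results, badLines };
}

// Entries with the optional `error` field set are the probes that failed.
function failedProbes(results) {
  return results.filter((result) => Boolean(result.error));
}
```

Collecting bad line numbers instead of throwing keeps one corrupt line from discarding the rest of a large probe run.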

Response data sample model (RespDataSample)

interface RespDataSample {
  resp_data_sample: string[];
}

API Reference

SecurityScanner

`runCrawlergo(targetUrls, options?)`

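Runs the crawlergo crawler.

Parameters:

targetUrls: target URL(s) (a string or an array of strings)
options: options object
- maxTabCount: maximum number of tabs
- maxCrawlCount: maximum crawl count
- maxRuntime: maximum runtime (seconds)
- filterMode: filter mode ('simple' | 'smart')
- proxy: proxy address
- outputFile: output file path
- taskId: custom task ID

Returns: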

{
  taskId: string;
  result: CrawlergoOutput;
  outputFile: string;
}

`runProbe(crawlergoFile, options?)`

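Runs the probe tool.

Parameters:

crawlergoFile: path to the crawlergo output file
options: options object
- enableProbe: enable probe mode (relative requests)
- enableReplay: enable replay mode (requests with has_param, non-GET requests, and XHR requests)
- apiBase: API base URL
- timeout: timeout (seconds)
- concurrency: number of concurrent requests
- outputFile: output file path

Returns: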

{
  taskId: string;
  results: ProbeResult[];
  outputFile: string;
}
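
The two modes described above can be read as filters over `req_list`. The sketch below shows one possible interpretation; it is not the probe tool's actual logic. It assumes the replay criteria combine with OR, that XHR requests are identified by `source === "XHR"`, and that both modes may be enabled together. `selectRequests` is a hypothetical name.

```javascript
// Hypothetical illustration of the mode descriptions above, not the probe binary's real logic.
// Assumptions: replay criteria are OR-ed, and XHR requests carry source === "XHR".
function selectRequests(reqList, { enableProbe = false, enableReplay = false } = {}) {
  return reqList.filter((req) => {
    // Probe mode: relative requests.
    const probeCandidate = enableProbe && req.is_relative;
    // Replay mode: has_param, non-GET, or XHR requests.
    const replayCandidate =
      enableReplay &&
      (req.has_param || req.method.toUpperCase() !== "GET" || req.source === "XHR");
    return probeCandidate || replayCandidate;
  });
}
```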

`runClean(inputFile, options?)`

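Runs the clean tool.

Parameters:

inputFile: input file path (the JSONL file produced by probe)
options: options object
- keywords: keyword list (a comma-separated string or an array of strings) used to filter out invalid responses
- outputFile: output file path

Returns: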

{
  results: ProbeResult[];
  outputFile: string;
}

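Notes:

If keywords is not provided, a broad, large-scale cleanup is performed
If keywords is provided, the keywords are used for precise filtering (the second cleanup pass)
The keywords are passed to the clean tool's -k flag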
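
Because `keywords` accepts either a comma-separated string or an array, a wrapper has to normalize it before building the `-k` argument. The following sketch shows one way to do that. It assumes `-k` takes a single comma-joined value, which this README does not confirm; `normalizeKeywords` and `buildCleanArgs` are hypothetical names.

```javascript
// Hypothetical helper: accepts a comma-separated string or an array of strings and
// returns a trimmed, de-duplicated list with empty entries removed.
function normalizeKeywords(keywords) {
  if (keywords === undefined || keywords === null) return [];
  const list = Array.isArray(keywords) ? keywords : String(keywords).split(",");
  const seen = new Set();
  const normalized = [];
  for (const keyword of list) {
    const trimmed = String(keyword).trim();
    if (trimmed !== "" && !seen.has(trimmed)) {
      seen.add(trimmed);
      normalized.push(trimmed);
    }
  }
  return normalized;
}

// Assumption: the clean tool's -k flag takes one comma-joined value.
// With no keywords the flag is omitted, which corresponds to the broad cleanup.
function buildCleanArgs(keywords) {
  const list = normalizeKeywords(keywords);
  return list.length > 0 ? ["-k", list.join(",")] : [];
}
```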

`runGetSample(probeFile, options?)`

Runs getSample to extract samples.

Parameters:

probeFile: path to the probe output file (JSONL)
options: options object
- outputFile: output file path

Returns:

{
  result: RespDataSample; // { resp_data_sample: string[] }
  outputFile: string;
}

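Notes:

getSample randomly samples response bodies from the probe output
The returned resp_data_sample is an array of strings; each string is a response body in JSON format
These samples can be given to an AI model to extract fingerprints of invalid responses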
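
To make the idea of random sampling concrete, here is a sketch that draws response bodies from an array of `ProbeResult` objects with a partial Fisher-Yates shuffle. This is an illustration only; the README does not say how the getSample binary selects its samples. `sampleResponses`, its `size` parameter, and the injectable `random` function are hypothetical.

```javascript
// Hypothetical illustration of random sampling; not the getSample binary's algorithm.
// Draws up to `size` non-empty response bodies without replacement.
function sampleResponses(results, size, random = Math.random) {
  const bodies = results
    .map((result) => result.resp_data)
    .filter((body) => typeof body === "string" && body.length > 0);

  // Partial Fisher-Yates shuffle: only the first `count` positions need to be randomized.
  const count = Math.min(size, bodies.length);
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (bodies.length - i));
    [bodies[i], bodies[j]] = [bodies[j], bodies[i]];
  }
  return { resp_data_sample: bodies.slice(0, count) };
}
```

Accepting `random` as a parameter lets a caller substitute a seeded generator when reproducible samples are needed.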

`runXray(urls, options)`

Runs an xray scan.

Parameters:

urls: array of URLs (required)
options: options object (required)
- outputFile: output file path (required)
- plugins: plugin list (optional), e.g. ['xss', 'sqldet', 'cmd-injection']

Returns:

{
  taskId: string;
  result: any[];  // array of vulnerabilities
}

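Notes:

xray writes its output as JSON (an array); the SDK reads the file automatically and returns the parsed array
If xray finds no vulnerabilities or fails to run, an empty array is returned
A temporary URL list file is created and passed to xray

Example: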

const xrayResult = await scanner.runXray(
  ["http://example.com/api/users", "http://example.com/api/posts"],
  {
    outputFile: "/path/to/xray.json",
    plugins: ["xss", "sqldet", "cmd-injection"],
  }
);
console.log("Found vulnerabilities:", xrayResult.result.length);
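
The file handling described in the notes for `runXray` can be sketched as follows. This covers only the temporary URL list and the tolerant read of the JSON output, and deliberately leaves out the xray invocation itself. `writeUrlListFile` and `readXrayResults` are hypothetical helpers, and the one-URL-per-line layout is an assumption.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: writes the URLs to a temporary file, one per line (assumed layout),
// and returns the file path.
function writeUrlListFile(urls) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "xray-urls-"));
  const file = path.join(dir, "urls.txt");
  fs.writeFileSync(file, urls.join("\n") + "\n", "utf8");
  return file;
}

// Hypothetical helper: returns the parsed JSON array, or an empty array when the file
// is missing, is not valid JSON, or does not contain an array.
function readXrayResults(outputFile) {
  try {
    const parsed = JSON.parse(fs.readFileSync(outputFile, "utf8"));
    return Array.isArray(parsed) ? parsed : [];
  } catch (err) {
    return [];
  }
}
```

Returning an empty array on any read failure matches the documented behavior, although it also means a caller cannot tell a failed scan from a clean one using the result alone.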

Code Structure

src/
├── index.js              # main entry point; exports all classes and tools
├── scanner.js            # SecurityScanner main class
├── models.js             # data model definitions
├── tools/                # tool wrapper classes
│   ├── crawlergo.js      # Crawlergo tool
│   ├── probe.js          # Probe tool
│   ├── clean.js          # Clean tool
│   └── getSample.js      # GetSample tool
└── storage/              # storage adapters
    └── file.js           # file storage

Modification Guide

Adding a new tool

Create a new tool class file in the src/tools/ directory
Implement a static run() method
Export the new tool class from src/index.js
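
The steps above might produce a file like the following. This is a skeleton under stated assumptions rather than a copy of the existing tool classes: `ExampleTool`, the `examplePath` option, and the `-i`/`-o` flags are placeholders for whatever the new binary actually expects.

```javascript
// src/tools/example.js (hypothetical file name)
const { execFile } = require("child_process");
const { promisify } = require("util");

const execFileAsync = promisify(execFile);

class ExampleTool {
  // Builds the argument list; the -i and -o flags are placeholders, not real tool flags.
  static buildArgs(inputFile, options = {}) {
    const args = ["-i", inputFile];
    if (options.outputFile) args.push("-o", options.outputFile);
    return args;
  }

  // Static run() method, mirroring how the README calls CrawlergoTool.run and ProbeTool.run.
  static async run(inputFile, options = {}) {
    if (!options.examplePath) {
      throw new Error("examplePath is required");
    }
    const args = ExampleTool.buildArgs(inputFile, options);
    const { stdout } = await execFileAsync(options.examplePath, args);
    return { outputFile: options.outputFile || null, stdout };
  }
}

module.exports = { ExampleTool };
```

The remaining step is to re-export the class from src/index.js alongside the existing tools. Using `execFile` rather than `exec` passes the arguments as an array, so file paths are not interpreted by a shell.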

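Modifying the data models

Update the model definitions in src/models.js
Update the TypeScript type definitions in src/index.d.ts
Update the data-handling logic in the affected tool classes

Modifying tool invocation logic

Each tool class is independent, so you can edit the corresponding src/tools/*.js file directly.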
