hiperf_txt_parser
v1.3.1
Published
Parse perf data.txt and output structured TypeScript data
Readme
hiperf_txt_parser
将 perf 文本中的 record sample 段解析为结构化数据,并支持导出为:
- perf 原文本格式(保留缩进和行前缀;若存在
traceFieldDict,会在 raw hex 行前输出key: value行) - JSON 数组格式(每项为
{ "call_chain": "...", ...traceFields }) .xlsx(formatPerfDataToExcel,依赖包内exceljs)
本项目当前为 纯 lib 库,
src仅保留库代码,不包含 CLI。
安装
npm install hiperf_txt_parser本地调试可直接安装本地路径:
npm install /Users/fanghaolei/Workplace/TS/snapshot_checker对外 API
import {
parsePerfData,
parsePerfDataFromFile,
formatPerfDataToText,
formatPerfDataToJson,
formatPerfDataToExcel,
importPerfDataFromExportedJson,
normalizeRecordSampleFromExportedText,
filterByTgid,
toBackTraceStack,
toBackTraceStacks,
buildTraceParserRegistry,
decodePerfRawData,
} from "hiperf_txt_parser";parsePerfData(text: string): PerfDataparsePerfDataFromFile(filePath: string): PerfData(扩展名为.json不区分大小写时按导出 JSON 解析,其它按 perf 文本解析)formatPerfDataToText(data: PerfData): stringformatPerfDataToJson(data: PerfData): RecordSampleJsonExportItem[](call_chain与formatPerfDataToExcel的callchain列同源:由callchainFrames.frames以换行拼接)formatPerfDataToExcel(data: PerfData): Promise<Buffer>(.xlsx:traceFieldDict的键集合相同的样本归到同一工作表;无字典或空字典时仅一列callchain;有字典时表头为键名排序后的列并在末尾追加callchain;recordSamples为空会抛错)importPerfDataFromExportedJson(jsonText: string): PerfDatanormalizeRecordSampleFromExportedText(sample: RecordSample): RecordSample(从导出 txt 的 rawkey: value行回填traceFieldDict)filterByTgid(data: PerfData, tgid: number): PerfData(仅保留pid === tgid的 RecordSample)toBackTraceStack(sample, options?): stringtoBackTraceStacks(data, options?): PerfDataparseTraceFormat(text: string): ParsedTraceFormat(解析sample/trace_format风格文本)parseCommonFieldsFromRaw(raw: Uint8Array, format: ParsedTraceFormat): Record<string, number | bigint>(按 format 仅解析common_*字段,小端)rawHexLinesToBuffer(lines): Uint8Array(将 perf 文本里 raw 段的 hex 行拼成字节缓冲,便于喂给parseCommonFieldsFromRaw)buildTraceParserRegistry(formatTexts)(从 trace format 文本构建 event 解析注册表,支持按 format 绑定 fieldDict 回调)decodePerfRawData(perfData, registry, options)(按注册表解码 raw,并可在fieldDict模式写回 key/value)
快速示例
import {
parsePerfData,
filterByTgid,
formatPerfDataToJson,
formatPerfDataToText,
} from "hiperf_txt_parser";
const input = `record sample: type 9, misc 2, size 520\n sample_type: 0x8000107e7\n ID 13`;
const parsed = parsePerfData(input);
const filtered = filterByTgid(parsed, 1234);
const jsonArray = formatPerfDataToJson(filtered);
const txt = formatPerfDataToText(parsed);按文件路径解析(txt/json 自动分发)
import { parsePerfDataFromFile } from "hiperf_txt_parser";
const fromTxt = parsePerfDataFromFile("./sample/perf_data.txt");
const fromJson = parsePerfDataFromFile("./out/decoded.json");fieldDict 回调示例(按 format 绑定)
import {
buildTraceParserRegistry,
decodePerfRawData,
formatPerfDataToJson,
} from "hiperf_txt_parser";
const traceFormatText = `name: foo
ID: 77
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:__data_loc char[] path; offset:8; size:4; signed:0;
field:u32 len; offset:12; size:4; signed:0;
print fmt: "path=%s len=%u", REC->path, REC->len
`;
// 将回调与 format 直接绑定:只影响该 eventId(77)
const registry = buildTraceParserRegistry([
{
text: traceFormatText,
transformFieldDict: ({ fieldDict }) => {
// 仅在 tracePrintMode: "fieldDict" 时触发
return {
path: fieldDict.path ?? "",
len: fieldDict.len ?? "0",
};
},
},
]);
// perfData 为 parsePerfData(...) 的结果
const decoded = decodePerfRawData(perfData, registry, {
tracePrintMode: "fieldDict",
});
// JSON 导出时,traceFieldDict 会平铺到 call_chain 同级
const jsonOut = formatPerfDataToJson(decoded);Excel 导出(.xlsx)
import { parsePerfData, formatPerfDataToExcel } from "hiperf_txt_parser";
import fs from "node:fs";
const perfData = parsePerfData(perfText);
const buf = await formatPerfDataToExcel(perfData);
fs.writeFileSync("out/samples.xlsx", buf);JSON 导入说明
formatPerfDataToJson 导出的结构可直接被 importPerfDataFromExportedJson 读回:
import {
formatPerfDataToJson,
importPerfDataFromExportedJson,
} from "hiperf_txt_parser";
const arr = formatPerfDataToJson(perfData);
const restored = importPerfDataFromExportedJson(JSON.stringify(arr));Node 并行(worker_threads,经 options 开启)
以下能力 仅适用于 Node.js(依赖 worker_threads),通过原有 API 的 workerCount(默认 1) 选择串行或并行:<= 1 不创建 Worker;> 1 使用 Worker 池。
decodePerfRawData(perfData, registry, options?):options.workerCount > 1时返回Promise,并需options.formatTexts(与buildTraceParserRegistry入参相同)。含不可序列化的transformFieldDict时自动回退主线程。loadPerfData(filePath, options?):perf 文本在workerCount > 1时块级并行parseOneBlock;.json仍一次性读入。savePerfDataToText/savePerfDataToJson:options.workerCount > 1时并行序列化后按序写盘。
对比串行 wall time 与加速比:
npm run bench:parallel -- --samples=8000 --rounds=2 --worker-count=4Consumer Demo(外部 TS 项目)
提供了两个 demo:
demo/external-ts-consumer:最小调用示例demo/lib-consumer:读取sample/perf_data.txt,导出 json/txt 到out/lib-consumer/
运行文件 I/O demo:
cd demo/lib-consumer
npm install
npm run demo性能基准
提供了一个本地基准脚本,用于评估大数据量下的核心链路吞吐:
npm run bench -- --samples=20000 --rounds=5并行对比(见上一节「Node 并行 API」):
npm run bench:parallel -- --samples=8000 --rounds=2 --worker-count=4多档 samples × worker-count 扫描(输出 Markdown 表,可选 CSV):
npm run bench:scale-sweep
npm run bench:scale-sweep -- --samples=4000,15000,50000,120000 --workers=2,4,8 --rounds=3 --csv=out/bench-scale.csvspeedup = 串行 avg / 并行 avg(> 1 表示并行更快)。合成数据 raw 较小时并行可能慢于串行,属 Worker 调度与拷贝开销;请在真实大 raw 载荷与本机核数下复测。
只跑特定 profile(可逗号分隔):
npm run bench -- --samples=20000 --rounds=5 --profiles=fieldDict跳过 Core Stages(仅跑 decode/pipeline profile):
npm run bench -- --samples=20000 --rounds=5 --profiles=fieldDict --skip-coresamples:每轮构造的record sample数量(默认20000)rounds:测试轮数(默认3)profiles:可选,指定 decode profile;可选值为printf、fieldDict、fieldDict+keepCommon(默认全跑)skip-core:可选,跳过Core Stages(仅输出Decode Profiles)- 输出分为两段:
Core Stages:parsePerfData、formatPerfDataToTextDecode Profiles:printf/fieldDict/fieldDict+keepCommon三种模式下的decodePerfRawData与端到端pipeline
- 每项同时输出
avg、p50、p95、吞吐(samples/s)、heapΔ与heapPeak
输出结构说明
- 解析结构(
parsePerfData):{ recordSamples: RecordSample[] } - 类型
RecordSampleJsonExportItem:{ call_chain: string } & Record<string, string>(除call_chain外的顶层键来自traceFieldDict) - JSON 导出(
formatPerfDataToJson):
[
{
"call_chain": "frame1\\nframe2\\nframe3"
}
]