aide
AI Development Performance Evaluator — A CLI tool that reads local Claude Code and Codex usage data to generate technical performance evaluation reports, similar to ccusage.
AI 開發績效評核工具 — 一個 CLI 工具,讀取本地端 Claude Code 與 Codex 的使用資料,以技術績效審核者的角色產出評核報告。
Why / 為什麼需要這個工具
Engineering managers need evidence-based data to evaluate how effectively their team uses AI coding assistants. Manual review of hundreds of AI sessions is impractical. This tool automates the process by:
研發經理需要基於證據的資料來評估團隊使用 AI 程式助手的成效。手動審查數百個 AI 工作階段不切實際。本工具透過以下方式自動化此流程:
- Extracting usage data from local Claude Code (`~/.claude/`) and Codex (`~/.codex/`) storage
- Analyzing session patterns for decision-making quality and risk identification
- Scoring against a configurable evaluation standard (default: OneAD R&D standard)
- Generating structured reports with evidence and score recommendations
Features / 功能特色
- Reads Claude Code session metadata, facets, SQLite cost data, and JSONL conversations
- Reads Codex SQLite thread data (tokens, models, sessions)
- Detects decision-making patterns (iterative refinement, multi-task coordination, active corrections)
- Detects risk identification signals (security awareness, performance discussions, bug catching)
- Configurable evaluation standard — bring your own criteria, keywords, and thresholds
- Multiple output formats: terminal (colored), JSON, Markdown, HTML
- JSON output is machine-parseable (`--format json` emits clean JSON to stdout; progress logs go to stderr)
- Noise filtering to exclude system messages and code review templates from analysis
- `init-standard` command to export the default standard as a customization template
- Data quality warnings when data sources are missing or incomplete
Data Sources / 資料來源
| Source | Path | Data |
|--------|------|------|
| Claude Code Session Meta | ~/.claude/usage-data/session-meta/*.json | Tokens, duration, tools, languages, git commits, first prompt |
| Claude Code Facets | ~/.claude/usage-data/facets/*.json | AI-generated session analysis: goals, outcomes, friction, helpfulness |
| Claude Code SQLite | ~/.claude/__store.db | Per-message cost (USD), model, duration |
| Claude Code JSONL | ~/.claude/projects/*/*.jsonl | Full conversation content for keyword signal scanning |
| Codex SQLite | ~/.codex/state_5.sqlite | Threads: tokens_used, model, title, git info |
| Codex History | ~/.codex/history.jsonl | Full user prompts by thread/session |
| Codex Logs | ~/.codex/logs_2.sqlite | Thread-level tool calls and error events |
All data is read locally and read-only. No data is sent to any external service.
所有資料皆為本地端唯讀存取,不會傳送至任何外部服務。
Prerequisites / 系統需求
- Node.js >= 18
- npm (included with Node.js)
- `~/.claude/` directory (from Claude Code usage)
- `~/.codex/` directory (from Codex CLI usage, optional)
Installation / 安裝
Recommended: Install from npm / 建議:從 npm 安裝
```
npm install -g @hanfour.huang/aide

# Verify installation
aide --version
```

Update / 更新
```
npm install -g @hanfour.huang/aide@latest
```

Existing local-clone users / 已使用 clone 安裝的使用者
If you previously installed from a cloned repo or via `npm link`, migrate to the npm package:

```
npm unlink -g aide
npm install -g @hanfour.huang/aide@latest
```

Development mode / 開發模式
```
git clone https://github.com/hanfour/aide.git ~/aide
cd ~/aide
npm install
npx tsx src/cli.ts --help
```

Quick Start / 快速開始
```
# Quick usage summary (last 7 days)
aide summary

# Full evaluation report (last 30 days, terminal output)
aide report

# Save report as Markdown
aide report --format markdown --output report.md

# Save report as HTML
aide report --format html --output report.html

# Monthly KPI report
aide monthly
```

Usage / 使用方式
Quick Summary / 快速摘要
```
# Last 7 days (default)
aide summary

# Custom date range
aide summary --since 2026-03-01 --until 2026-03-31
```

Output:
```
AI Dev Usage Summary
Period: 2026-03-01 ~ 2026-03-31

Claude Code
Sessions: 57
Tokens: 259,336
Duration: 15676 min
Active Days: 9

Codex
Sessions: 1
Tokens: 368,930
Active Days: 1
```

Full Evaluation Report / 完整評核報告
```
# Default: last 30 days, text format, built-in OneAD standard
aide report

# Current calendar month
aide monthly

# Previous full calendar month
aide monthly --previous

# Current calendar quarter
aide quarterly

# Previous full calendar quarter
aide quarterly --previous

# Custom date range
aide report --since 2026-03-01 --until 2026-04-14

# Output as Markdown file
aide report --format markdown --output report.md

# Output as HTML file
aide report --format html --output report.html

# Output as JSON (machine-parseable, clean stdout)
aide report --format json --output report.json

# Pipe JSON for programmatic consumption
aide report --format json 2>/dev/null | jq '.sections[].score'

# Use a custom evaluation standard
aide report --standard my-standard.json

# Include engineer/department metadata in report
aide report --engineer "Jane Doe" --department "R&D"
```

Note: When using `--format json`, progress and status messages are written to stderr. stdout contains only the JSON report, making it safe to pipe to `jq` or other tools.
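Beyond `jq`, the JSON report can be consumed from any language. Below is a minimal Node/TypeScript sketch, assuming the report was saved with `--output report.json`; the `sections[].score` field is inferred from the `jq` example above, and the exact schema may vary between versions:

```ts
// Hedged sketch: reads an aide JSON report from disk. Field names
// (`sections`, `score`) follow the jq example above, not a schema guarantee.
import { readFileSync } from "node:fs";

interface ReportSection {
  score?: number; // assumed field name
}

const report = JSON.parse(readFileSync("report.json", "utf8")) as {
  sections?: ReportSection[];
};

for (const section of report.sections ?? []) {
  console.log(section.score);
}
```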
Using the compiled CLI / 使用編譯後的 CLI
If you are developing locally and have run `npm run build`, you can use `node dist/cli.js`:
```
node dist/cli.js report --since 2026-03-01 --until 2026-03-31
node dist/cli.js summary
node dist/cli.js monthly --previous --format markdown --output march.md
node dist/cli.js report --format html --output report.html
```

CLI Reference / 命令參考
aide report
Generate a full evaluation report.
Options:
```
-s, --since <date>       Start date, YYYY-MM-DD (default: 30 days ago)
-u, --until <date>       End date, YYYY-MM-DD (default: today)
-f, --format <format>    Output: text | json | markdown | html (default: text)
-o, --output <file>      Write report to file instead of stdout
--standard <path>        Path to custom evaluation standard JSON
--engineer <name>        Engineer name for report identification
--department <name>      Department name for report identification
```

aide summary
Quick usage summary for a date range.
Options:
```
-s, --since <date>    Start date, YYYY-MM-DD (default: 7 days ago)
-u, --until <date>    End date, YYYY-MM-DD (default: today)
```

aide monthly
Generate a monthly KPI report.
Options:
```
-f, --format <format>    Output: text | json | markdown (default: text)
-o, --output <file>      Write report to file instead of stdout
--standard <path>        Path to custom evaluation standard JSON
--previous               Use the previous full calendar month
```

aide quarterly
Generate a quarterly KPI report.
Options:
```
-f, --format <format>    Output: text | json | markdown (default: text)
-o, --output <file>      Write report to file instead of stdout
--standard <path>        Path to custom evaluation standard JSON
--previous               Use the previous full calendar quarter
```

aide init-standard
Export the default evaluation standard as a JSON template for customization.
Options:
```
-o, --output <file>    Output file path (default: eval-standard.json)
```

Report Structure / 報告結構
The generated report contains the following sections:
1. Management Summary / 管理摘要
Management-facing overview for monthly/quarterly KPI review:
- Overall headline
- Period assessment
- Key observations
- Recommended follow-up actions
2. Usage Overview / 使用概覽
Quantitative metrics for both Claude Code and Codex:
- Total sessions, tokens (input/output), estimated cost
- Active days, duration
- Top projects by token usage
- Top tools used (Bash, Read, Edit, etc.)
- Model breakdown
3-N. Evaluation Sections / 評核區段
Each section defined in the evaluation standard generates:
- Summary — aggregate statistics
- Usage evidence — workload/depth indicators such as sessions, tool usage, follow-up prompts
- Score evidence — threshold-relevant evidence used for 100% / 120% scoring
- Evidence signals — grouped by type (iterative refinement, bugs caught, security awareness, etc.)
- Metrics — numeric indicators used for scoring
Final. Score Recommendation / 分值建議
For each evaluation section:
- Score: Standard (100%) or Superior (120%)
- Label: Human-readable grade
- Reason: Evidence-backed explanation referencing the criteria
Data Quality Warnings / 資料品質警告
The report includes data quality warnings when:
- Required data sources (`~/.claude/usage-data/session-meta`) are missing
- Sessions exist but no facets are found (qualitative analysis limited)
- No keyword signals detected (JSONL files may be missing)
- No sessions found at all in the evaluation period
Custom Evaluation Standards / 自訂評核標準
The built-in default is the OneAD R&D AI-Application Evaluation Standard. To create your own:
Step 1: Export the default template / 匯出預設範本
```
npx tsx src/cli.ts init-standard --output my-standard.json
```

Step 2: Edit the JSON file / 編輯 JSON 檔案
Key fields you can customize:
| Field | Purpose |
|-------|---------|
| name | Standard name shown in report header |
| sections[] | Array of evaluation sections (add/remove/reorder) |
| sections[].id | Unique section identifier |
| sections[].name | Section display name |
| sections[].weight | KPI weight (display only) |
| sections[].keywords | Conversation scanning keywords |
| sections[].thresholds | Numeric thresholds for Superior score |
| sections[].superiorRules | Optional rule for combining thresholds |
| sections[].standard | 100% score criteria text |
| sections[].superior | 120% score criteria text |
| noiseFilters | Rules to exclude system/template messages |
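To make the `noiseFilters` row concrete, here is a minimal sketch of pattern-based message filtering; the patterns are invented for illustration and are not the tool's actual defaults (those live in `templates/eval-standard.json`):

```ts
// Illustrative only: drop messages matching noise patterns before
// keyword scanning. The real rules come from the standard's noiseFilters.
const noisePatterns: RegExp[] = [
  /^<system-reminder>/i,    // hypothetical system-message marker
  /code review checklist/i, // hypothetical review-template phrase
];

function isNoise(message: string): boolean {
  return noisePatterns.some((re) => re.test(message));
}

const messages = ["fix the auth bug", "<system-reminder> housekeeping"];
console.log(messages.filter((m) => !isNoise(m))); // ["fix the auth bug"]
```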
Step 3: Use it / 使用自訂標準
```
npx tsx src/cli.ts report --standard my-standard.json
```

Example: Adding a new section / 新增評核區段範例
```json
{
  "id": "collaboration",
  "name": "AI-Human Collaboration Quality",
  "weight": "30%",
  "standard": {
    "score": 100,
    "label": "Standard",
    "criteria": ["Uses AI for routine tasks", "Follows AI suggestions without modification"]
  },
  "superior": {
    "score": 120,
    "label": "Superior",
    "criteria": ["Actively debates with AI on design decisions", "Synthesizes multiple AI suggestions into novel solutions"]
  },
  "keywords": ["design", "architecture", "trade-off", "pattern", "alternative"],
  "thresholds": {
    "iterativeRatio": 0.4,
    "keywordHits": 15
  },
  "superiorRules": {
    "mode": "grouped",
    "strongThresholds": ["iterativeRatio", "keywordHits"],
    "supportThresholds": ["avgToolUses"],
    "minStrongMatched": 1,
    "minSupportMatched": 0
  }
}
```

Superior Rules / 升等規則
- `superiorRules.mode = "any"` — any matched threshold is enough for 120%.
- `superiorRules.mode = "grouped"` — separates strong evidence from support evidence. Strong evidence must meet a minimum count; support evidence alone is not sufficient.

Keys referenced by `strongThresholds` and `supportThresholds` must also exist in `thresholds`.
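As a concrete reading of these semantics, the sketch below shows how a rule could be evaluated against the set of matched threshold keys. It mirrors the documented behavior, not the tool's actual source:

```ts
// Hedged sketch of superiorRules evaluation; not the real implementation.
interface SuperiorRules {
  mode: "any" | "grouped";
  strongThresholds?: string[];
  supportThresholds?: string[];
  minStrongMatched?: number;
  minSupportMatched?: number;
}

function qualifiesForSuperior(matched: Set<string>, rules: SuperiorRules): boolean {
  if (rules.mode === "any") return matched.size > 0; // any matched threshold suffices
  const strong = (rules.strongThresholds ?? []).filter((k) => matched.has(k)).length;
  const support = (rules.supportThresholds ?? []).filter((k) => matched.has(k)).length;
  // Strong evidence must meet its minimum; support alone cannot carry the score.
  return strong >= (rules.minStrongMatched ?? 1) && support >= (rules.minSupportMatched ?? 0);
}

// With the example section above: iterativeRatio (strong) and avgToolUses (support) matched
console.log(qualifiesForSuperior(new Set(["iterativeRatio", "avgToolUses"]), {
  mode: "grouped",
  strongThresholds: ["iterativeRatio", "keywordHits"],
  supportThresholds: ["avgToolUses"],
  minStrongMatched: 1,
  minSupportMatched: 0,
})); // true
```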
Available threshold keys / 可用門檻鍵值
| Key | Description |
|-----|-------------|
| iterativeRatio | Ratio of iterative/multi-task sessions to total |
| correctionCount | Number of user corrections/interruptions |
| keywordHits | Number of keyword signal matches |
| avgToolUses | Average tool uses per session |
| securityCount | Security-related keyword matches |
| performanceCount | Performance-related keyword matches |
| bugsCaught | AI-generated bugs caught (from facets) |
| frictionSessions | Sessions with friction events |
| codexIterativeSessions | Codex threads with strong iterative evidence |
| codexMultiTurnSessions | Codex multi-turn threads |
| codexFollowUpCount | Codex follow-up user prompts |
| codexDeepSessions | Codex high-depth threads |
| codexErrorSessions | Codex threads with logged errors |
Default Evaluation Standard / 預設評核標準
The built-in OneAD standard evaluates two dimensions:
AI Interaction & Decision (20% KPI weight) / AI 交互與決策
| Grade | Criteria |
|-------|----------|
| Standard (100%) | Actively use AI for coding; clear decision notes |
| Superior (120%) | Multi-iteration guidance (A->B->C); system-constraint-aware optimization |
AI Identification & Risk Control (50% KPI weight) / AI 識別與風險控管
| Grade | Criteria |
|-------|----------|
| Standard (100%) | Catch common AI errors/hallucinations; stable code |
| Superior (120%) | Identify critical risks (security, performance, memory); produce SOP/Wiki for team sharing |
Architecture / 架構
```
src/
├── cli.ts               # CLI entry point (commander)
├── types.ts             # TypeScript type definitions
├── standard.ts          # Load & validate evaluation standards
├── period.ts            # Date period resolution (monthly/quarterly)
├── data-quality.ts      # Data source completeness checks
├── utils.ts             # Shared utilities (noise filter)
├── extractors/
│   ├── claude-code.ts   # Read ~/.claude/ data (JSONL, SQLite, JSON)
│   └── codex.ts         # Read ~/.codex/ data (SQLite, JSONL)
├── analyzers/
│   ├── usage.ts         # Aggregate quantitative usage metrics
│   └── section.ts       # Generic section analyzer (facets + keywords + thresholds)
└── reporters/
    └── report.ts        # Render reports (text, JSON, Markdown, HTML)

templates/
└── eval-standard.json   # Default OneAD evaluation standard (source of truth)

tests/
├── cli.test.ts          # CLI regression tests (subprocess)
├── section.test.ts      # Section analyzer unit tests
├── standard.test.ts     # Standard loader/validator tests
├── data-quality.test.ts # Data quality checker tests
└── fixtures/            # Test fixture files
```

Pipeline / 處理流程
Extract --> Analyze --> Score --> Report
1. Extract: Read session-meta, facets, SQLite, JSONL from local stores
2. Analyze: Aggregate usage + run each section through generic analyzer
3. Score: Compare metrics against section thresholds
4. Report: Render in chosen format with evidence and recommendations
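The sketch below walks a toy dataset through the same four stages; all types, numbers, and function bodies are illustrative stand-ins, not the tool's actual code:

```ts
// Toy Extract -> Analyze -> Score -> Report flow (illustrative only).
interface Session { tokens: number; toolUses: number; }

function extract(): Session[] {
  // Extract: the real tool reads ~/.claude/ and ~/.codex/ stores here.
  return [{ tokens: 1200, toolUses: 5 }, { tokens: 800, toolUses: 9 }];
}

function analyze(sessions: Session[]) {
  // Analyze: aggregate quantitative metrics.
  const avgToolUses = sessions.reduce((sum, s) => sum + s.toolUses, 0) / sessions.length;
  return { sessions: sessions.length, avgToolUses };
}

function score(avgToolUses: number, threshold: number): 100 | 120 {
  // Score: compare a metric against a section threshold.
  return avgToolUses >= threshold ? 120 : 100;
}

// Report: render in the chosen format (JSON here).
const metrics = analyze(extract());
console.log(JSON.stringify({ ...metrics, score: score(metrics.avgToolUses, 6) }));
```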
Development / 開發

Scripts / 腳本
```
npm run build        # Compile TypeScript to dist/
npm run dev          # Run CLI directly via tsx (no build needed)
npm run test         # Run test suite (vitest)
npm run test:watch   # Run tests in watch mode
```

Running tests / 執行測試
```
# Run all tests
npm test

# Run a specific test file
npx vitest run tests/section.test.ts

# Watch mode
npm run test:watch
```

Project conventions / 專案慣例
- All progress/status messages are written to stderr; report output goes to stdout
- JSON output (`--format json`) is guaranteed clean on stdout for piping
- SQLite connections are wrapped in `try/finally` to prevent resource leaks (see the sketch below)
- The evaluation standard template (`templates/eval-standard.json`) is the single source of truth
- Custom standards inherit the default `noiseFilters` if not specified
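For example, a read-only SQLite access following that convention might look like the sketch below, assuming a better-sqlite3-style API (the actual dependency and table names are not confirmed here):

```ts
// Hedged sketch of the try/finally convention for SQLite reads.
// `better-sqlite3` and the `threads` table are assumptions for illustration.
import Database from "better-sqlite3";

function countThreads(dbPath: string): number {
  const db = new Database(dbPath, { readonly: true });
  try {
    const row = db.prepare("SELECT COUNT(*) AS n FROM threads").get() as { n: number };
    return row.n;
  } finally {
    db.close(); // always release the handle, even if the query throws
  }
}
```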
Troubleshooting / 問題排除
No sessions found
- Verify `~/.claude/usage-data/session-meta/` contains JSON files
- Check the date range matches when the AI tools were used
- For Codex, verify `~/.codex/state_5.sqlite` exists
Empty facets
- Facets are generated asynchronously by Claude Code after sessions end
- Recent sessions may not have facets yet
- The tool will show a data quality warning in this case
JSON output contains extra text
This was fixed in v0.1.0. All progress messages now go to stderr. If you encounter this, ensure you are using the latest version. Use `2>/dev/null` to suppress stderr when piping:

```
npx tsx src/cli.ts report --format json 2>/dev/null | jq .
```

License
MIT
