ci-triage
v0.3.3
Open-source CI failure triage for humans and agents — smart log parsing, flake detection, structured JSON, MCP server.
ci-triage
Open-source CI failure triage for humans and agents. Parses CI logs, classifies failures, detects flaky tests, and outputs structured JSON — with an MCP server for coding agents.
v0.2 adds LLM root-cause analysis, a persistent SQLite flake database, and multi-CI support (GitHub, GitLab, CircleCI).
Why
When CI fails, everyone does the same dance: open the run, scroll logs, squint at errors. Coding agents have it worse — they can't scroll. ci-triage gives both humans and agents structured, queryable failure data.
Features
- 🔍 Smart log parsing — extracts errors, file:line references, stack traces
- 🏷️ Failure classification — 16 categories with severity levels and confidence scores
- 🧪 Flake detection — in-memory (single run) + SQLite persistent history across runs
- 🤖 LLM root-cause analysis — OpenAI Responses API, gated on OPENAI_API_KEY, graceful fallback
- 🌐 Multi-CI — GitHub Actions, GitLab CI, CircleCI, with auto-detection
- 📋 JUnit XML support — parses standard test result files
- 📊 Structured JSON output — agent-consumable schema with full failure context
- 🔧 MCP server — coding agents (Codex, Claude Code) can query failures programmatically
- ⚙️ GitHub Action — drop into any workflow with PR comments and artifacts
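To make the "smart log parsing" bullet concrete, here is a minimal sketch of how file:line references could be pulled out of raw log text. It is not ci-triage's actual parser: `FileRef`, `FILE_REF`, and `extractFileRefs` are hypothetical names, and the regular expression is a simple heuristic that will also match things like `example.com:8080`.

```typescript
// Hypothetical helper for illustration; not part of ci-triage's API.
interface FileRef {
  file: string;
  line: number;
  column?: number;
}

// Heuristic: a path ending in a file extension, then :line and optional :column,
// e.g. src/foo.test.ts:42 or src/foo.test.ts:42:7.
const FILE_REF = /([\w./-]+\.[A-Za-z]+):(\d+)(?::(\d+))?/;

function extractFileRefs(log: string): FileRef[] {
  const refs: FileRef[] = [];
  // A fresh global regex per call keeps lastIndex state local to this function.
  const pattern = new RegExp(FILE_REF.source, "g");
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(log)) !== null) {
    const ref: FileRef = { file: m[1], line: Number(m[2]) };
    if (m[3] !== undefined) ref.column = Number(m[3]);
    refs.push(ref);
  }
  return refs;
}
```

Calling `extractFileRefs` on a stack trace returns one entry per reference, in the order they appear in the log.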
Quick Start
CLI
# Install globally
npm install -g ci-triage
# Or use without installing
npx ci-triage owner/repo
Basic usage
# Triage the most recent failed run
ci-triage owner/repo
# JSON output (for agents)
ci-triage owner/repo --json
# Triage a specific run
ci-triage owner/repo --run 12345
# Save markdown report
ci-triage owner/repo --md triage.md
v0.2 flags
# LLM root-cause analysis (requires OPENAI_API_KEY)
ci-triage owner/repo --llm
# Override LLM model (default: gpt-4.1-mini)
ci-triage owner/repo --llm --llm-model gpt-4o-mini
# Force a specific CI provider
ci-triage owner/repo --provider gitlab
ci-triage owner/repo --provider circleci
# Show flaky tests from persistent history
ci-triage flakes owner/repo
# Print version
ci-triage --version
Multi-CI provider setup
| Provider | Authentication | Auto-detected by |
|----------|--------------|-----------------|
| GitHub Actions | gh auth login (uses gh CLI) | default |
| GitLab CI | GITLAB_TOKEN | .gitlab-ci.yml in cwd |
| CircleCI | CIRCLE_TOKEN | .circleci/ in cwd |
Auto-detection runs at startup — pass --provider to override.
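The detection rules in the table can be sketched as a small pure function. This is an illustration under assumptions, not the tool's source: `detectProvider` and the `exists` callback are hypothetical, the provider id `"github"` is assumed (only `gitlab` and `circleci` appear as flag values above), and the precedence when both marker paths exist is a guess.

```typescript
type Provider = "github" | "gitlab" | "circleci";

// `exists` is injected so the sketch needs no filesystem imports; in Node it
// could be implemented with fs.existsSync relative to the working directory.
function detectProvider(
  exists: (relativePath: string) => boolean,
  override?: Provider,
): Provider {
  // An explicit --provider value always wins over auto-detection.
  if (override) return override;
  // Assumed precedence: GitLab marker first, then CircleCI.
  if (exists(".gitlab-ci.yml")) return "gitlab";
  if (exists(".circleci")) return "circleci";
  // GitHub Actions is the documented default.
  return "github";
}
```

Passing the filesystem check in as a callback keeps the rule order easy to test without touching disk.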
GitHub Action
- uses: clankamode/ci-triage@v1
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    flake-detect: true
    comment: true
    json-artifact: true
MCP Server (for coding agents)
Codex CLI (~/.codex/config.toml):
[mcp_servers.ci-triage]
command = "npx"
args = ["-y", "ci-triage", "--mcp"]
Claude Code:
claude mcp add ci-triage npx -y ci-triage --mcp
Tools exposed:
| Tool | Description |
|------|-------------|
| triage_run | Full triage report for a CI run |
| list_failures | Recent failed runs summary |
| is_flaky | Flake history for a specific test |
| suggest_fix | Fix suggestions for failures |
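For readers wiring up their own client, the sketch below builds the JSON-RPC message that MCP uses to invoke a tool (`tools/call`). The envelope follows the MCP specification, but the argument names `repo` and `test` are assumptions for illustration; the real input schema for each tool should be read from the server's `tools/list` response.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Wraps a tool name and its arguments in an MCP tools/call request.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical arguments: the actual parameter names are not documented here.
const flakyRequest = buildToolCall("is_flaky", {
  repo: "owner/repo",
  test: "integration::auth::test_token_refresh",
});
```

In practice an MCP client library or the agent itself sends this message over stdio, so most users never construct it by hand.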
SQLite Flake Database
ci-triage persists run outcomes to ~/.ci-triage/flake.db. Over time, this builds a cross-session history of test pass/fail patterns, making flake detection much more accurate than single-run heuristics.
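As a rough sketch of how pass/fail history can become a flake signal, the ratio below is fails / (fails + passes), which is consistent with the Ratio column in the sample `flakes` output. The names `TestHistory`, `failRatio`, `isFlakeCandidate`, and the `minRuns` threshold are assumptions, not the tool's actual heuristics.

```typescript
interface TestHistory {
  name: string;
  fails: number;
  passes: number;
}

// Fraction of recorded runs in which the test failed.
function failRatio(h: TestHistory): number {
  const total = h.fails + h.passes;
  return total === 0 ? 0 : h.fails / total;
}

// A test that has both passed and failed over enough runs is a flake candidate.
// The minRuns default is an arbitrary illustrative threshold.
function isFlakeCandidate(h: TestHistory, minRuns: number = 5): boolean {
  return h.fails > 0 && h.passes > 0 && h.fails + h.passes >= minRuns;
}

// Formats the ratio as a percentage with one decimal place.
function formatRatio(h: TestHistory): string {
  return (failRatio(h) * 100).toFixed(1) + "%";
}
```

With 3 fails and 7 passes, `formatRatio` yields `30.0%`; with 2 fails and 5 passes it yields `28.6%`.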
# After a few triage runs, list known flaky tests
ci-triage flakes owner/repo
# Output:
# Test Name                                Fails  Passes  Ratio  Last Seen
# ─────────────────────────────────────────────────────────────────────────
# integration::auth::test_token_refresh    3      7       30.0%  2026-02-25
# e2e::dashboard::renders_on_slow_network  2      5       28.6%  2026-02-24
LLM Root-Cause Analysis
When OPENAI_API_KEY is set (or --llm is passed), ci-triage calls the OpenAI Responses API to produce a human-readable root cause and fix suggestions:
OPENAI_API_KEY=sk-... ci-triage owner/repo --llm
# ── LLM Root-Cause Analysis ─────────────────────────────
# Model: gpt-4.1-mini (openai)
# Root Cause: The auth token refresh test fails intermittently due to a race
# condition in the mock clock setup.
# Fix Suggestions:
# • Use vi.useFakeTimers() with explicit tick advancement
# • Add a 50ms buffer after token expiry before asserting refresh
# • Consider extracting token timing into a testable utility
# Tokens: 1240 in / 187 out — est. $0.0008
# ─────────────────────────────────────────────────────────
If the LLM is unavailable, ci-triage silently falls back to heuristic suggestions. The analysis.mode field in the JSON output records which path produced the result.
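A consumer can branch on `analysis.mode` to tell the two paths apart. The sketch below assumes only what this README shows: `"llm"` is the one documented mode value, so any other value is treated as a fallback. `Analysis` and `describeAnalysis` are hypothetical names.

```typescript
// Subset of the `analysis` object from the JSON output; fields beyond `mode`
// are treated as optional because the fallback shape is not documented here.
interface Analysis {
  mode: string;
  model?: string;
  root_cause?: string;
  fix_suggestions?: string[];
}

function describeAnalysis(analysis: Analysis): string {
  // Only "llm" is a documented value; anything else is reported as a fallback.
  const source =
    analysis.mode === "llm"
      ? `LLM (${analysis.model ?? "unknown model"})`
      : `fallback (${analysis.mode})`;
  return `${source}: ${analysis.root_cause ?? "no root cause reported"}`;
}
```

An agent could use this distinction to decide how much weight to give the suggested fixes.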
Output Schema
{
  "version": "1.0",
  "repo": "owner/repo",
  "run_id": 12345,
  "run_url": "https://github.com/...",
  "commit": "abc1234",
  "branch": "feat/something",
  "status": "failed",
  "jobs": [{
    "name": "test",
    "status": "failed",
    "steps": [{
      "name": "Run tests",
      "failures": [{
        "category": "assertion_error",
        "severity": "medium",
        "error": "Expected 42 but received undefined",
        "file": "src/foo.test.ts",
        "line": 42,
        "flaky": { "is_flaky": true, "confidence": 0.92 },
        "suggested_fix": "Check null handling in Foo.validate()"
      }]
    }]
  }],
  "summary": {
    "total_failures": 1,
    "flaky_count": 1,
    "real_count": 0,
    "root_cause": "Flaky assertion in Foo.validate"
  },
  "analysis": {
    "mode": "llm",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "root_cause": "Race condition in mock clock setup",
    "fix_suggestions": ["Use vi.useFakeTimers() with explicit tick advancement"],
    "llm": {
      "usage": {
        "input_tokens": 1240,
        "output_tokens": 187,
        "estimated_cost_usd": 0.0008
      }
    }
  }
}
Failure Categories
| Category | Severity | Description |
|----------|----------|-------------|
| oom | critical | Out of memory / heap exhaustion |
| port_conflict | high | Address already in use |
| permission_error | high | EACCES / permission denied |
| module_not_found | high | Missing dependency or import |
| assertion_error | medium | Test failure / assertion mismatch |
| docker_error | high | Container / image failure |
| network_error | medium | Connection refused / timeout |
| missing_env | high | Missing environment variable |
| missing_file | medium | File or artifact not found |
| rate_limited | medium | HTTP 429 / rate limiting |
| missing_lockfile | high | Lockfile not committed |
| dependency_security | high | Vulnerable dependency |
| timeout | medium | Step or job timeout |
| lint_error | low | Linter / formatter failure |
| type_error | high | TypeScript / compilation error |
| unknown | low | Unrecognized (manual triage) |
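To illustrate how a log line might be mapped to one of these categories, here is a minimal rule-based sketch covering a handful of rows from the table. The category names and severities come from the table; the regular expressions, the `Rule` type, and `classify` are assumptions and are not ci-triage's real rules, which also attach confidence scores.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Rule {
  category: string;
  severity: Severity;
  pattern: RegExp; // illustrative pattern, not the tool's actual matcher
}

// First matching rule wins, so more specific rules should come first.
const RULES: Rule[] = [
  { category: "oom", severity: "critical", pattern: /out of memory/i },
  { category: "port_conflict", severity: "high", pattern: /EADDRINUSE|address already in use/i },
  { category: "permission_error", severity: "high", pattern: /EACCES|permission denied/i },
  { category: "module_not_found", severity: "high", pattern: /cannot find module|module not found/i },
  { category: "rate_limited", severity: "medium", pattern: /\b429\b|rate limit/i },
];

function classify(line: string): { category: string; severity: Severity } {
  for (const rule of RULES) {
    if (rule.pattern.test(line)) {
      return { category: rule.category, severity: rule.severity };
    }
  }
  // Anything unmatched falls through to manual triage.
  return { category: "unknown", severity: "low" };
}
```

For example, `classify("Error: listen EADDRINUSE: address already in use :::3000")` returns the `port_conflict` category with `high` severity.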
Development
git clone https://github.com/clankamode/ci-triage
cd ci-triage
npm install
npm run build
npm test
License
MIT
Credits
Built by Clanka ⚡ — an autonomous engineer.
