@phoenixaihub/flake-finder
v0.1.0
flake-finder
Detect, score, and quarantine flaky tests in your CI pipeline.
flake-finder tracks test results over time and uses Bayesian statistics + change-point detection to identify flaky tests — tests that fail non-deterministically without any code change.
Features
- 📥 Ingest JUnit XML, Jest JSON, pytest JSON, or generic JSON
- 📊 Score each test with a Bayesian flakiness score (0–100)
- 🔍 Distinguish regression from flakiness via CUSUM change-point detection
- ⚖️ Weight recent results more heavily via exponential decay (14-day half-life)
- 🚫 Quarantine flaky tests with ready-to-use config for Jest, pytest, and JUnit
- 🤖 CI-native commands: ci ingest, ci check (exits with code 1), and ci comment to generate a GitHub PR comment
Install
npm install -g @phoenixaihub/flake-finder
# or as a dev dependency:
npm install -D @phoenixaihub/flake-finder
Requires Node.js 18+.
Quick Start
1. Track test results
# JUnit XML (Java, Go, Python, Rust...)
flake-finder track results.xml
# Jest JSON (jest --json)
jest --json --outputFile results.json
flake-finder track results.json --format jest
# pytest (pytest-json-report)
pytest --json-report --json-report-file=results.json
flake-finder track results.json --format pytest
Run this after every CI build. Results accumulate in .flake-finder/results.db.
2. View the flakiness report
flake-finder report
# Only show tests with score > 20
flake-finder report --threshold 20
# Output as Markdown
flake-finder report --format markdown
Sample output:
🔍 Flaky Test Report
┌─────────────────────────────────────────────────┬───────┬───────────┬──────┬───────┬──────────────┐
│ Test │ Score │ Fail Rate │ Runs │ Fails │ Change Point │
├─────────────────────────────────────────────────┼───────┼───────────┼──────┼───────┼──────────────┤
│ LoginPage > handles session timeout │ 72.4 │ 68.0% │ 25 │ 17 │ flaky │
│ PaymentService#processRefund │ 48.1 │ 42.0% │ 12 │ 5 │ ⚠ regression │
│ UserAuth > validates expired token │ 31.2 │ 28.0% │ 18 │ 5 │ flaky │
│ …SearchController#testPaginationEdgeCase │ 12.7 │ 11.0% │ 27 │ 3 │ flaky │
└─────────────────────────────────────────────────┴───────┴───────────┴──────┴───────┴──────────────┘
4 flaky test(s) found
Score: 0-100 (higher = flakier) | ⚠ = regression detected, not pure flakiness
3. Generate quarantine config
# Show all formats
flake-finder quarantine --threshold 20
# Dry run to preview
flake-finder quarantine --threshold 20 --dry-run
Output includes:
📦 Jest (--testPathIgnorePatterns):
--testPathIgnorePatterns \
"LoginPage",
"UserAuth"
🐍 pytest (-k exclusion):
pytest -k 'not "LoginPage > handles session timeout" and not "UserAuth > validates expired token"'
☕ JUnit (@Ignore annotations):
@Ignore("flaky: score=72.4")
// LoginPage > handles session timeout
4. Stats dashboard
flake-finder stats
📊 Flake-Finder Stats Dashboard
Total tests tracked: 142
Total test runs: 38
Total results: 5,396
Flaky tests: 7 / 142 (threshold: 10)
Date range: 12/15/2023 → 1/15/2024
🔥 Top 10 Flakiest Tests:
███████░░░ 72.4 LoginPage > handles session timeout
████░░░░░░ 48.1 PaymentService#processRefund
███░░░░░░░ 31.2 UserAuth > validates expired token
...
CI Integration
GitHub Actions
# .github/workflows/test.yml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci && npm test -- --json --outputFile results.json
      - name: Ingest flake results
        # Run even when the test step fails, so failing results are recorded
        if: always()
        run: npx flake-finder ci ingest results.json
        env:
          GITHUB_SHA: ${{ github.sha }}
      - name: Check for flaky tests
        run: npx flake-finder ci check --threshold 25
      - name: Post PR comment
        if: github.event_name == 'pull_request'
        run: |
          npx flake-finder ci comment > /tmp/flake-comment.md
          gh pr comment ${{ github.event.pull_request.number }} --body-file /tmp/flake-comment.md
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CircleCI
jobs:
  test:
    steps:
      - checkout
      - run: npm test -- --json --outputFile results.json
      - run: |
          npx flake-finder ci ingest results.json
          npx flake-finder ci check --threshold 25
Persisting the database across runs
For the flakiness scores to improve over time, persist the .flake-finder/ directory as a CI cache:
GitHub Actions:
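- uses: actions/cache@v4
  with:
    path: .flake-finder
    # An existing cache key is never overwritten, so save under a per-run key
    # and restore the most recent entry by prefix.
    key: flake-finder-${{ runner.os }}-${{ github.run_id }}
    restore-keys: flake-finder-${{ runner.os }}-
How It Works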
Bayesian Flakiness Score
Each test gets a Beta distribution as its failure rate posterior:
- Prior: Beta(1, 1), the uniform prior, which is uninformative and assumes nothing about the failure rate
- For each test result: add weight to α (failure) or β (pass)
- Results are exponentially decayed by age (14-day half-life by default)
- Score = E[failure_rate] × confidence_weight × 100
This means:
- A test that fails once in 100 runs scores ~1–2
- A test that fails 5 times in 10 runs scores ~40–60
- A test that always fails scores close to 100 (high confidence)
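The scoring steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not flake-finder's actual implementation: the README does not define confidence_weight, so the confidenceWeight function and its pseudoRuns parameter below are assumptions, as are all type and function names.

```typescript
// Sketch only: names and the confidence formula are assumptions, not flake-finder's code.
interface TestResult {
  failed: boolean;
  ageDays: number; // age of the result in days
}

const HALF_LIFE_DAYS = 14; // default half-life stated in the README

// Exponential decay: weight 1 for a fresh result, 0.5 after one half-life.
function decayWeight(ageDays: number, halfLifeDays: number = HALF_LIFE_DAYS): number {
  return Math.pow(2, -(ageDays / halfLifeDays));
}

// Hypothetical confidence weight: 0 with no data, approaching 1 as evidence accumulates.
function confidenceWeight(effectiveRuns: number, pseudoRuns: number = 2): number {
  return effectiveRuns / (effectiveRuns + pseudoRuns);
}

function flakinessScore(results: TestResult[]): number {
  let alpha = 1; // Beta(1, 1) prior: failure pseudo-count
  let beta = 1; // Beta(1, 1) prior: pass pseudo-count
  let effectiveRuns = 0; // sum of decayed weights

  for (const r of results) {
    const w = decayWeight(r.ageDays);
    if (r.failed) {
      alpha += w;
    } else {
      beta += w;
    }
    effectiveRuns += w;
  }

  const expectedFailureRate = alpha / (alpha + beta); // posterior mean of the Beta
  return expectedFailureRate * confidenceWeight(effectiveRuns) * 100;
}
```

Under these assumptions, 5 failures in 10 same-day runs give a posterior mean of 0.5 and a confidence of 10/12, for a score of about 41.7. The real tool may weight confidence differently.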
Change-Point Detection (CUSUM)
flake-finder uses Page's CUSUM algorithm to detect whether a test has shifted from passing to failing (a regression) rather than flipping at random (flakiness):
- Encodes pass=0, fail=1
- Accumulates deviations from baseline failure rate
- If cumulative sum exceeds threshold → change point detected
- Tests with change points are flagged ⚠ regression in reports
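As a rough illustration of the steps above, here is a one-sided CUSUM in TypeScript. The function name and the baselineRate, slack, and threshold parameters are assumptions chosen for the example. The README does not state how flake-finder estimates the baseline or which threshold it uses.

```typescript
// Sketch only: parameter names and default values are assumptions, not flake-finder's.
// outcomes[i] is true when run i failed, ordered oldest to newest.
function detectChangePoint(
  outcomes: boolean[],
  baselineRate: number, // expected failure rate before any regression
  slack: number = 0.25, // allowance that keeps isolated failures from accumulating
  threshold: number = 3 // cumulative sum above this value signals a change point
): number {
  let cumulative = 0;
  for (let i = 0; i < outcomes.length; i++) {
    const x = outcomes[i] ? 1 : 0; // encode pass=0, fail=1
    // Accumulate the deviation from the baseline, never dropping below zero.
    cumulative = Math.max(0, cumulative + (x - baselineRate - slack));
    if (cumulative > threshold) {
      return i; // index of the run at which the change is detected
    }
  }
  return -1; // no change point found
}
```

With these values, a history of ten passes followed by sustained failures crosses the threshold on the fifth consecutive failure, while a test that fails once every four runs never accumulates enough to trigger.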
Exponential Decay
Results decay with a half-life of 14 days:
weight = 2^(-(age_days / 14))
A result from 2 weeks ago contributes half as much signal as one recorded today, so failures from a test that has since been fixed fade out over time.
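The decay formula above can be checked with a small TypeScript helper. The function name and the halfLifeDays parameter are illustrative, not part of flake-finder's API.

```typescript
// Sketch only: decayWeight is a hypothetical helper mirroring the formula above.
function decayWeight(ageDays: number, halfLifeDays: number = 14): number {
  return Math.pow(2, -(ageDays / halfLifeDays));
}

// Weights for results recorded today, 14 days ago, and 28 days ago.
const weights = [0, 14, 28].map((age) => decayWeight(age));
// weights is [1, 0.5, 0.25]
```

Each additional half-life halves the weight again, so a result from four weeks ago counts a quarter as much as one from today.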
Configuration
All commands accept --db <path> to override the default .flake-finder/results.db location.
Environment variables respected during ci ingest:
- GITHUB_SHA or GIT_COMMIT: auto-attached as the commit SHA
- GITHUB_RUN_ID or CI_RUN_ID: auto-attached as the run ID
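One plausible way to resolve these variables is sketched below. It assumes the first name in each pair takes precedence and that empty values are ignored. The README confirms neither, and the type and function names are hypothetical.

```typescript
// Sketch only: precedence order and all names here are assumptions.
type Env = Record<string, string | undefined>;

interface RunMetadata {
  commitSha: string | undefined;
  runId: string | undefined;
}

// Return the first variable in `names` that is set to a non-empty value.
function firstDefined(env: Env, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name];
    if (value !== undefined && value !== "") {
      return value;
    }
  }
  return undefined;
}

function resolveRunMetadata(env: Env): RunMetadata {
  return {
    commitSha: firstDefined(env, ["GITHUB_SHA", "GIT_COMMIT"]),
    runId: firstDefined(env, ["GITHUB_RUN_ID", "CI_RUN_ID"]),
  };
}
```

In Node.js this would be called as resolveRunMetadata(process.env). On CI systems other than GitHub Actions, exporting GIT_COMMIT and CI_RUN_ID before running ci ingest should have the same effect.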
Contributing
See CONTRIBUTING.md.
License
MIT © PhoenixAI Hub
