cypress-flaky-detector-pro
v1.0.4
A powerful zero-config flaky test detector for Cypress with entropy-based scoring and automated root cause analysis.
╔═══════════════════════════════════════════════════════════════════╗
║ ║
║ ███████╗██╗ █████╗ ██╗ ██╗██╗ ██╗ ║
║ ██╔════╝██║ ██╔══██╗██║ ██╔╝╚██╗ ██╔╝ ║
║ █████╗ ██║ ███████║█████╔╝ ╚████╔╝ ║
║ ██╔══╝ ██║ ██╔══██║██╔═██╗ ╚██╔╝ ║
║ ██║ ███████╗██║ ██║██║ ██╗ ██║ ║
║ ╚═╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ║
║ ║
║ T E S T D E T E C T O R · P R O ║
║ ║
╚═══════════════════════════════════════════════════════════════════╝
Hunt down unreliable tests. Eliminate false confidence. Ship with certainty.
⚡ A complete Cypress-powered flaky test detection system
Demo e-commerce app · Intentionally flaky tests · Rich interactive HTML report
📌 Table of Contents
- 🏗️ Architecture Overview
- 📁 Project Structure
- 🧠 How It Works
- ⚡ Performance
- 🚀 Quick Start
- 🔌 Integration Guide
- 📊 Report Features
- 🛠️ Typical Fixes
🏗️ Architecture Overview
The system follows a multi-layered architecture designed to simulate, detect, and report on test flakiness in a controlled environment.
┌──────────────────────────────────────────────────────────────────────┐
│ 🎭 LAYER 1 · Orchestration & Simulation │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ 🛒 ShopFlake │─────▶│ ⚙️ Orchestrator │─────▶│ 🌲 Cypress │ │
│ │ Demo App │ │ Engine │ │ Runner │ │
│ │ (Sync/Async) │ │ (run-demo.js) │ │ (Headless) │ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
└──────────────────────────────────┬───────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 🔁 LAYER 2 · Test Execution Loop │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ 🔢 Multi-Run │─────▶│ 📄 JUnit Result │─────▶│ 💥 Failure │ │
│ │ Manager │ │ Aggregation │ │ Capture │ │
│ │ (N Repeats) │ │ (XML / JSON) │ │ Engine │ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
└──────────────────────────────────┬───────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 🧠 LAYER 3 · Intelligence & Scoring │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ 📊 Entropy │ │ 🔍 Root Cause │ │ 💯 Health Score │ │
│ │ Scorer │ │ Analyzer │ │ Calculator │ │
│ │ (0–100%) │ │ (Auto-Diag) │ │ (0–100) │ │
│ └──────────────┘ └──────────────────┘ └──────────────────┘ │
└──────────────────────────────────┬───────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 📋 LAYER 4 · Actionable Reporting │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ 🖥️ Interactive│─────▶│ 💡 Fix Advice │─────▶│ 🚦 CI Trust │ │
│ │ Dashboard │ │ (AI-Powered) │ │ Gate │ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
🎭 Layer 1 · Orchestration & Simulation
| Component | Role |
|-----------|------|
| 🛒 ShopFlake Demo App | Target app with intentional flakiness: network delays, race conditions, transient elements |
| ⚙️ Orchestrator Engine (run-demo.js) | Manages full lifecycle — starts backend, triggers detection, launches report |
| 🌲 Cypress Runner | Executes the test suite headlessly for consistent execution |
🔁 Layer 2 · Test Execution Loop
| Component | Role |
|-----------|------|
| 🔢 Multi-Run Manager | Repeats the suite N times — flakiness is statistical, needing multiple passes |
| 📄 JUnit Result Aggregation | Converts fragile console logs into stable, structured XML after each run |
| 💥 Failure Capture Engine | Grabs error messages and pinpoints exactly where in the test lifecycle failures occur |
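To make the Layer 2 flow concrete, here is a minimal sketch of the aggregation step. It assumes each run's JUnit XML has already been parsed into an array of `{ name, passed }` records. The function name `aggregateRuns`, the record shape, and the sample test names are illustrative assumptions, not the package's actual API.

```javascript
// Hypothetical sketch: fold per-run results into per-test pass/fail tallies.
// The input shape ({ name, passed }) is an assumption, not the package's real format.
function aggregateRuns(runs) {
  const tally = new Map();
  runs.forEach((run, runIndex) => {
    for (const { name, passed } of run) {
      if (!tally.has(name)) {
        tally.set(name, { passes: 0, failures: 0, history: [] });
      }
      const entry = tally.get(name);
      if (passed) entry.passes += 1;
      else entry.failures += 1;
      entry.history[runIndex] = passed; // per-run order, usable for a heatmap
    }
  });
  return tally;
}

// Three runs of a two-test suite (made-up data).
const sampleRuns = [
  [{ name: 'loads products', passed: true }, { name: 'adds to cart', passed: false }],
  [{ name: 'loads products', passed: true }, { name: 'adds to cart', passed: true }],
  [{ name: 'loads products', passed: true }, { name: 'adds to cart', passed: true }],
];
const sampleTally = aggregateRuns(sampleRuns);
console.log(sampleTally.get('adds to cart')); // 2 passes, 1 failure
```

Keeping the per-run `history` alongside the counts means the same structure can feed both the scoring step and a pass/fail grid.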
🧠 Layer 3 · Intelligence & Scoring
| Component | Role |
|-----------|------|
| 📊 Entropy Scorer | Calculates flakiness percentage based on pass/fail variance across all runs |
| 🔍 Root Cause Analyzer | Pattern-matches error logs to auto-diagnose Timeouts, Race Conditions, or Async Mis-timing |
| 💯 Health Score Calculator | Produces a single 0–100 reliability metric for the entire suite |
📋 Layer 4 · Actionable Reporting
| Component | Role |
|-----------|------|
| 🖥️ Interactive Dashboard | Premium-styled HTML report for visual inspection of test trends |
| 💡 Fix Advice | Specific, actionable code recommendations based on detected root cause |
| 🚦 CI Trust Gate | Determines if the suite is reliable enough to serve as a merge gate |
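As a rough illustration of how a CI trust gate could consume a suite health score, here is a small sketch. The README does not specify how the health score is computed or what threshold the gate uses, so the formula (100 minus the mean per-test flakiness), the threshold of 80, and the names `suiteHealth` and `trustGate` are all assumptions made for this example.

```javascript
// Hypothetical sketch of a CI trust gate. The health formula and the default
// threshold are assumptions for illustration; the README specifies neither.
function suiteHealth(flakinessScores) {
  if (flakinessScores.length === 0) return 100;
  const mean =
    flakinessScores.reduce((sum, score) => sum + score, 0) / flakinessScores.length;
  return Math.round(100 - mean);
}

function trustGate(flakinessScores, minHealth = 80) {
  const health = suiteHealth(flakinessScores);
  return { health, trusted: health >= minHealth };
}

// Per-test flakiness percentages (made-up data).
const gateResult = trustGate([0, 0, 0, 36, 64]);
console.log(gateResult);

// In CI, a non-zero exit code would block the merge.
if (!gateResult.trusted) process.exitCode = 1;
```

Returning a plain object rather than exiting inside `trustGate` keeps the decision testable and leaves the exit-code policy to the calling script.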
📁 Project Structure
FlakyTestPredictor/
│
├── 🗂️ demo-app/
│ ├── 🌐 index.html # ShopFlake e-commerce demo app (flakiness sources)
│ └── 🖥️ server.js # Express server with intentional random delays
│
├── 🧪 cypress/
│ ├── e2e/
│ │ ├── 🟡 01-product-loading.cy.js # Async timing tests [FLAKY]
│ │ ├── 🟠 02-cart-functionality.cy.js # Race conditions [FLAKY]
│ │ ├── 🟢 03-search-and-filter.cy.js # Mix: stable + flaky [MIXED]
│ │ ├── 🔴 04-flash-deals.cy.js # Heavy async [VERY FLAKY]
│ │ └── 🟡 05-ui-stability.cy.js # Shifting elements & race [FLAKY]
│ ├── support/
│ │ └── ⚙️ e2e.js # Global setup & exception handlers
│ └── results/ # 📦 JUnit XML output (auto-generated)
│
├── 📊 flaky-report/ # HTML report output (auto-generated)
│
├── 🧠 flaky-detector.js # Core detection engine (scoring & RCA)
├── 🚀 run-demo.js # One-click orchestrator
└── ⚙️ cypress.config.js # Cypress config (optimized timeouts)
🧠 How It Works
┌─────────────────────────────────────────────────────────────────┐
│ │
│ RUN 1 ──▶ [✓ ✓ ✗ ✓ ✗] │
│ RUN 2 ──▶ [✓ ✗ ✓ ✓ ✓] ──▶ 📊 Entropy Score ──▶ 🏥 RCA │
│ RUN 3 ──▶ [✓ ✓ ✓ ✗ ✓] │
│ │
└─────────────────────────────────────────────────────────────────┘
📊 Entropy-Based Flakiness Scoring
Each test gets a flakiness score computed from its pass rate across all runs, where passRate = passes ÷ total runs:
╔══════════════════════════════════════════════════╗
║ Flakiness = 4 × passRate × (1 − passRate) × 100 ║
╚══════════════════════════════════════════════════╝
| Score | Meaning | Indicator |
|-------|---------|-----------|
| 0% | Stable — always passes OR always fails | 🟢 |
| 1–49% | Mildly flaky | 🟡 |
| 50–79% | Moderately flaky | 🟠 |
| 80–99% | Severely flaky | 🔴 |
| 100% | Perfectly flaky — exact 50/50 split | 💀 |
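The formula and bands above can be sketched in a few lines. The function names `flakinessScore` and `flakinessLabel` are illustrative, and the handling of fractional scores that fall between the table's bands (for example 49.5%) is an assumption of this sketch. Note that the expression is a scaled Bernoulli variance rather than Shannon entropy in the strict sense; both are zero for a deterministic test and peak at a 50/50 split.

```javascript
// Flakiness = 4 * passRate * (1 - passRate) * 100, as in the formula above.
function flakinessScore(passes, totalRuns) {
  if (totalRuns <= 0) throw new RangeError('totalRuns must be positive');
  const passRate = passes / totalRuns;
  return 4 * passRate * (1 - passRate) * 100;
}

// Band names follow the table above; treating the gaps between bands
// as half-open intervals is an assumption of this sketch.
function flakinessLabel(score) {
  if (score === 0) return 'Stable';
  if (score < 50) return 'Mildly flaky';
  if (score < 80) return 'Moderately flaky';
  if (score < 100) return 'Severely flaky';
  return 'Perfectly flaky';
}

for (const [passes, total] of [[3, 3], [2, 3], [9, 10], [5, 10]]) {
  const score = flakinessScore(passes, total);
  console.log(`${passes}/${total} passes -> ${score.toFixed(1)}% (${flakinessLabel(score)})`);
}
```

One consequence worth knowing: with only 3 runs, a test can score either 0% or about 88.9%, so the milder bands only show up with more repeats (9 passes out of 10 gives 36%).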
🔍 Root Cause Analysis (RCA)
The engine analyzes error messages to auto-categorize failures:
┌────────────────────────────────────────────────────────────────┐
│ Error Pattern Matching Engine │
│ │
│ "Timed out..." ──▶ ⏱️ Timeout │
│ "element not visible" ──▶ 👁️ Race Condition │
│ "Promise not resolved" ──▶ ⚡ Async Load │
│ "list.length mismatch" ──▶ 🔢 DOM Check │
└────────────────────────────────────────────────────────────────┘
⚡ Performance Optimizations
✅ Optimized to run in 60–90 seconds per run — total ~160s for 3 runs
| Optimization | Detail |
|---|---|
| ⏩ Reduced App Delays | Background operations sped up 4× to minimize idle time |
| 💨 Tightened Timeouts | defaultCommandTimeout set to 2000ms for fast-fail behavior |
| 📋 Source-of-Truth Reporting | Terminal summary reads directly from JUnit XML for 100% accuracy |
🚀 Quick Start
▶️ Run the Full Demo
# ✅ Recommended: starts app, runs detector, and opens the report
$env:RUNS="3"; node run-demo.js   # PowerShell
RUNS=3 node run-demo.js           # bash/zsh equivalent
🔧 Manual Commands
# 🖥️ Start just the app
npm run start:app
# 🔍 Run the detector manually
RUNS=3 node flaky-detector.js
# 📊 View the latest report
npm run report
🔌 Implementation Guide for Any Project
Drop the Cypress Flaky Detector into any existing Cypress project in minutes.
Step 1 · 📦 Installation
npm install --save-dev cypress-flaky-detector-pro
npm install --save-dev chalk@4 fs-extra xml2js
Step 2 · 🔧 Automatic Setup
npm install --save-dev cypress-multi-reporters mocha-junit-reporter xml2js fs-extra chalk@4
Step 3 · ⚙️ Configuration
Ensure your cypress.config.js generates JUnit reports:
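// cypress.config.js
const { defineConfig } = require('cypress');

module.exports = defineConfig({
  e2e: {
    // Write one JUnit XML file per suite so the detector can read results per run
    reporter: 'junit',
    reporterOptions: {
      mochaFile: 'cypress/results/[suiteName].xml',
      toConsole: false,
    },
    // ...other config
  },
});
Step 4 · 🚀 Running the Detector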
# PowerShell syntax shown; in bash/zsh use e.g. RUNS=3 npx flaky-detect
# Run with 3 repeats (standard)
$env:RUNS="3"; npx flaky-detect
# Run specific tests only
$env:SPEC="cypress/e2e/login/*.cy.js"; npx flaky-detect
Step 5 · 🤖 Automated CI Integration
// package.json (cross-env must be installed as a dev dependency)
"scripts": {
  "flaky:check": "cross-env RUNS=3 flaky-detect"
}
📊 Report Features
╔══════════════════════════════════════════════════════════════════╗
║ 📋 FLAKY TEST REPORT Health: 74/100 ║
╠══════════════════════════════════════════════════════════════════╣
║ 💯 Suite Health Score ██████████████░░░░ 74 / 100 ║
║ 🔥 Pass/Fail Heatmap [Run 1][Run 2][Run 3][Run 4][Run 5] ║
║ 🤖 AI Recommendations 3 actionable fixes found ║
║ 🏷️ Root Cause Labels ⏱Timeout · 👁Race · ⚡Async ║
╚══════════════════════════════════════════════════════════════════╝
| Feature | Description |
|---------|-------------|
| 💯 Suite Health Score | Overall reliability index from 0–100 |
| 🔥 Pass/Fail Heatmap | Visual grid showing failure patterns across runs |
| 🤖 AI Recommendation Engine | Actionable suggestions to fix specific flakiness |
| 🏷️ Root Cause Labels | Auto-tags failures as Timeout, Race Condition, or Async Load |
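The root cause labels come from matching error messages against known patterns, as shown in the RCA diagram earlier. A minimal sketch of that idea follows. The regular expressions, the `Unknown` fallback, and the names `ROOT_CAUSE_PATTERNS` and `classifyFailure` are assumptions for illustration; the package's real patterns may differ.

```javascript
// Hypothetical pattern table mirroring the four mappings in the RCA diagram.
// First match wins, so more specific patterns should come first.
const ROOT_CAUSE_PATTERNS = [
  { label: 'Timeout', pattern: /timed out/i },
  { label: 'Race Condition', pattern: /element not visible/i },
  { label: 'Async Load', pattern: /promise not resolved/i },
  { label: 'DOM Check', pattern: /length mismatch/i },
];

function classifyFailure(errorMessage) {
  const match = ROOT_CAUSE_PATTERNS.find(({ pattern }) => pattern.test(errorMessage));
  return match ? match.label : 'Unknown'; // fallback label is an assumption
}

// Made-up error messages for demonstration.
console.log(classifyFailure('Timed out retrying after 2000ms: expected to find element'));
console.log(classifyFailure('AssertionError: expected 3 to equal 4'));
```

String matching like this is cheap and transparent, but it only recognizes the failures it has a pattern for, so an explicit fallback label helps surface messages that need a new rule.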
🛠️ Typical Fixes — AI-Powered Recommendations
The detector identifies these patterns and suggests targeted fixes:
⏱️ Fix: Regex Assertion Instability
// ❌ Problem: Regex match directly on element
cy.get('#timer').should('match', /\d+/);
// ✅ Fix: Invoke text first for stability
cy.get('#timer').invoke('text').should('match', /\d+/);
⏳ Fix: Hard Waits
// ❌ Problem: Brittle fixed-time wait
cy.wait(5000);
// ✅ Fix: Dynamic assertion with custom timeout
cy.get('.loading-spinner', { timeout: 10000 }).should('not.exist');
🌐 Fix: Unmocked Network Calls
// ❌ Problem: Test depends on real network timing
cy.visit('/products');
cy.get('.product-card').should('have.length', 12);
// ✅ Fix: Intercept and control the response
cy.intercept('GET', '/api/products', { fixture: 'products.json' }).as('getProducts');
cy.visit('/products');
cy.wait('@getProducts');
cy.get('.product-card').should('have.length', 12);
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Built with 🧠 intelligence · 🔬 rigor · ☕ caffeine
Author: Sran Kumar
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stop trusting green builds. Start trusting your tests.
