cypress-flaky-detector-pro

v1.0.4 · Published 3 months ago

A powerful zero-config flaky test detector for Cypress with entropy-based scoring and automated root cause analysis.


saran.mv

cypress testing flaky-tests qa automation ci-gate

╔═══════════════════════════════════════════════════════════════════╗
║                                                                   ║
║    ███████╗██╗      █████╗ ██╗  ██╗██╗   ██╗                     ║
║    ██╔════╝██║     ██╔══██╗██║ ██╔╝╚██╗ ██╔╝                     ║
║    █████╗  ██║     ███████║█████╔╝  ╚████╔╝                      ║
║    ██╔══╝  ██║     ██╔══██║██╔═██╗   ╚██╔╝                       ║
║    ██║     ███████╗██║  ██║██║  ██╗   ██║                        ║
║    ╚═╝     ╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝                        ║
║                                                                   ║
║         T E S T   D E T E C T O R   ·   P R O                    ║
║                                                                   ║
╚═══════════════════════════════════════════════════════════════════╝

Hunt down unreliable tests. Eliminate false confidence. Ship with certainty.


⚡ A complete Cypress-powered flaky test detection system

Demo e-commerce app · Intentionally flaky tests · Rich interactive HTML report


🏗️ Architecture Overview

The system follows a multi-layered architecture designed to simulate, detect, and report on test flakiness in a controlled environment.

┌──────────────────────────────────────────────────────────────────────┐
│  🎭  LAYER 1 · Orchestration & Simulation                            │
│                                                                      │
│   ┌──────────────┐      ┌──────────────────┐      ┌──────────────┐  │
│   │  🛒 ShopFlake │─────▶│ ⚙️  Orchestrator  │─────▶│ 🌲 Cypress   │  │
│   │   Demo App   │      │  Engine          │      │   Runner     │  │
│   │ (Sync/Async) │      │  (run-demo.js)   │      │ (Headless)   │  │
│   └──────────────┘      └──────────────────┘      └──────────────┘  │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│  🔁  LAYER 2 · Test Execution Loop                                   │
│                                                                      │
│   ┌──────────────┐      ┌──────────────────┐      ┌──────────────┐  │
│   │ 🔢 Multi-Run  │─────▶│ 📄 JUnit Result  │─────▶│ 💥 Failure   │  │
│   │   Manager    │      │  Aggregation     │      │   Capture    │  │
│   │ (N Repeats)  │      │  (XML / JSON)    │      │   Engine     │  │
│   └──────────────┘      └──────────────────┘      └──────────────┘  │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│  🧠  LAYER 3 · Intelligence & Scoring                                │
│                                                                      │
│   ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐  │
│   │ 📊 Entropy    │    │ 🔍 Root Cause    │    │ 💯 Health Score  │  │
│   │   Scorer     │    │   Analyzer       │    │   Calculator     │  │
│   │  (0–100%)    │    │  (Auto-Diag)     │    │   (0–100)        │  │
│   └──────────────┘    └──────────────────┘    └──────────────────┘  │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│  📋  LAYER 4 · Actionable Reporting                                  │
│                                                                      │
│   ┌──────────────┐      ┌──────────────────┐      ┌──────────────┐  │
│   │ 🖥️ Interactive│─────▶│ 💡 Fix Advice    │─────▶│ 🚦 CI Trust  │  │
│   │   Dashboard  │      │  (AI-Powered)    │      │   Gate       │  │
│   └──────────────┘      └──────────────────┘      └──────────────┘  │
└──────────────────────────────────────────────────────────────────────┘

🎭 Layer 1 — Orchestration & Simulation

| Component | Role |
|-----------|------|
| 🛒 ShopFlake Demo App | Target app with intentional flakiness: network delays, race conditions, transient elements |
| ⚙️ Orchestrator Engine (run-demo.js) | Manages the full lifecycle: starts the backend, triggers detection, opens the report |
| 🌲 Cypress Runner | Runs the test suite headlessly so every run uses the same environment |

🔁 Layer 2 — Test Execution Loop

| Component | Role |
|-----------|------|
| 🔢 Multi-Run Manager | Repeats the suite N times, since flakiness is statistical and only shows up across multiple passes |
| 📄 JUnit Result Aggregation | Reads the structured JUnit XML written after each run instead of parsing fragile console logs |
| 💥 Failure Capture Engine | Captures error messages and pinpoints where in the test lifecycle each failure occurs |
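The Multi-Run Manager and JUnit aggregation steps boil down to merging each run's outcomes into a per-test history. A minimal sketch, assuming each run has already been parsed (for example, from its JUnit XML) into a hypothetical array of `{ name, passed, error }` objects; `aggregateRuns` is an illustrative name, not part of the package's API:

```javascript
/**
 * Merge per-run results into a per-test history keyed by test name.
 * Input shape is a hypothetical example, not the package's data model:
 * one array per run, each entry { name, passed, error? }.
 */
function aggregateRuns(runs) {
  const history = new Map();
  for (const run of runs) {
    for (const { name, passed, error } of run) {
      // Create an empty record the first time a test name is seen.
      if (!history.has(name)) {
        history.set(name, { passes: 0, total: 0, errors: [] });
      }
      const record = history.get(name);
      record.total += 1;
      if (passed) {
        record.passes += 1;
      } else if (error) {
        // Keep failure messages so a root-cause step can inspect them later.
        record.errors.push(error);
      }
    }
  }
  return history;
}

// Usage: three runs of the same two tests.
const runs = [
  [
    { name: 'loads products', passed: true },
    { name: 'adds to cart', passed: false, error: 'Timed out retrying after 2000ms' },
  ],
  [
    { name: 'loads products', passed: true },
    { name: 'adds to cart', passed: true },
  ],
  [
    { name: 'loads products', passed: true },
    { name: 'adds to cart', passed: false, error: 'Timed out retrying after 2000ms' },
  ],
];
const cart = aggregateRuns(runs).get('adds to cart');
console.log(cart.passes, cart.total, cart.errors.length); // 1 3 2
```

Keying by test name keeps the history independent of run order, so a test that is skipped in one run simply has a smaller `total`.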

🧠 Layer 3 — Intelligence & Scoring

| Component | Role |
|-----------|------|
| 📊 Entropy Scorer | Calculates a flakiness percentage from the pass/fail variance across all runs |
| 🔍 Root Cause Analyzer | Pattern-matches error logs to auto-diagnose timeouts, race conditions, or async mis-timing |
| 💯 Health Score Calculator | Produces a single 0–100 reliability metric for the entire suite |

📋 Layer 4 — Actionable Reporting

| Component | Role |
|-----------|------|
| 🖥️ Interactive Dashboard | Premium-styled HTML report for visually inspecting test trends |
| 💡 Fix Advice | Specific, actionable code recommendations based on the detected root cause |
| 🚦 CI Trust Gate | Determines whether the suite is reliable enough to serve as a merge gate |
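The CI Trust Gate's decision can be pictured as a threshold check over per-test flakiness scores. The sketch below is an assumption about how such a gate might work; `MAX_FLAKINESS`, `evaluateGate`, and `gateExitCode` are hypothetical names, and the package's actual criteria (possibly based on the Health Score instead) may differ:

```javascript
// Hypothetical threshold: the highest per-test flakiness (in percent)
// this sketch tolerates before the suite stops being a trusted merge gate.
const MAX_FLAKINESS = 20;

/**
 * @param {Array<{name: string, flakiness: number}>} scores  0–100 per test
 * @returns {{trusted: boolean, offenders: string[]}}
 */
function evaluateGate(scores, maxFlakiness = MAX_FLAKINESS) {
  const offenders = scores
    .filter((s) => s.flakiness > maxFlakiness)
    .map((s) => s.name);
  return { trusted: offenders.length === 0, offenders };
}

// Convert the verdict into a process exit code: 0 passes CI, 1 fails it.
function gateExitCode({ trusted }) {
  return trusted ? 0 : 1;
}

// Usage
const verdict = evaluateGate([
  { name: 'loads products', flakiness: 0 },
  { name: 'adds to cart', flakiness: 88.9 },
]);
console.log(verdict.offenders, gateExitCode(verdict)); // [ 'adds to cart' ] 1
// In a CI script you would end with: process.exit(gateExitCode(verdict));
```

Returning the offending test names, not just a boolean, lets the CI log say exactly which tests blocked the merge.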

📁 Project Structure

FlakyTestPredictor/
│
├── 🗂️  demo-app/
│   ├── 🌐 index.html              # ShopFlake e-commerce demo app (flakiness sources)
│   └── 🖥️  server.js              # Express server with intentional random delays
│
├── 🧪 cypress/
│   ├── e2e/
│   │   ├── 🟡 01-product-loading.cy.js    # Async timing tests       [FLAKY]
│   │   ├── 🟠 02-cart-functionality.cy.js  # Race conditions          [FLAKY]
│   │   ├── 🟢 03-search-and-filter.cy.js   # Mix: stable + flaky      [MIXED]
│   │   ├── 🔴 04-flash-deals.cy.js         # Heavy async              [VERY FLAKY]
│   │   └── 🟡 05-ui-stability.cy.js        # Shifting elements & race [FLAKY]
│   ├── support/
│   │   └── ⚙️  e2e.js             # Global setup & exception handlers
│   └── results/                   # 📦 JUnit XML output (auto-generated)
│
├── 📊 flaky-report/                # HTML report output (auto-generated)
│
├── 🧠 flaky-detector.js            # Core detection engine (scoring & RCA)
├── 🚀 run-demo.js                  # One-click orchestrator
└── ⚙️  cypress.config.js           # Cypress config (optimized timeouts)

🧠 How It Works

 ┌─────────────────────────────────────────────────────────────────┐
 │                                                                 │
 │   RUN 1 ──▶ [✓ ✓ ✗ ✓ ✗]                                       │
 │   RUN 2 ──▶ [✓ ✗ ✓ ✓ ✓]   ──▶  📊 Entropy Score  ──▶  🏥 RCA  │
 │   RUN 3 ──▶ [✓ ✓ ✓ ✗ ✓]                                       │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘

📊 Entropy-Based Flakiness Scoring

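Each test's flakiness score is derived from its pass rate across all runs (passRate = passed runs ÷ total runs). The product passRate × (1 − passRate) is the variance of a pass/fail outcome; it peaks at 0.25 for a 50/50 split, so multiplying by 4 scales the score to 0–100%: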

╔══════════════════════════════════════════════════╗
║   Flakiness  =  4 × passRate × (1 − passRate) × 100  ║
╚══════════════════════════════════════════════════╝

| Score | Meaning | Indicator |
|-------|---------|-----------|
| 0% | Stable: always passes or always fails | 🟢 |
| 1–49% | Mildly flaky | 🟡 |
| 50–79% | Moderately flaky | 🟠 |
| 80–99% | Severely flaky | 🔴 |
| 100% | Perfectly flaky: exact 50/50 split | 💀 |
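As a worked example, the formula and the bands in the table translate directly into code. This is a minimal sketch; `flakinessScore` and `flakinessBand` are illustrative names, and treating the integer bands as half-open ranges (so 49.5% counts as mildly flaky) is an assumption:

```javascript
/**
 * Flakiness = 4 × passRate × (1 − passRate) × 100
 * @param {number} passes  number of passing runs
 * @param {number} total   total number of runs (must be > 0)
 * @returns {number} score in percent, 0–100
 */
function flakinessScore(passes, total) {
  if (total <= 0) throw new RangeError('total must be positive');
  const passRate = passes / total;
  return 4 * passRate * (1 - passRate) * 100;
}

/** Map a score to the bands in the table (half-open ranges assumed). */
function flakinessBand(score) {
  if (score === 0) return 'Stable';
  if (score < 50) return 'Mildly flaky';
  if (score < 80) return 'Moderately flaky';
  if (score < 100) return 'Severely flaky';
  return 'Perfectly flaky';
}

// Usage
const score = flakinessScore(1, 3);
console.log(score.toFixed(1), flakinessBand(score)); // 88.9 Severely flaky
console.log(flakinessScore(5, 10), flakinessBand(flakinessScore(5, 10))); // 100 Perfectly flaky
```

With the standard 3 runs, the pass rate can only be 0, 1/3, 2/3, or 1, so any test that fails in some runs but not all scores 4 × 1/3 × 2/3 × 100 ≈ 88.9% and lands in the severely flaky band. More runs produce finer-grained scores.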

🔍 Root Cause Analysis (RCA)

The engine analyzes error messages to auto-categorize failures:

┌────────────────────────────────────────────────────────────────┐
│  Error Pattern Matching Engine                                 │
│                                                                │
│  "Timed out..."           ──▶  ⏱️  Timeout                    │
│  "element not visible"    ──▶  👁️  Race Condition             │
│  "Promise not resolved"   ──▶  ⚡  Async Load                  │
│  "list.length mismatch"   ──▶  🔢  DOM Check                  │
└────────────────────────────────────────────────────────────────┘
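The pattern table above amounts to an ordered list of regular expressions checked against each failure message. A minimal sketch, assuming the four example strings shown; real Cypress error text varies by version and command, so the `RULES` patterns and the `classifyError` name are illustrative, not the detector's actual rules:

```javascript
// Ordered rules: the first matching pattern wins, so list the
// categories you want to take precedence first.
const RULES = [
  { pattern: /timed out/i, category: 'Timeout' },
  { pattern: /not visible/i, category: 'Race Condition' },
  { pattern: /promise not resolved/i, category: 'Async Load' },
  { pattern: /length mismatch/i, category: 'DOM Check' },
];

/** Return the root-cause category for one failure message. */
function classifyError(message) {
  const rule = RULES.find((r) => r.pattern.test(message));
  return rule ? rule.category : 'Unknown';
}

// Usage
console.log(classifyError('Timed out retrying after 2000ms')); // Timeout
console.log(classifyError('element not visible: #add-to-cart')); // Race Condition
console.log(classifyError('Promise not resolved within 2000ms')); // Async Load
console.log(classifyError('list.length mismatch: expected 12, got 9')); // DOM Check
```

Because matching stops at the first rule, a message containing phrases from two categories gets the earlier label; anything unmatched falls back to `Unknown` rather than being forced into a category.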

⚡ Performance Optimizations

✅ Optimized to run in roughly 60–90 seconds per run, so a standard 3-run detection takes about 3–4.5 minutes

| Optimization | Detail |
|---|---|
| ⏩ Reduced App Delays | Background operations in the demo app sped up 4× to minimize idle time |
| 💨 Tightened Timeouts | defaultCommandTimeout set to 2000 ms for fast-fail behavior |
| 📋 Source-of-Truth Reporting | Terminal summary reads directly from the JUnit XML, so it always matches the recorded results |

🚀 Quick Start

▶️ Run the Full Demo

# ✅ Recommended: starts app, runs detector, and opens the report
# PowerShell
$env:RUNS="3"; node run-demo.js
# bash / zsh
RUNS=3 node run-demo.js

🔧 Manual Commands

# 🖥️  Start just the app
npm run start:app

# 🔍 Run the detector manually (bash / zsh; in PowerShell: $env:RUNS="3"; node flaky-detector.js)
RUNS=3 node flaky-detector.js

# 📊 View the latest report
npm run report

🔌 Implementation Guide for Any Project

Drop the Cypress Flaky Detector into any existing Cypress project in minutes.

Step 1 · 📦 Installation

npm install --save-dev cypress-flaky-detector-pro
npm install --save-dev chalk@4 fs-extra xml2js

Step 2 · 🔧 Automatic Setup

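npm install --save-dev cypress-multi-reporters mocha-junit-reporter

xml2js, fs-extra, and chalk@4 were already installed in Step 1. Reinstalling an unpinned chalk would pull chalk 5 or later, which is ESM-only and would replace the CommonJS-compatible chalk@4.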

Step 3 · ⚙️ Configuration

Ensure your cypress.config.js generates JUnit reports:

// cypress.config.js
const { defineConfig } = require('cypress');

module.exports = defineConfig({
  e2e: {
    reporter: 'junit',
    reporterOptions: {
      mochaFile: 'cypress/results/[suiteName].xml',
      toConsole: false,
    },
    // ...other config
  }
});

Step 4 · 🚀 Running the Detector

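# Run with 3 repeats (standard)
# PowerShell
$env:RUNS="3"; npx flaky-detect
# bash / zsh
RUNS=3 npx flaky-detect

# Run specific tests only
# PowerShell (note: $env:SPEC stays set for the rest of the session)
$env:SPEC="cypress/e2e/login/*.cy.js"; npx flaky-detect
# bash / zsh
SPEC="cypress/e2e/login/*.cy.js" npx flaky-detect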

Step 5 · 🤖 Automated CI Integration

// package.json
"scripts": {
  "flaky:check": "cross-env RUNS=3 flaky-detect"
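}

cross-env is a separate package (npm install --save-dev cross-env) that sets RUNS the same way on Windows and Unix shells, so npm run flaky:check works on any CI runner.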

📊 Report Features

╔══════════════════════════════════════════════════════════════════╗
║  📋  FLAKY TEST REPORT                           Health: 74/100  ║
╠══════════════════════════════════════════════════════════════════╣
║  💯  Suite Health Score     ██████████████░░░░  74 / 100         ║
║  🔥  Pass/Fail Heatmap      [Run 1][Run 2][Run 3][Run 4][Run 5]  ║
║  🤖  AI Recommendations     3 actionable fixes found             ║
║  🏷️  Root Cause Labels      ⏱Timeout · 👁Race · ⚡Async         ║
╚══════════════════════════════════════════════════════════════════╝

| Feature | Description |
|---------|-------------|
| 💯 Suite Health Score | Overall reliability index from 0–100 |
| 🔥 Pass/Fail Heatmap | Visual grid showing failure patterns across runs |
| 🤖 AI Recommendation Engine | Actionable suggestions to fix specific flakiness |
| 🏷️ Root Cause Labels | Auto-tags failures as Timeout, Race Condition, or Async Load |

🛠️ Typical Fixes — AI-Powered Recommendations

The detector identifies these patterns and suggests targeted fixes:

⏱️ Fix: Regex Assertion Instability

// ❌ Problem: Regex match directly on element
cy.get('#timer').should('match', /\d+/);

// ✅ Fix: Assert on the element's text, not on the element itself
cy.get('#timer').invoke('text').should('match', /\d+/);

⏳ Fix: Hard Waits

// ❌ Problem: Brittle fixed-time wait
cy.wait(5000);

// ✅ Fix: Dynamic assertion with custom timeout
cy.get('.loading-spinner', { timeout: 10000 }).should('not.exist');

🌐 Fix: Unmocked Network Calls

// ❌ Problem: Test depends on real network timing
cy.visit('/products');
cy.get('.product-card').should('have.length', 12);

// ✅ Fix: Intercept and control the response
cy.intercept('GET', '/api/products', { fixture: 'products.json' }).as('getProducts');
cy.visit('/products');
cy.wait('@getProducts');
cy.get('.product-card').should('have.length', 12);

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Built with 🧠 intelligence · 🔬 rigor · ☕ caffeine
           Author: Saran Kumar
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Stop trusting green builds. Start trusting your tests.