selenium-flaky-detector
v1.1.0
Detect, score, and fix flaky Selenium/Java tests using entropy-based analysis, root-cause diagnosis, and a premium HTML dashboard.
selenium-flaky-detector
Entropy-Based Flaky Test Detection for Selenium/Java Projects
Detect, score, and fix flaky Selenium tests with a premium interactive dashboard, root-cause analysis, and smart fix recommendations.
What is Entropy-Based Detection?
A flaky test is a test that sometimes passes and sometimes fails without any code changes. This is extremely common in Selenium/Java due to async logic, variable network latency, and DOM rendering "race conditions."
Traditional testing tools just tell you: "Test A Failed". But if it failed 1 out of 3 times, how bad is the flakiness?
This detector handles that by using an Information Theory formula (Entropy) to calculate a true Flakiness Percentage rather than just giving you pass/fail states:
- 0% Entropy (Total Order): Tests that Always Pass or Always Fail are predictable and get a 0% flaky score.
- 100% Entropy (Maximum Chaos): A test that passes exactly 50% of the time is a true coin flip, destroying CI/CD trust, and scores 100%.
By focusing on entropy, we ignore "consistently broken" tests (which are just regular bugs) and surgically spotlight the truly chaotic timing and network issues.
Simplified Usage (The 3-Step Guide)
If you just want to get started immediately, follow these three simple steps:
Step 1: Install the Plugin
Open your terminal and install the tool globally so you can use it anywhere:
npm install -g selenium-flaky-detector
Step 2: Run the Command
Navigate to your Maven/Gradle project folder and trigger the detector:
# Recommended for development (runs 3 times)
npx selenium-flaky-detect --runs 3
Step 3: See the Reports
Once finished, an interactive HTML Dashboard will automatically pop up in your browser. Review your "Flakiness Score" and apply the smart fix recommendations!
Architecture
LAYER 1 · Orchestration
    Your Project (Maven/Gradle) ──▶ Orchestrator Engine (Command) ──▶ Maven Runner (Surefire)
                                    │
                                    ▼
LAYER 2 · Test Execution Loop
    Multi-Run Manager (N Repeats) ──▶ Surefire XML Aggregation (JUnit XML) ──▶ Failure Capture Engine
                                    │
                                    ▼
LAYER 3 · Intelligence & Scoring
    Entropy Scorer (0–100%)  |  Root Cause Analyzer (Auto-Diag)  |  Health Score Calculator (0–100)
                                    │
                                    ▼
LAYER 4 · Actionable Reporting
    Interactive Dashboard ──▶ Fix Advice (Smart RCA) ──▶ CI Trust Gate
Understanding the Layers
- Layer 1: Orchestration. The CLI orchestrator launches your test suite and manages the lifecycle of the Maven/Gradle runner.
- Layer 2: Test Execution Loop. Executes your Selenium test suite multiple times (N repeats) to gather a reliable sample size. After each run it parses the generated Surefire XML reports, capturing every failure trace, error message, and test duration.
- Layer 3: Intelligence & Scoring. The brain of the detector. The Entropy Scorer calculates each test's Flakiness Percentage (0–100%) and generates a global Suite Health Score. Next, the Root Cause Analyzer scans the failed Java stack traces, pattern-matching against known Selenium exceptions to categorize why each test failed (e.g., StaleElementReferenceException).
- Layer 4: Actionable Reporting. Translates the raw data into an interactive HTML dashboard and recommends specific, pattern-based Java/Selenium code fixes (e.g., "Add explicit wait here") based on the identified root cause. Optionally acts as a CI Trust Gate that blocks builds when the suite health score falls below a configured threshold.
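To make the Layer 3 root-cause step more concrete, here is a minimal sketch of pattern-based tagging. It is not the package's actual analyzer: the regular expressions, the matching order, the `classifyFailure` name, and the `Unclassified` fallback are all assumptions for illustration. The exception-to-label pairs mirror the ones listed under Root Cause Analysis.

```javascript
// Hypothetical sketch of pattern-based root-cause tagging.
// The patterns, their order, and the fallback label are assumptions.
const RCA_PATTERNS = [
  [/StaleElementReferenceException/, 'Stale Element'],
  [/TimeoutException/, 'Timeout'],
  [/NoSuchElementException/, 'Missing Element'],
  [/ElementNotInteractableException/, 'Element Not Ready'],
  // JUnit 5 throws AssertionFailedError, a subclass of AssertionError.
  [/Assertion(?:Failed)?Error/, 'Async Load'],
  [/SocketException|Connect(?:Exception|Error)/, 'Network / Connection'],
  [/ConfigurationFailure/, 'Config Failure'],
  // Checked last: the Selenium exceptions above all extend WebDriverException.
  [/WebDriverException/, 'WebDriver Instability'],
];

/** Returns the label of the first pattern found in the stack trace text. */
function classifyFailure(stackTrace) {
  for (const [pattern, label] of RCA_PATTERNS) {
    if (pattern.test(stackTrace)) return label;
  }
  return 'Unclassified';
}

console.log(classifyFailure(
  'org.openqa.selenium.StaleElementReferenceException: stale element reference'
)); // Stale Element
console.log(classifyFailure(
  'org.openqa.selenium.WebDriverException: unknown error'
)); // WebDriver Instability
```

Putting the generic `WebDriverException` pattern last keeps it from shadowing the more specific exception names when a trace mentions both.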
Step-by-Step Guide
1. Install the CLI Globally
To use the tool across any project on your machine, install it globally via npm:
npm install -g selenium-flaky-detector
2. Verify Installation
Ensure the CLI is installed correctly by checking the version or help menu:
selenium-flaky-detect --help
3. Prepare Your Java Project
The detector relies on Maven Surefire to generate XML test reports. Your pom.xml must be configured so that failing tests do not fail the Maven build, which lets every run complete and produce its reports.
Add this property to your maven-surefire-plugin configuration:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.2.5</version>
  <configuration>
    <!-- Crucial: Let the build succeed even if tests fail -->
    <testFailureIgnore>true</testFailureIgnore>
  </configuration>
</plugin>
4. Run the Detector on Your Project
Navigate to your project's root directory (where the pom.xml lives) and run the npx command:
# Recommended for development (runs 3 times)
npx selenium-flaky-detect --runs 3
Tip: choosing the right run count:
- --runs 2: Quick sanity check after applying a fix.
- --runs 3: (Default) Balanced speed and accuracy for local dev.
- --runs 5+: Recommended for CI/CD to catch rare, elusive flakes.
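One way to reason about the run count is to look at which scores a test can receive at all. The sketch below applies the formula from the Entropy-Based Flakiness Scoring section (4 × passRate × (1 − passRate) × 100) to every possible pass count. `attainableScores` is a hypothetical helper written for this illustration, not part of the CLI, and rounding to one decimal place is an assumption.

```javascript
// Sketch only: attainableScores is a hypothetical helper, not part of the CLI.
// It applies the documented formula to every possible pass count for a
// given number of runs and collects the distinct results.
function attainableScores(runs) {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError('runs must be a positive integer');
  }
  const scores = new Set();
  for (let passes = 0; passes <= runs; passes++) {
    const passRate = passes / runs;
    const score = 4 * passRate * (1 - passRate) * 100;
    scores.add(Math.round(score * 10) / 10); // round to one decimal place
  }
  return [...scores].sort((a, b) => a - b);
}

for (const runs of [2, 3, 5]) {
  console.log(`--runs ${runs}:`, attainableScores(runs).join(', '));
}
// --runs 2: 0, 100
// --runs 3: 0, 88.9
// --runs 5: 0, 64, 96
```

Under this formula, a small run count yields only a few coarse score levels, which is consistent with the advice to use more runs in CI/CD.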
If you run the command from a different directory, pass the path to your Java project with --project:
npx selenium-flaky-detect --project /Users/mvsaran/my-java-app --runs 5
If you only want to analyze a specific subset of test classes (to save time):
npx selenium-flaky-detect --runs 3 --spec "LoginTest,CheckoutTest"
5. Review the Premium Report
Once all runs are complete, the tool will automatically open a highly interactive HTML dashboard in your default browser:
- Identify tests with a Flakiness Score between 1% and 99%.
- Analyze the automatically generated Root Cause (RCA) tags (e.g., Timeout, Stale Element).
- Fix the tests using the recommended code suggestions provided for each specific RCA.
6. Fix, Re-Run, and Verify
This is the most important step! After you apply a fix to your Java files:
- Do NOT judge the fix from a single mvn test run: one run cannot measure flakiness, and without testFailureIgnore any failing test fails the Maven build.
- Instead, re-run the detector using npx: npx selenium-flaky-detect --runs 3.
- The tool will handle the failures, finish all runs, and update your Health Score to show that the test is now STABLE.
Installation
# Global CLI (recommended)
npm install -g selenium-flaky-detector
# Or use directly with npx (no install needed)
npx selenium-flaky-detect --help
Requirements:
- Node.js ≥ 14
- Java 17+
- Maven or Gradle
- Google Chrome + ChromeDriver (auto-managed by WebDriverManager)
CLI Options
| Option | Default | Description |
|---|---|---|
| --runs <n> | 3 | Number of times to repeat the test suite |
| --project <path> | . | Path to the Maven/Gradle project |
| --output <path> | ./flaky-report | Output directory for the HTML report |
| --spec <pattern> | (all) | Test class filter (e.g. *Login*) |
| --threshold <n> | 70 | CI gate minimum health score (0–100) |
| --no-open | (auto-open) | Skip auto-opening the HTML report |
| --demo | (off) | Run the built-in ShopFlake Java demo |
Programmatic API
const { FlakyDetector } = require('selenium-flaky-detector');

// CommonJS modules do not allow top-level await, so the call is wrapped
// in an async function.
async function main() {
  const detector = new FlakyDetector({
    runs: 3,
    projectPath: './my-java-project',
    outputDir: './flaky-report',
    specPattern: '**/LoginTest*',
    ciThreshold: 80,
    openReport: true,
    buildTool: 'maven', // or 'gradle'; auto-detected by default
  });

  const result = await detector.run();
  console.log(result.healthScore); // 0–100
  console.log(result.passed);      // true if healthScore >= threshold
  console.log(result.scores);      // per-test flakiness scores
  console.log(result.analysis);    // root cause analysis
  console.log(result.reportPath);  // absolute path to HTML report
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
Entropy-Based Flakiness Scoring
Flakiness (%) = 4 × passRate × (1 − passRate) × 100

| Score | Meaning | Indicator |
|---|---|---|
| 0% | Stable: always passes OR always fails | 🟢 |
| 1–49% | Mildly flaky | 🟡 |
| 50–79% | Moderately flaky | 🟠 |
| 80–99% | Severely flaky | 🔴 |
| 100% | Perfectly flaky: exact 50/50 split | |
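As a worked example of the formula and bands above, the sketch below scores a test from its per-run outcomes. The function names and the handling of fractional scores (a score between two bands falls into the lower band) are assumptions made for illustration; the package's actual scorer may differ. Note that 4 × passRate × (1 − passRate) is a quadratic that, like binary entropy, is 0 when a test always passes or always fails and peaks at a 50% pass rate.

```javascript
// Sketch of the documented scoring formula. Function names and the
// treatment of fractional scores are illustrative assumptions.

/** Flakiness percentage for a list of per-run outcomes (true = passed). */
function flakinessScore(outcomes) {
  if (outcomes.length === 0) {
    throw new RangeError('at least one run is required');
  }
  const passes = outcomes.filter(Boolean).length;
  const passRate = passes / outcomes.length;
  return 4 * passRate * (1 - passRate) * 100;
}

/** Maps a score to the bands from the table above. */
function severityBand(score) {
  if (score === 0) return 'Stable';
  if (score < 50) return 'Mildly flaky';
  if (score < 80) return 'Moderately flaky';
  if (score < 100) return 'Severely flaky';
  return 'Perfectly flaky';
}

// A test that failed 1 out of 3 runs:
const oneFailureInThree = flakinessScore([true, false, true]);
console.log(oneFailureInThree.toFixed(1), severityBand(oneFailureInThree));
// 88.9 Severely flaky

console.log(flakinessScore([true, true, true]));    // 0 (always passes)
console.log(flakinessScore([false, false, false])); // 0 (always fails)
console.log(flakinessScore([true, false]));         // 100 (50/50 split)
```

This answers the question from the introduction: under this formula, a test that fails once in three runs scores about 88.9%.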
Root Cause Analysis
The engine auto-classifies Selenium failures:
Selenium RCA Pattern Engine
StaleElementReferenceException   ──▶ Stale Element
TimeoutException                 ──▶ Timeout
NoSuchElementException           ──▶ Missing Element
ElementNotInteractableException  ──▶ Element Not Ready
AssertionError / TestNG Assert   ──▶ Async Load
SocketException / ConnectError   ──▶ Network / Connection
WebDriverException               ──▶ WebDriver Instability
ConfigurationFailure             ──▶ Config Failure
Smart Fix Recommendations
Fix: StaleElementReferenceException
// Problem: Cached element goes stale after DOM update
WebElement btn = driver.findElement(By.id("submit"));
waitFor(someCondition); // placeholder for your own wait helper
btn.click(); // StaleElementReferenceException!

// Fix: Re-fetch element just before interaction
waitFor(someCondition);
driver.findElement(By.id("submit")).click();
Fix: TimeoutException / Hard Waits
// Problem: Hard-coded sleep is fragile
Thread.sleep(3000);
driver.findElement(By.id("product-list")).click();

// Fix: Explicit wait for element to be clickable
// (needs java.time.Duration plus WebDriverWait and ExpectedConditions
// from org.openqa.selenium.support.ui)
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));
WebElement el = wait.until(ExpectedConditions.elementToBeClickable(By.id("product-list")));
el.click();
Fix: AssertionError on Async Count
// Problem: Count assertion before async load completes
List<WebElement> products = driver.findElements(By.className("product-card"));
assertEquals(12, products.size());

// Fix: Wait for expected count first (replaces the snippet above)
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
List<WebElement> products = wait.until(
    ExpectedConditions.numberOfElementsToBe(By.className("product-card"), 12)
);
assertEquals(12, products.size());
Project Structure
selenium-flaky-detector/
│
├── package.json                  # npm configuration & CLI bin entry
├── run-demo.js                   # One-click demo orchestrator
│
├── bin/
│   └── flaky-detect.js           # CLI entry point (global binary)
│
├── lib/
│   ├── index.js                  # Public API: FlakyDetector class
│   ├── orchestrator.js           # Layer 1: Demo lifecycle manager
│   ├── runner.js                 # Layer 2: Multi-run Maven/Gradle executor
│   ├── parser.js                 # Layer 2: Surefire XML report parser
│   ├── scorer.js                 # Layer 3: Entropy scorer + health score
│   ├── analyzer.js               # Layer 3: Root cause analyzer (7 patterns)
│   └── reporter.js               # Layer 4: Premium HTML dashboard generator
│
└── demo-app/                     # Java Spring Boot ShopFlake Demo
    ├── pom.xml                   # Maven config (Selenium 4, JUnit 5)
    └── src/
        ├── main/java/io/shopflake/
        │   ├── ShopFlakeApplication.java    # Spring Boot entry point
        │   └── controller/
        │       ├── ShopController.java      # Page routes (/, /cart, /deals)
        │       └── ApiController.java       # REST API (7 flakiness sources!)
        ├── main/resources/
        │   ├── application.properties
        │   └── templates/
        │       ├── index.html               # Product grid (flaky: async load)
        │       ├── cart.html                # Cart (flaky: 30% stale data)
        │       └── deals.html               # Deals (maximally flaky: 50/50)
        └── test/java/io/shopflake/tests/
            ├── ShopFlakeBaseTest.java       # Shared WebDriver setup
            ├── ProductLoadingTest.java      # FLAKY: async timing
            ├── CartFunctionalityTest.java   # FLAKY: race conditions
            ├── FlashDealsTest.java          # VERY FLAKY: 50/50
            ├── StableControlTest.java       # STABLE: control group
            └── SearchAndSessionTest.java    # MIXED: some flaky
Report Features
FLAKY TEST REPORT                 Health: 68/100
Suite Health Score                68 / 100
Pass/Fail Heatmap                 [Run 1][Run 2][Run 3][Run 4][Run 5]
Precision Recommendations         7 recommendations found
Root Cause Labels                 Timeout · Stale · Reliable

| Feature | Description |
|---|---|
| Suite Health Score | Overall reliability index from 0–100 (animated ring) |
| Pass/Fail Heatmap | Visual grid: pass/fail mark per test per run |
| Precision Recommendations | Specific, actionable Java/Selenium fix snippets |
| Root Cause Labels | Auto-tags: Timeout, Stale Element, Reliable Pass, etc. |
| CI Trust Gate | Hard pass/fail with configurable threshold |
CI/CD Integration
# GitHub Actions example
- name: Run Flaky Detection
  run: npx selenium-flaky-detect --runs 3 --threshold 75
  # Exits with code 1 if health score < 75 (blocks merge)

// package.json scripts
{
  "scripts": {
    "flaky:check": "selenium-flaky-detect --runs 3 --threshold 70"
  }
}
Framework Configuration
The detector supports both JUnit 5 and TestNG projects via Maven Surefire or Gradle.
Maven (JUnit 5 / TestNG)
Ensure your pom.xml generates standard reports:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.2.5</version>
  <configuration>
    <!-- Never stop on failures; let the detector analyze all results -->
    <testFailureIgnore>true</testFailureIgnore>
    <!-- JUnit XML report location (Automatically parsed) -->
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <!-- TestNG Specific: If using testng.xml -->
    <!-- <suiteXmlFiles><suiteXmlFile>src/test/resources/testng.xml</suiteXmlFile></suiteXmlFiles> -->
  </configuration>
</plugin>
TestNG results.xml (Native)
We also support parsing the native testng-results.xml file if your configuration generates it! Simply point the detector to the directory containing this file.
Troubleshooting & Common Issues
Error: "Requires a project to execute but there is no POM in this directory"
This happens when Maven cannot find your pom.xml file.
Resolution Steps:
- Check your path: Ensure you are running the command in the exact folder that contains pom.xml.
- Subdirectories: If your Java project is in a subfolder (e.g., eclipse-workspace/MyProject), you must point to that folder: npx selenium-flaky-detect --project "./MyProject" --runs 3
- Use Absolute Paths: If relative paths are failing, use the full path to the project: npx selenium-flaky-detect --project "C:\Users\Name\eclipse-workspace\TestNGFramework" --runs 3
- Confirm Build Tool: The tool auto-detects Maven (pom.xml) and Gradle (build.gradle). Ensure one of these files exists in the target directory.
Error: "No XML reports found"
This usually means the tests didn't run at all or failed to generate reports.
Resolution Steps:
- Run manually first: Try running mvn test in your project folder to ensure Maven is installed and your tests are valid.
- Check target folder: Ensure Maven is generating XML files in target/surefire-reports.
- Check Plugin Config: Ensure testFailureIgnore is set to true in your pom.xml (see Step 3 in the Step-by-Step guide).
License
MIT © SeleniumFlaky Contributors
