@binkley/markdown-bdd-transpiler

v0.5.5

Published

5 days ago

An AI-augmented BDD testing framework that transpiles Markdown user journeys into Playwright tests.


binkley

bdd testing playwright markdown gemini genai e2e

AI-Augmented Markdown BDD Transpiler for Testing

A modern, Behavior-Driven Development (BDD) testing framework that allows non-technical stakeholders to author End-to-End (E2E) user journeys using native Markdown.

Traditional BDD frameworks (like Cucumber) often suffer from "step-definition bloat," requiring extensive engineering work to map rigid phrases to code via regex. This project avoids that by using an AI as a semantic translation layer at compile time: it maps variations in human language to a standardized manifest of UI actions, which Playwright and Vitest then execute natively.

🌟 Key Features

Zero-Config Authoring: Test specs are written in pure Markdown (.md). No IDE plugins, custom language servers, or complex setup is required for authors. Syntax highlighting and formatting work out-of-the-box in GitHub and all major editors.
Semantic AI Translation: Users can write naturally (e.g., "click the button", "smash the button", "tap"). The transpiler uses the provider LLM to map intent to deterministic UI actions.
No Step-Definition Bloat: A small set of generic step functions lets the AI infer implicit ARIA roles from human text (e.g., classifying a step as targeting a "link" or a "checkbox"), so no per-phrase step definitions are needed.
Deterministic Caching: Compiled steps are saved to bdd-cache.json. Subsequent runs execute instantly without hitting the AI API, ensuring stable, offline, and fast CI/CD pipeline runs.
Fully Dockerized: Includes a clean Docker Compose environment to spin up the target application and execute tests in complete network isolation (preventing local EADDRINUSE port conflicts).
Visual Debugging: Automatically captures full-page Playwright screenshots whenever a test step fails, saving them locally to test-results/. The GitHub Actions CI pipeline is configured to securely upload these artifacts for easy debugging.
Precision Traceability: Every test execution and compilation warning points directly to the exact file and line number in your Markdown source, eliminating the need to debug generated code.
Production-Grade Transpiler: Structured compilation logging, concurrent API request orchestration (p-limit), high-demand automated retries, and an enforced "clean state" architecture that automatically deletes stale generated tests.

🚀 Getting Started (1-Minute Setup)

The transpiler includes an interactive initialization script to automatically scaffold your configuration and install the correct peer dependencies.

You can examine the package in NPM.

1. Install the Transpiler

npm install --save-dev @binkley/markdown-bdd-transpiler

2. Run the Initialization Wizard

npx markdown-bdd init

This interactive script will:

Ask if you want to install Playwright (@playwright/test) and automatically download the required browser binaries.
Prompt you to select your preferred AI Provider (Anthropic, Google Gemini, or OpenAI).
Generate a clean bdd.config.json tailored to your choice.
Auto-install the necessary Vercel AI SDK provider adapter (e.g., @ai-sdk/openai).

Tip: To change the provider or model, edit bdd.config.json or rerun npx markdown-bdd init.

3. CI/CD Automation (Headless Setup)

If you are automating the setup in a CI/CD pipeline, you can bypass the interactive prompts by providing the -y, --provider, and --model flags:

npx markdown-bdd init -y --provider openai --model gpt-4o-mini

🏗️ Architecture

Authoring (tests/*.md): Stakeholders define features and scenarios.
Manifest (manifest.json): A JSON schema defining a highly generic set of Playwright A11y actions (e.g., interact_with_element, verify_element_state).
Transpiler (transpile.ts): Crawls markdown using the remark AST parser to track strict file locations, checks the cache, and calls the LLM provider API to map unregistered human language steps to the manifest constraints.
Standard Library (framework/standard-ui-steps.ts): The physical Playwright implementation of the manifest.
Execution (.generated/*.test.ts): The transpiler outputs standard, execution-ready Playwright spec files.

📦 Installation & Usage in Your Project

To use the AI-Augmented Markdown BDD Transpiler in your own testing repository:

1. Install via NPM

Install the package as a development dependency:

npm install --save-dev @binkley/markdown-bdd-transpiler

Note: This package requires @playwright/test as a peer dependency. If you have not already set up Playwright in your repository, you will also need to install it and its browser binaries:

npm install --save-dev @playwright/test
npx playwright install

2. Configure and Run

Create a bdd.config.json in your project root (see the Configuration section below for options).

Then, use the included CLI to transpile your .md files into executable tests before running your test runner:

npx markdown-bdd

To see a full list of available command-line overrides and positional file targeting options, use the help flag:

npx markdown-bdd --help

Tip: We recommend adding "pretest": "markdown-bdd" to your package.json scripts.

🚀 Local Development (Contributing)

Prerequisites

Node.js (v22+ recommended)
Docker and Docker Compose
An LLM provider API key.

1. Environment Setup

Export your LLM provider API key in your terminal session. An example for Google Gemini:

export GOOGLE_API_KEY="your_api_key_here"

2. Local Execution (Native)

If you prefer to run the project directly on your machine without Docker:

# Install dependencies (including Playwright browsers)
npm install

# Start the dummy frontend application in the background
npm run demo &

# Transpile the markdown tests and run them via Vitest
npm test

You can see a full list of NPM scripts with help:

npm run help
# Or npm run ?

3. Docker Execution (Recommended)

To run the application and the test suite in a clean, isolated environment, simply run:

./scripts/test-e2e.sh

Use ./scripts/test-e2e.sh --help for help.

Strict Mode (`--strict`)

By default, the transpiler emits formatting and logic warnings (e.g., missing a THEN block) without failing the build. To enforce rigid BDD hygiene in CI/CD environments, you can fail the pipeline immediately if any warnings are detected:

./scripts/test-e2e.sh --strict

While you are still building up your markdown BDD tests, you can instead tolerate a fixed number of warnings with --max-warnings=N and progressively lower N over time. Both settings can also be configured permanently in bdd.config.json.
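The interaction between the two settings can be pictured with a minimal sketch. The names shouldFailBuild and WarningPolicy are hypothetical, and the precedence shown (strict wins, and a threshold is exceeded only when the count is greater than N) is an assumption rather than documented transpiler behavior:

```typescript
// Hypothetical policy object mirroring the `strict` and `maxWarnings` config keys.
interface WarningPolicy {
  strict: boolean;
  maxWarnings?: number;
}

// Returns true when the build should fail for the given number of warnings.
function shouldFailBuild(warningCount: number, policy: WarningPolicy): boolean {
  if (policy.strict) {
    // Assumption: strict mode fails on the very first warning.
    return warningCount > 0;
  }
  if (policy.maxWarnings !== undefined) {
    // Assumption: the build fails only once the count exceeds the threshold.
    return warningCount > policy.maxWarnings;
  }
  // Default from the README: warnings are reported but never fail the build.
  return false;
}
```

Under these assumptions, a suite with 5 warnings passes with maxWarnings set to 5 and fails as soon as a sixth warning appears.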

Diagnostic Logging (`--verbose`)

If you need deeper insight into the compilation process, use the --verbose flag:

./scripts/test-e2e.sh --verbose
# Or ./scripts/validate.sh --verbose

This outputs detailed runtime diagnostics, allowing you to track exactly which files are being processed, monitor AI cache misses, and profile the latency of the LLM provider API:

📄 Transpiling tests/login-journey.md -> .generated/login-journey.md.test.ts
☁️  Cache miss: "Click the "Sign In" button"
⚡ API returned in 1.42s
📄 Transpiling tests/settings-journey.md -> .generated/settings-journey.md.test.ts

The ./scripts/test-e2e.sh script will:

Build the lightweight Express frontend container.
Build the Playwright test-runner container.
Automatically orchestrate network connections between them.
Execute the test suite and output the results.
Gracefully tear down the containers upon completion.

🔎 Precision Traceability

The transpiler is designed to make debugging Markdown-driven tests as intuitive as writing them:

Clickable Terminal Warnings: Structural issues (like missing GIVEN statements or unclosed code fences) are logged using modern TS ecosystem standards (e.g., ⚠️ tests/login-journey.md:10 - warning: ...). In VS Code and most modern terminal emulators, these paths are natively clickable, taking you directly to the exact line in your Markdown file.
Playwright Integration: Generated Playwright actions are wrapped in native test.step() blocks. When a test fails, Playwright's HTML report, UI Mode, and terminal logs point exactly to the human-readable Markdown step that caused the error, including the source file and line number (e.g., (login-journey.md:23)).
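As a rough illustration of the two output shapes described above, the sketch below builds a warning line and a step title from a source location. The helper names formatWarning and stepTitle and the SourceLocation type are hypothetical; only the output formats are taken from the examples in this README:

```typescript
interface SourceLocation {
  file: string; // path relative to the project root, e.g. "tests/login-journey.md"
  line: number; // 1-based line number in the Markdown source
}

// Builds a "path:line" warning that terminals can turn into a clickable link.
function formatWarning(loc: SourceLocation, message: string): string {
  return `⚠️ ${loc.file}:${loc.line} - warning: ${message}`;
}

// Builds a title suitable for a Playwright test.step(), ending in "(file.md:line)".
function stepTitle(stepText: string, loc: SourceLocation): string {
  const baseName = loc.file.split("/").pop() ?? loc.file;
  return `${stepText} (${baseName}:${loc.line})`;
}
```

A generated test could then pass the result of stepTitle as the first argument of test.step(), so that reports show the human-readable step together with its origin.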

✅ CI/CD & Local Validation

This repository is configured to ensure code quality through rigorous static analysis and automated E2E testing using GitHub Actions.

Husky Git Hooks

To prevent broken code from being pushed to the remote repository, this project utilizes Husky hooks.

Pre-Commit (npm run validate:commit): Runs fast local checks including formatting (prettier), linting (eslint, shellcheck), type-checking (tsc), and unit tests (node:test).
Pre-Push (npm run validate:push): Runs the strict pipeline which includes all pre-commit checks, plus a dependency security audit (npm audit), and executes the full Playwright E2E suite inside Docker (test-e2e.sh).

If any of these steps fail, the git action is aborted.

Tip: If formatting fails during a commit, run npm run format to auto-fix the issues before attempting to commit again.

✍️ Writing Tests

Add new test scenarios to the tests/ directory using standard Markdown formatting. This framework is designed to be written by non-technical stakeholders in natural language.

The Anatomy of a Scenario

Every scenario should follow the standard Behavior-Driven Development (BDD) structure to ensure tests are deterministic and readable. Actionable testing steps must be wrapped in a bdd code fence and formatted as bullet points (-).

GIVEN (The Setup): Establishes the initial, immutable state of the application before the test begins. This usually involves navigating to a page or setting up prerequisites.
WHEN (The Action): Describes the specific interactions the user takes (e.g., clicking, typing, checking boxes). You can use natural language here (e.g., "Smash the button").
THEN (The Verification): Describes the expected outcome or what the user should see as a result of the WHEN actions.

Example:

# Feature: User Authentication

## Scenario: User logs in successfully

### GIVEN

```bdd
- The user navigates to "/login"
```

### WHEN

```bdd
- The user enters "frontend_wizard" into the "Username" field
- Click the "Sign In" button
```

### THEN

```bdd
- The user should see the heading "Welcome Back, Wizard!"
```

---

## Scenario: User sees error with invalid credentials

### GIVEN

```bdd
- The user navigates to "/login"
```

### WHEN

```bdd
- The user enters "bad_wizard" into the "Username" field
- Click the "Sign In" button
```

### THEN

```bdd
- Verify the "Error Message" alert is visible
```

🔄 Dynamic Data Injection

To keep secrets and environment-specific data out of your markdown files, you can use the {{VARIABLE_NAME}} syntax. During test execution, the framework will dynamically replace the placeholder with the matching environment variable (e.g., from process.env or your .env file).

Example:

- The user enters "{{TEST_USER_PASSWORD}}" into the "Password" field

or (both work):

- The user enters {{TEST_USER_PASSWORD}} into the "Password" field

If the environment variable is missing when the test runs, the test will immediately fail with a descriptive error to prevent silent UI failures.

Escaping (Literal Braces)

If you need the test to literally type curly braces (e.g., when testing a templating engine) without the framework attempting to look up a variable, escape the first brace with a backslash: \{{...}}.

Example:

- The user enters "\{{literal_string}}" into the "Code Editor"
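The substitution and escaping rules can be sketched as a single pass over the step text. This is a simplified illustration, not the framework's implementation: injectVariables is a hypothetical name, the environment is passed in as a plain object (for example process.env), and the pattern assumes variable names consist of letters, digits, and underscores:

```typescript
type Env = Record<string, string | undefined>;

// Matches an optional leading backslash followed by {{NAME}}.
const PLACEHOLDER = /(\\)?\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}/g;

function injectVariables(stepText: string, env: Env): string {
  return stepText.replace(
    PLACEHOLDER,
    (_match: string, escaped: string | undefined, name: string): string => {
      if (escaped) {
        // Escaped placeholder: drop the backslash and keep the braces literally.
        return `{{${name}}}`;
      }
      const value = env[name];
      if (value === undefined) {
        // Fail loudly instead of typing an empty or literal placeholder into the UI.
        throw new Error(`Missing environment variable "${name}" in step: ${stepText}`);
      }
      return value;
    }
  );
}
```

Because the lookup happens at run time, the secret value never needs to appear in the Markdown file or in the generated test source.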

⚠️ Structural Validation Warnings

To help keep your test suites clean and logical, the transpiler checks the sequence of your headings. If it detects a broken pattern, it will print a warning to the console during the build:

Missing an opening GIVEN: A scenario jumped straight into a WHEN action without first defining the starting state of the application.
GIVEN has no complete WHEN/THEN pair: The scenario set up the application but never performed an action or checked an outcome.
WHEN is not paired with a subsequent THEN: The scenario performed an action but never verified that the action did what it was supposed to do.

Note: You can safely interleave as much standard markdown documentation (paragraphs, images, tables) as you want between the bdd code fences. The parser will ignore everything outside the fences.
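The three checks listed above can be approximated with a small state machine over a scenario's headings. This is a sketch under stated assumptions: validateScenario and the Section type are hypothetical, consecutive WHEN headings are treated as one action block, and the real transpiler works on the remark AST rather than a pre-extracted list:

```typescript
type Keyword = "GIVEN" | "WHEN" | "THEN";

interface Section {
  keyword: Keyword;
  line: number; // 1-based line of the heading in the Markdown file
}

// Returns one warning string per broken pattern found in a single scenario.
function validateScenario(sections: Section[]): string[] {
  const warnings: string[] = [];
  let seenGiven = false;
  let warnedMissingGiven = false;
  let completePairs = 0;
  let pendingWhen: Section | null = null;

  for (const section of sections) {
    if (section.keyword === "GIVEN") {
      seenGiven = true;
    } else if (section.keyword === "WHEN") {
      if (!seenGiven && !warnedMissingGiven) {
        warnings.push(`line ${section.line}: Missing an opening GIVEN`);
        warnedMissingGiven = true;
      }
      if (pendingWhen === null) {
        pendingWhen = section; // remember the first unverified action
      }
    } else if (pendingWhen !== null) {
      // A THEN closes the pending WHEN and completes a pair.
      completePairs += 1;
      pendingWhen = null;
    }
  }

  if (pendingWhen !== null) {
    warnings.push(`line ${pendingWhen.line}: WHEN is not paired with a subsequent THEN`);
  } else if (seenGiven && completePairs === 0) {
    warnings.push("GIVEN has no complete WHEN/THEN pair");
  }
  return warnings;
}
```

A well-formed GIVEN, WHEN, THEN sequence yields an empty list, which is what strict mode would require.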

🧠 Cache Management

The transpiler ships with dedicated NPM scripts and built-in CLI flags to manage bdd-cache.json and improve the developer experience:

npm run cache:clear Instantly wipes the cache file to an empty state without invoking the AI or transpiling.
npm run cache:refresh Wipes the cache entirely and immediately transpiles all files, forcing the AI to rebuild the cache from scratch.
npm run cache:update -- <files...> Reads the current cache, forces the AI to re-evaluate the targeted files, and explicitly overwrites their cache entries while preserving untouched steps. This is perfect for surgical cache repairs (e.g., npm run cache:update -- tests/login.md).
npm run cache:ignore -- <files...> Temporarily bypasses the cache, forcing the AI to re-evaluate steps without saving them back to disk. This is a "dry-run" mode useful when testing a new LLM model without polluting your stable cache file.
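To make the difference between these modes concrete, here is a minimal sketch of the lookup logic. Everything in it is illustrative: the ResolvedStep shape, the mode names, and the resolveStep helper are hypothetical, the cache is keyed by raw step text as an assumption, and the resolver is synchronous for brevity where a real LLM call would be asynchronous:

```typescript
// Hypothetical shape of a cached resolution; the real bdd-cache.json schema may differ.
interface ResolvedStep {
  functionName: string;
  args: string[];
}

type CacheMode = "use" | "update" | "ignore";
type Resolver = (stepText: string) => ResolvedStep;

function resolveStep(
  cache: Map<string, ResolvedStep>,
  stepText: string,
  mode: CacheMode,
  resolve: Resolver
): ResolvedStep {
  if (mode === "use") {
    const hit = cache.get(stepText);
    if (hit !== undefined) {
      return hit; // cache hit: no AI call is made
    }
  }
  // Cache miss, or a forced re-evaluation ("update" and "ignore").
  const fresh = resolve(stepText);
  if (mode !== "ignore") {
    cache.set(stepText, fresh); // "ignore" is a dry run: nothing is written back
  }
  return fresh;
}
```

In this picture, cache:clear corresponds to emptying the map, and cache:refresh to emptying it and then resolving every step in the default mode.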

Advanced Orchestration:

If you want to manage the cache and run the Playwright test suite in a single command, you can pass the underlying transpiler flags (--ignore-cache, --update-cache) directly to the E2E script using the -t or --transpiler flags:

# Update the cache for login.md and immediately run its tests in Docker
./scripts/test-e2e.sh -t update-cache tests/login.md

To perform a full cache refresh alongside Docker execution, chain the commands:

npm run cache:clear && ./scripts/test-e2e.sh

⚙️ Configuration (`bdd.config.json`)

While the init script provides a great out-of-the-box setup, the framework is fully configurable to match your project's architecture.

{
  "testDir": "tests",
  "manifestPath": "manifest.json",
  "cachePath": "bdd-cache.json",
  "outDir": ".generated",
  "banner": "test.use({ extraHTTPHeaders: { 'x-mock-user': 'admin' } });",
  "bannerFile": "tests/setup.ts",
  "strict": false,
  "maxWarnings": 5,
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.5-flash-lite",
    "maxRetries": 3,
    "initialDelayMs": 1000,
    "backoffFactor": 2.0
  }
}

Configuration Options:

testDir: The directory containing your Markdown (.md) feature files. (Default: tests)
manifestPath: The path to your JSON manifest defining the available UI steps. (Default: manifest.json)
cachePath: The file where AI resolutions are deterministically cached to speed up future runs. (Default: bdd-cache.json)
outDir: The directory where the transpiler will output the generated Playwright .test.ts files. (Default: .generated)
banner: (Optional) A raw string of code injected at the top of every generated test file. (For complex setups, use bannerFile instead to avoid stringifying multiline code in JSON).
bannerFile: (Optional) Path to a TypeScript/JavaScript file (e.g., tests/setup.ts). The contents of this file are injected directly into every generated test file. This is the recommended way to inject global Playwright test.use({}) blocks to mock headers, cookies, or authentication state.
frameworkImport: (Optional) The module path injected into the generated tests. (Defaults to @binkley/markdown-bdd-transpiler/framework. Only override this if you are building custom Playwright UI implementations).
llm: Configures the third-party AI provider behavior.
- provider: The vendor to use (gemini, openai, or anthropic).
- model: The specific LLM version to use (e.g., gpt-4o-mini).
- concurrency: Max parallel network requests to the LLM. (Default: 5)
- maxRetries: Maximum number of times to retry a failed API call before crashing. (Default: 3)
- initialDelayMs: Base delay before the first retry. (Default: 1000)
- backoffFactor: Exponential multiplier for each subsequent retry. Jitter is automatically applied. (Default: 2.0)
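For a feel of how the three retry options combine, the sketch below computes the base delay before each retry. The retryDelays helper is hypothetical, and it deliberately omits jitter, since the exact jitter strategy is not described here:

```typescript
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  backoffFactor: number;
}

// Base delay in milliseconds before each retry, in order, without jitter.
function retryDelays(options: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < options.maxRetries; attempt++) {
    delays.push(options.initialDelayMs * Math.pow(options.backoffFactor, attempt));
  }
  return delays;
}
```

With the defaults (maxRetries 3, initialDelayMs 1000, backoffFactor 2.0), the base delays are 1000, 2000, and 4000 ms, so a step gives up after roughly seven seconds of waiting plus jitter.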

Note: All configuration options can also be overridden via CLI flags (e.g., npx markdown-bdd-transpiler --llm-provider openai).

Extensibility: Custom UI Steps

The core framework enforces strict A11y and Playwright best practices (e.g., failing to click if an element is covered by an invisible modal overlay). Rather than polluting your BDD Markdown with technical terms (e.g., "forcefully click") or waiting for the core framework to implement edge cases, your project can easily extend the AI capabilities by defining custom steps.

For example, to cleanly handle a blocking overlay without altering the natural language of your BDD:

1. Define it in your manifest.json:

{
  "available_steps": [
    {
      "function_name": "dismiss_overlay",
      "description": "Closes a blocking modal or overlay, such as the 'End Session' warning.",
      "parameters": ["overlay_name"]
    }
  ]
}

2. Implement the workaround in your own code:

// project-root/framework/custom-ui-steps.ts
import type { Page } from '@playwright/test';

export async function dismiss_overlay(page: Page, overlay_name: string) {
  // Encapsulate the 'force' hack tailored to your specific UI problem
  await page
    .getByRole('button', { name: overlay_name })
    .click({ force: true });
}

3. Point the config to your implementation:

// bdd.config.json
{
  "manifestPath": "manifest.json",
  "frameworkImport": "./framework/custom-ui-steps.ts"
}

Now, when a non-technical author writes The user dismisses the "End Session" warning, the AI will map it to your custom function, keeping the BDD clean and the Playwright hack abstracted.
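Since the manifest and the implementation are maintained by hand, a small consistency check in your own project can catch drift early. The sketch below is one possible approach, not part of the package: findUnimplementedSteps is a hypothetical helper, and it assumes every implementation takes page first, followed by one argument per manifest parameter, as in the dismiss_overlay example:

```typescript
interface ManifestStep {
  function_name: string;
  description: string;
  parameters: string[];
}

interface Manifest {
  available_steps: ManifestStep[];
}

// Returns the names of manifest steps that have no matching exported function,
// or whose function takes a different number of arguments than expected.
function findUnimplementedSteps(
  manifest: Manifest,
  implementations: Record<string, unknown>
): string[] {
  return manifest.available_steps
    .filter((step) => {
      const impl = implementations[step.function_name];
      // Assumption: `page` comes first, then one argument per manifest parameter.
      return typeof impl !== "function" || impl.length !== step.parameters.length + 1;
    })
    .map((step) => step.function_name);
}
```

You could call this from a unit test with the parsed manifest.json and the module exported by your custom steps file, and fail the test when the returned list is not empty.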

Temporary Workarounds (Designer Notes)

Sometimes, you need a test to pass today before you have time to write custom support code, or when dealing with a legacy UI element that lacks proper ARIA roles. The framework supports Designer Notes—a standard Markdown paragraph placed immediately before a bdd code fence. The transpiler sends this note directly to the AI to help it disambiguate the following steps.

If an element cannot be found by its ARIA role, you can use a Designer Note to guide the AI to use the core framework's interact_with_text step:

_QA Note: The "Submit Icon" lacks an ARIA role, but contains the visible text
"Submit"._

```bdd
- The user clicks the Submit Icon
```

⚠️ The "Technical Debt" Warning

Because Designer Notes can leak technical implementation details (or raw locators) into your behavioral specifications, the transpiler considers them a "leaky abstraction." Whenever the transpiler encounters a Designer Note, it will emit a build warning.

This is an intentional design choice to track technical debt. It allows the scenario to execute and pass in the short term, while signaling to your engineering team that technical work (like adding an ARIA role to the app, or writing a custom UI step) is required long-term to keep the test suite pure.

🛠️ Development Commands

npm run format: Runs Prettier to standardize codebase formatting.
npm run lint: Runs ESLint for TS and Shellcheck for bash files.
npm run type-check: Validates TypeScript structural integrity without emitting files.
npm run test:unit: Runs the native node:test suite with code coverage.
npm run test:e2e: Boots Docker and runs the full Playwright integration suite.
npm run cache:update: Surgically re-transpile specific files.
npm run profile: Benchmark the execution time of any arbitrary NPM command.
npm run profile:e2e: Benchmark the Dockerized E2E test pipeline.

📦 Releasing and Publishing

This project uses NPM Trusted Publishing (OIDC) via GitHub Actions. There are no hardcoded NPM tokens required to publish to the registry.

To publish a new version of the transpiler, use the included release script. This script will automatically run the tests, bump the version in package.json, create the git tag, and push to origin:

npm run release -- patch # Or minor, major

The GitHub Action will automatically trigger upon seeing the new tag, build the project, run all static analysis, and securely publish the new version to @binkley/markdown-bdd-transpiler on NPM using provenance.

📝 TODO / Future Improvements

Advanced LLM Telemetry & Interactive Feedback Loop

To transition the transpiler from a "dumb translator" to an active testing assistant, we can operationalize the metadata returned by modern LLMs:

Interactive Confidence-Gating: Parse the LLM's confidence scores. For high confidence, transpile automatically. For medium confidence, enter an interactive CLI mode to prompt the author ("Did you mean X? [Y/n]"). For low confidence, hard-fail with an actionable reason.
Automated Test Flakiness Prediction: Ask the LLM to return a flakiness_risk metric. The transpiler can proactively wrap high-risk actions in a test.step with longer timeouts or explicit .waitFor() conditions.
Smart Fallbacks (Dynamic Routing): If a fast/cheap model (like Gemini Flash) fails with a "length" finish reason or low confidence, automatically route that specific step to a larger, reasoning-focused model (e.g., Gemini 1.5 Pro).
Semantic Cache Expansion: Request the LLM to return semantic_equivalents for successful mappings (e.g., "tap the link", "select the link"). Proactively populate bdd-cache.json with these to drastically reduce API dependency over time.
Manifest Gap Analysis: Aggregate when the LLM successfully understands a step but reports a missing_capability against the provided manifest.json. Print an end-of-run report to guide maintainers on which Playwright ARIA actions to implement next (e.g., "Authors attempted drag-and-drop 14 times").

Automated Step Discovery (TypeScript AST Parsing)

Currently, when a developer writes a custom UI step, they must manually keep their TypeScript function signature synchronized with the JSON object in manifest.json. The transpiler should eventually use the TypeScript Compiler API to automatically parse the exported functions in the consumer's frameworkImport file, generating the manifest.json schema automatically.

Community Manifest Ecosystem (Plugin Architecture)

Because consuming projects can now eject their manifest.json and define custom UI steps, the core framework no longer needs to natively support every obscure ARIA role or complex interaction. Future iterations should focus on:

Manifest Modules: Allowing bdd.config.json to accept an array of manifest paths or NPM packages (e.g., "manifests": ["@binkley/bdd-salesforce-plugin"]), enabling the community to share pre-built step libraries for common SaaS platforms.
Rich Assertions: Expanding the core library for THEN verification steps to handle list lengths, exact text counts, and complex visibility states beyond simple element presence.
Compound/Parametrized Selectors: Support finding elements within other elements (e.g., "Click the 'Delete' button in the 'User Summary' row").

Interactive Manifest Upgrades

Because npx markdown-bdd init ejects a static copy of the default manifest.json into the consumer's project, they miss out on new steps added in future framework releases. We should build an interactive npx markdown-bdd upgrade command that parses the consumer's local manifest, diffs it against the latest default manifest, and interactively merges in new capabilities.

Prompt Debugging (Payload Dumping)

While the --verbose flag provides good observability into cache misses, maintainers currently lack visibility into the exact, finalized text prompt (including Designer Notes and injected variables) sent to the LLM. Implementing a --dump-prompts feature that saves the raw templated payloads to .generated/prompts/ would massively improve the ability to tune custom manifests and Designer Notes.
