ElasticDash Test
An AI-native test runner for ElasticDash workflow testing. Built for async AI pipelines — not a general-purpose test runner.
Quick Links
- Quick Start Guide ← Start here to set up your first workflow
- Test Writing Guidelines
- Test Matchers
- Tool Recording and Replay
- Workflows Dashboard
- Agent Mid-Trace Replay
- Deno Support
- Instrumentation Guide — how to write `ed_tools.ts`, `ed_workflows.ts`, and `ed_agents.ts`
- Langfuse Trace Structure — span structure required for dashboard replay
Features
- 🎯 Trace-first testing — every test gets a `trace` context to record and assert on LLM calls and tool invocations
- 🔍 Automatic AI interception — captures OpenAI, Gemini, and Grok calls without code changes
- 🧪 AI-specific matchers — semantic output matching, LLM-judged evaluations, prompt assertions
- 🛠️ Tool & LLM recording & replay — automatically trace tool and AI calls with checkpoint-based replay and mock support
- 📊 Interactive dashboard — browse workflows, debug traces, validate fixes visually
- 🤖 Agent mid-trace replay — resume long-running agents from any task without re-execution
- 🌐 HTTP workflow mode — run workflows against your live dev server for framework-heavy apps (Next.js, Remix, etc.) with full AI and tool call observability
- 🚀 CI/CD runner — fetch test groups from your ElasticDash project, execute tests, submit results, and fail the build on regressions
Installation
```
npm install elasticdash-test
```
Requirements: Node 20+. For Deno projects, see Using elasticdash-test in Deno.
Setup with a Coding Agent
If you use a coding agent (Claude Code, Cursor, Copilot, Codex, Windsurf, etc.), tell your agent:
```
Integrate elasticdash-test into this project.
Read node_modules/elasticdash-test/docs/agent-coding-instructions.md for how to proceed,
and node_modules/elasticdash-test/docs/agent-integration-guide.md for technical reference.
```
Your agent will read both docs and handle the full setup — creating `ed_tools.ts`, `ed_workflows.ts`, updating source files, and validating the connection.
Optional: To copy the agent instructions into your project for easier access:
```
npx elasticdash init-guide                                           # creates AGENTS.md
npx elasticdash init-guide --target CLAUDE.md                        # for Claude Code
npx elasticdash init-guide --target .cursor/rules/elasticdash.md     # for Cursor
npx elasticdash init-guide --target .github/copilot-instructions.md  # for Copilot
```
If the target file already exists, the guide is appended (not overwritten). Use `--force` to replace the file entirely.
Cloud Setup
Add these to your .env (or CI secrets):
```
ELASTICDASH_API_URL=https://server.elasticdash.com
ELASTICDASH_API_KEY=ed_your_api_key_here
```
- `ELASTICDASH_API_URL` — The ElasticDash cloud backend URL. For cloud users this is always `https://server.elasticdash.com`. For self-hosted instances, use your own backend URL.
- `ELASTICDASH_API_KEY` — Your project API key. Find it in the ElasticDash dashboard under project settings.
Note: `ELASTICDASH_SERVER` is an alias for `ELASTICDASH_API_URL`. Both work — the SDK checks `ELASTICDASH_API_URL` first, then falls back to `ELASTICDASH_SERVER`.
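In code, the lookup order amounts to the following (a sketch, not the SDK's actual source):
```ts
// ELASTICDASH_API_URL wins; ELASTICDASH_SERVER is the fallback alias (sketch)
const apiUrl = process.env.ELASTICDASH_API_URL ?? process.env.ELASTICDASH_SERVER
```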
Git ignore: ElasticDash writes temporary runtime artifacts under `.temp/`. Add this to your `.gitignore`:
```
.temp/
```
Running CLI commands: Use `npx` to run commands with your locally installed version (recommended to avoid version drift):
```
npx elasticdash test
npx elasticdash dashboard
```
Alternatively, install globally if you prefer shorter commands:
```
npm install -g elasticdash-test
elasticdash test
elasticdash dashboard
```
Quick Start
1. Write a test file (`my-flow.ai.test.ts`):
```ts
import '../node_modules/elasticdash-test/dist/test-setup.js'
import { expect } from 'expect'

aiTest('checkout flow', async (ctx) => {
  await runCheckout(ctx)
  expect(ctx.trace).toHaveLLMStep({ model: 'gpt-4', contains: 'order confirmed' })
  expect(ctx.trace).toCallTool('chargeCard')
})
```
2. Run it:
```
npx elasticdash test                    # discover all **/*.ai.test.ts files
npx elasticdash test ./ai-tests         # discover in a specific directory
npx elasticdash run my-flow.ai.test.ts  # run a single file
```
3. Read the output:
```
✓ checkout flow (1.2s)
✗ refund flow (0.8s)
  → Expected tool "chargeCard" to be called, but no tool calls were recorded

1 passed
1 failed
Total: 2
Duration: 3.4s
```
Workflow export requirements (subprocess mode):
- Export plain callable functions from `ed_workflows.ts`/`.js` (see the sketch after this list).
- Use JSON-serializable inputs/outputs (object or array) so dashboard replay can pass args and read results.
- Do not export framework-bound handlers directly (for example Next.js `NextRequest`/`NextResponse` route handlers) — use HTTP workflow mode instead.
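A minimal `ed_workflows.ts` that satisfies these rules; the workflow name, input shape, and helper are illustrative, not part of the SDK:
```ts
// ed_workflows.ts sketch; runCheckout and createOrderFromCart are hypothetical
// app-level names, shown only to illustrate the export shape.
export async function runCheckout(input: { cartId: string; userId: string }) {
  const order = await createOrderFromCart(input.cartId, input.userId)
  // JSON-serializable output so dashboard replay can read the result
  return { orderId: order.id, status: order.status }
}

// assumed helper from your own codebase
declare function createOrderFromCart(
  cartId: string,
  userId: string,
): Promise<{ id: string; status: string }>
```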
Documentation
Core Concepts
- Test Writing Guidelines — comprehensive guide to writing AI workflow tests
- Test Matchers — all available matchers with examples
- Tool Recording & Replay — automatic tool tracing and checkpoint-based replay
Advanced Features
- Workflows Dashboard — interactive workflow browser, debugger, and fetching traces from Langfuse
- Agent Mid-Trace Replay — resume long-running agents from any task
- Deno Support — using ElasticDash Test in Deno projects
Integration & Reference
- Instrumentation Guide — how to write `ed_tools.ts`, `ed_workflows.ts`, and `ed_agents.ts` to connect your production code to ElasticDash
- Integration Guide — step-by-step SDK integration reference (templates, patterns, decision trees)
- Agent Coding Instructions — behavioral instructions for AI coding agents performing the integration
- Langfuse Trace Structure — Langfuse span structure required for dashboard replay and tool-level diffing
Quick Reference
Test Globals
| Global | Description |
|---|---|
| `aiTest(name, fn)` | Register a test |
| `beforeAll(fn)` | Run once before all tests in the file |
| `beforeEach(fn)` | Run before every test in the file |
| `afterEach(fn)` | Run after every test in the file (runs even if the test fails) |
| `afterAll(fn)` | Run once after all tests in the file |
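A sketch combining these hooks; `resetDb`, `closeDb`, and `runRefund` stand in for your own helpers:
```ts
// Lifecycle sketch; resetDb/closeDb/runRefund are hypothetical project helpers
beforeAll(async () => {
  await resetDb()
})

afterEach(async () => {
  // runs even if the test fails, so cleanup is guaranteed
  await resetDb()
})

afterAll(async () => {
  await closeDb()
})

aiTest('refund flow', async (ctx) => {
  await runRefund(ctx)
})

declare function resetDb(): Promise<void>
declare function closeDb(): Promise<void>
declare function runRefund(ctx: unknown): Promise<void>
```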
Recording Trace Data
Automatic (recommended): Workflow code making real API calls to OpenAI, Gemini, or Grok is automatically intercepted and recorded.
Manual (for custom providers or mocks):
```ts
ctx.trace.recordLLMStep({
  model: 'gpt-4',
  prompt: 'What is the order status?',
  completion: 'The order has been confirmed.',
})

ctx.trace.recordToolCall({
  name: 'chargeCard',
  args: { amount: 99.99 },
})

ctx.trace.recordCustomStep({
  kind: 'rag',
  name: 'pokemon-search',
  payload: { query: 'pikachu' },
  result: { ids: [25] },
})
```
Common Matchers
```ts
// Assert LLM calls
expect(ctx.trace).toHaveLLMStep({ model: 'gpt-4' })
expect(ctx.trace).toHaveLLMStep({ promptContains: 'order status' })

// Assert tool calls
expect(ctx.trace).toCallTool('chargeCard')

// Semantic output matching (LLM-judged)
expect(ctx.trace).toMatchSemanticOutput('order confirmed')

// Custom steps (RAG, code, fixed)
expect(ctx.trace).toHaveCustomStep({ kind: 'rag', name: 'pokemon-search' })
```
→ See Test Matchers for complete documentation
Automatic AI & Tool Tracing
AI Interception
The runner automatically intercepts and records calls to:
- OpenAI (`api.openai.com`)
- Gemini (`generativelanguage.googleapis.com`)
- Grok/xAI (`api.x.ai`)
No code changes needed — just run your workflow and assertions work automatically.
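For example, workflow code that calls the OpenAI SDK directly is recorded without any ElasticDash imports (the workflow itself is illustrative):
```ts
// workflow code; no ElasticDash imports, calls to api.openai.com are intercepted
import OpenAI from 'openai'

export async function summarizeOrder(orderText: string) {
  const client = new OpenAI()
  const res = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: `Summarize: ${orderText}` }],
  })
  return res.choices[0].message.content
}
```
And the test can then assert on the intercepted call:
```ts
import { expect } from 'expect'

aiTest('summarize order', async (ctx) => {
  await summarizeOrder('2x widgets, shipped Tuesday')
  // the intercepted OpenAI call appears on ctx.trace automatically
  expect(ctx.trace).toHaveLLMStep({ model: 'gpt-4' })
})
```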
Tool Recording
Recommended: `wrapTool` wraps a tool function and automatically records its name, input, output, duration, and any streaming output. It works in both subprocess mode and HTTP mode:
```ts
import { wrapTool } from 'elasticdash-test'
import { runSelectQuery } from './services/dataService'

export const dataService = wrapTool('dataService', async (input: { query: string }) => {
  return await runSelectQuery(input.query)
})
```
Manual pattern (legacy): isolate tracing in the service's `.then`/`.catch` path so tracing failures never block business logic:
```ts
import { runSelectQuery } from './services/dataService'

export const dataService = async (input: any) => {
  const { query } = input as { query: string }
  return await runSelectQuery(query)
    .then(async (res: any) => {
      try {
        const { recordToolCall } = await import('elasticdash-test')
        recordToolCall('dataService', input, res)
      } catch {
        // tracing must never block the main service path
      }
      return res
    })
    .catch(async (err: any) => {
      try {
        const { recordToolCall } = await import('elasticdash-test')
        recordToolCall('dataService', input, err)
      } catch {
        // tracing must never block the main service path
      }
      throw err
    })
}
```
In manual mode, always isolate tracing in a separate try/catch so trace-logging errors cannot interrupt core service execution.
→ See Tool Recording & Replay for checkpoint-based replay and freezing
AI Call Recording
`wrapAI` wraps any AI call function and records its name, input, output, duration, and token usage (auto-detected for Anthropic, OpenAI, and Gemini SDK responses):
```ts
import { wrapAI } from 'elasticdash-test'
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export const callClaude = wrapAI('claude-sonnet-4-5', async (messages: Anthropic.MessageParam[]) => {
  return await client.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 1024,
    messages,
  })
})
```
Use `wrapAI` when you have a custom AI wrapper or a provider not covered by automatic interception. For direct OpenAI/Anthropic/Gemini SDK calls inside a subprocess workflow, automatic interception via `installAIInterceptor` already handles recording without any code changes.
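Calling the wrapped function is identical to calling the original; recording happens transparently:
```ts
// same call signature as the unwrapped function; the call is recorded automatically
const reply = await callClaude([{ role: 'user', content: 'Summarize my order status.' }])
```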
AI mocking (subprocess / test runner mode): `wrapAI` also checks `resolveAIMock` at call time, so the dashboard can mock LLM responses the same way it mocks tool calls — without modifying your server code. Configure an `AIMockConfig` in the dashboard UI or pass it programmatically via the `aiMockConfig` option when running a workflow.
HTTP Streaming Capture and Replay
ElasticDash also captures non-AI fetch responses that stream over HTTP (for example SSE and NDJSON endpoints) in the HTTP interceptor.
Currently detected as streaming when the response content-type includes:
- `text/event-stream`
- `application/x-ndjson`
- `application/stream+json`
- `application/jsonl`
How it behaves today:
- During live execution, ElasticDash tees the response stream and returns a real stream to your app code.
- In parallel, ElasticDash buffers the recorder side of the stream as raw text for trace replay.
- During replay, ElasticDash reconstructs a stream from that captured raw payload and restores status, status text, and response headers.
Replay fidelity note:
- Replay preserves stream payload content, but not original chunk boundaries or timing cadence.
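One practical consequence: parse replayed SSE payloads by the protocol's blank-line delimiter rather than assuming one event per chunk. A minimal sketch, assuming standard `text/event-stream` framing:
```ts
// Split a captured SSE payload into data fields by the blank-line delimiter,
// so parsing does not depend on the original chunk boundaries.
function parseSSEData(raw: string): string[] {
  return raw
    .split(/\n\n+/)
    .flatMap((block) =>
      block
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice('data:'.length).trim()),
    )
}
```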
Minimal stream consumption example:
```ts
const res = await fetch('https://example.com/events')
if (!res.body) throw new Error('Expected a streaming response body')

const reader = res.body.getReader()
const decoder = new TextDecoder()
let buffer = ''

for (;;) {
  const { done, value } = await reader.read()
  if (done) break
  buffer += decoder.decode(value, { stream: true })
}
buffer += decoder.decode()
```
→ See Quick Start Guide for end-to-end setup guidance
HTTP Workflow Mode
For apps where subprocess import fails (Next.js, Remix, SvelteKit, etc.), configure workflows to call your running dev server directly instead of importing the handler:
```ts
// elasticdash.config.ts
export default {
  testMatch: ['**/*.ai.test.ts'],
  workflows: {
    runChat: {
      mode: 'http',
      url: 'http://localhost:3001/api/chat',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-user-id': '{{env.DEV_USER_ID}}',
      },
      bodyTemplate: {
        messages: [{ role: 'user', content: '{{input.message}}' }],
        selectedModel: 'claude-sonnet-4-5-20250929',
      },
      responseFormat: 'vercel-ai-stream',
    },
  },
}
```
To enable full AI and tool call observability in HTTP mode, install elasticdash-test in your app:
```ts
// app/api/chat/route.ts
import { initHttpRunContext, wrapTool, wrapAI } from 'elasticdash-test'

export async function POST(req: Request) {
  const runId = req.headers.get('x-elasticdash-run-id')
  const serverUrl = req.headers.get('x-elasticdash-server')
  if (runId && serverUrl) {
    await initHttpRunContext(runId, serverUrl)
  }
  // ... rest of handler
}
```
The dashboard injects `x-elasticdash-run-id` and `x-elasticdash-server` headers automatically when triggering a run. `initHttpRunContext` fetches any frozen steps from the dashboard before execution begins — this is what enables step freezing (replaying historical results for specific steps). Every `wrapAI` and `wrapTool` call downstream pushes telemetry events back to the dashboard in real time.
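A sketch of what the elided handler body might contain; `db` and `llmClient` are hypothetical app dependencies, not SDK APIs:
```ts
// Module-scope wrapped helpers used inside the POST handler above (sketch).
import { wrapTool, wrapAI } from 'elasticdash-test'

// hypothetical app-level dependencies
declare const db: { orders: { findById(id: string): Promise<unknown> } }
declare const llmClient: { complete(prompt: string): Promise<string> }

// once the run context is initialized, each call pushes telemetry to the dashboard
export const lookupOrder = wrapTool('lookupOrder', async (input: { orderId: string }) => {
  return await db.orders.findById(input.orderId)
})

export const callModel = wrapAI('chat-model', async (prompt: string) => {
  return await llmClient.complete(prompt)
})
```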
Note: Use `setHttpRunContext` (synchronous) if you only need observability and do not need step freezing. `initHttpRunContext` is required for the dashboard's breakpoint/replay functionality to work.
Dashboard Auto-Detection (env var mode)
As an alternative to calling `initHttpRunContext` in your request handler, you can set two environment variables before starting your server or script. Every `wrapTool` and `wrapAI` call will then connect to the dashboard automatically — no code changes needed:
```
# Required: URL of the running ElasticDash dashboard
ELASTICDASH_SERVER=http://localhost:4573

# Optional: pre-registered run ID to fetch frozen steps for
ELASTICDASH_RUN_ID=<run-id-from-dashboard>
```
- If only `ELASTICDASH_SERVER` is set, a fresh run ID is generated and all calls push live telemetry to the dashboard (observability only, no step freezing).
- If both variables are set, frozen steps are fetched from the dashboard at startup and replayed as configured.
- If the dashboard is unreachable, the SDK silently falls back to live execution.
- Initialization runs once per process — subsequent `wrapTool`/`wrapAI` calls reuse the cached context.
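For example, launching a dev server with auto-detection enabled (the URL and command are illustrative):
```
# dashboard URL is illustrative; add ELASTICDASH_RUN_ID=<id> to also enable step freezing
ELASTICDASH_SERVER=http://localhost:4573 npm run dev
```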
This mode is intended for local development and testing scenarios. For production HTTP servers with concurrent requests, continue using `initHttpRunContext` inside your request handler.
Subprocess vs HTTP mode comparison:
| | Subprocess (default) | HTTP mode |
|---|---|---|
| Works with simple apps | Yes | Yes |
| Works with Next.js / Remix | No | Yes |
| Requires dev server running | No | Yes |
| App code changes needed | Extract handler to `ed_workflows.ts` | Add `initHttpRunContext` to request handler (or use env vars for auto-detect) |
| AI / tool call observability | Automatic via interceptors | Via `wrapAI` / `wrapTool` push |
| Step freezing / breakpoints | Yes | Yes (`initHttpRunContext`, or `ELASTICDASH_SERVER` + `ELASTICDASH_RUN_ID` env vars) |
| LLM response mocking | Yes (via `aiMockConfig`) | Yes (via frozen AI events) |
CI/CD Runner
Run your ElasticDash test groups directly from CI pipelines. The `ci` command fetches active test groups from your project via API key, executes each test locally, submits results back to the backend, and exits with code 1 if any test fails.
How It Works
```
┌──────────────┐  GET /testgroups/by-project         ┌──────────────────┐
│ CI Runner    │ ──────────────────────────────────→ │ ElasticDash API  │
│ (SDK side)   │ ←────────────────────────────────── │ (your backend)   │
│              │  test groups + tests + expectations │                  │
│              │                                     │                  │
│ execute      │  POST /testgroups/:id/runs          │                  │
│ each test    │ ──────────────────────────────────→ │ stores results   │
│ locally      │                                     │                  │
│              │  POST /testgroups/batches           │                  │
│              │ ──────────────────────────────────→ │ groups the runs  │
└──────────────┘                                     └──────────────────┘
```
- Fetch — Calls `GET /testgroups/by-project` with the API key (scoped to the project). Returns all active test groups with their tests and expectations.
- Execute — For each test, runs it locally using existing SDK infrastructure:
  - Single-step tests — replays a specific tool or AI step with `mock_input` and `frozen_events`
  - Full-flow tests — runs the entire workflow from `ed_workflows.ts` with `workflow_input`
- Evaluate — Checks all expectations (token-budget, latency-budget, output-contains, output-schema, tool-called, determinism, llm-judge). Respects `run_count` and `pass_threshold`.
- Submit — POSTs each result to `POST /testgroups/:id/runs` with single-run data, expectation results, and git metadata.
- Batch — Creates a batch grouping all run IDs for dashboard viewing.
CLI Usage
```
# Basic — uses env vars (set in .env or CI secrets)
npx elasticdash ci

# Explicit flags (if not using env vars)
npx elasticdash ci --server https://server.elasticdash.com --api-key ed_xxx

# Filter by workflow or tags
npx elasticdash ci --server $ELASTICDASH_API_URL --api-key $ELASTICDASH_API_KEY \
  --workflow checkout --tags payment,critical

# Pass git metadata (auto-detected in GitHub Actions / GitLab CI)
npx elasticdash ci --server $ELASTICDASH_API_URL --api-key $ELASTICDASH_API_KEY \
  --git-branch main --git-commit abc123
```
All flags:
| Flag | Env Var | Description |
|------|---------|-------------|
| `--server <url>` | `ELASTICDASH_API_URL` | Backend API URL (required) |
| `--api-key <key>` | `ELASTICDASH_API_KEY` | Project API key (required) |
| `--workflow <name>` | — | Filter test groups by workflow name |
| `--tags <t1,t2>` | — | Filter test groups by tags (comma-separated) |
| `--triggered-by <src>` | — | Trigger source label (default: `ci`) |
| `--git-branch <branch>` | Auto-detected | Git branch name |
| `--git-commit <sha>` | Auto-detected | Git commit SHA |
| `--git-commit-message <msg>` | Auto-detected | Commit message |
| `--git-pr-number <n>` | Auto-detected | PR number |
| `--git-pr-url <url>` | Auto-detected | PR URL |
GitHub Actions Example
```yaml
name: AI Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - name: Run ElasticDash CI tests
        run: npx elasticdash ci
        env:
          ELASTICDASH_API_URL: ${{ secrets.ELASTICDASH_API_URL }}
          ELASTICDASH_API_KEY: ${{ secrets.ELASTICDASH_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} # if tests use OpenAI
```
Git branch, commit SHA, PR number, and PR URL are auto-detected from GitHub Actions environment variables — no extra flags needed.
GitLab CI Example
```yaml
ai-tests:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx elasticdash ci
  variables:
    ELASTICDASH_API_URL: $ELASTICDASH_API_URL
    ELASTICDASH_API_KEY: $ELASTICDASH_API_KEY
```
Programmatic Usage
```ts
import { runCI } from 'elasticdash-test'

const summary = await runCI({
  serverUrl: 'https://your-api.com',
  apiKey: 'ed_xxx',
  workflowName: 'checkout',       // optional filter
  tags: ['payment', 'critical'],  // optional filter
})

console.log(`${summary.passed}/${summary.total} passed`)
process.exit(summary.failed > 0 ? 1 : 0)
```
Output
```
[elasticdash ci] Fetching test groups...
[elasticdash ci] Found 2 test group(s), 5 test(s) total.

Checkout Flow (3 tests)
  validate-input ... PASS (234ms)
  charge-card ... PASS (1823ms)
  send-confirmation ... FAIL (945ms)
    [output-contains] Output text check failed.

Refund Flow (2 tests)
  check-eligibility ... PASS (412ms)
  process-refund ... PASS (1567ms)

──────────────────────────────────────────────────
Summary
──────────────────────────────────────────────────
Total:    5
Passed:   4
Failed:   1
Duration: 5.0s
Batch ID: 42
──────────────────────────────────────────────────
[elasticdash ci] 1 test(s) failed.
```
Prerequisites
- An ElasticDash project with an API key (create one in the dashboard under Settings → API Keys)
- Active test groups with tests and expectations configured in the dashboard
- `ed_tools.ts` and/or `ed_workflows.ts` in your project root (for the executor to discover tools and workflows)
- AI provider API keys in the environment if tests use LLM calls (e.g., `OPENAI_API_KEY`)
Configuration
Optional `elasticdash.config.ts` at the project root:
```ts
export default {
  testMatch: ['**/*.ai.test.ts'],
  traceMode: 'local' as const,
}
```
Dashboard port: defaults to 4573. Override via CLI flag or `.env`:
```
# .env
ELASTICDASH_PORT=5000
```
```
# or CLI flag
npx elasticdash dashboard --port 5000
```
Optional project file: `ed_workers.ts` can be used by your app architecture (for example, exporting worker handlers), but it is not required or discovered by the ElasticDash CLI/dashboard.
TypeScript Setup
For typed globals and matchers, extend your test directory's `tsconfig.json`:
```json
{
  "extends": "../tsconfig.json",
  "include": ["../src/**/*", "./**/*"]
}
```
Programmatic API
```ts
import { runFiles, reportResults, registerMatchers, installAIInterceptor } from 'elasticdash-test'

registerMatchers()
installAIInterceptor()

const results = await runFiles(['./tests/flow.ai.test.ts'])
reportResults(results)
```
HTTP mode context (call inside your request handler):
```ts
import { initHttpRunContext, setHttpRunContext } from 'elasticdash-test'

// Async — fetches frozen steps from the dashboard to enable step freezing/breakpoints
await initHttpRunContext(runId, dashboardUrl)

// Synchronous alternative — observability only, no step freezing
setHttpRunContext(runId, dashboardUrl)
```
Dashboard auto-detection (env var mode — no code changes needed):
```
# Set before starting your server or script
ELASTICDASH_API_URL=https://server.elasticdash.com  # cloud (or http://localhost:4573 for a local dashboard)
ELASTICDASH_API_KEY=ed_your_api_key_here            # your project API key
ELASTICDASH_RUN_ID=<run-id-from-dashboard>          # optional, enables step freezing
```
`wrapTool` and `wrapAI` will auto-connect on their first call. See Dashboard Auto-Detection for details.
CI runner (execute test groups from your project):
```ts
import { runCI } from 'elasticdash-test'

const summary = await runCI({ serverUrl: 'https://server.elasticdash.com', apiKey: 'ed_xxx' })
// summary.total, summary.passed, summary.failed, summary.batchId, summary.results
```
License
MIT
