elasticdash-sdk

v0.2.9

Published 7 days ago

AI-native SDK for ElasticDash workflow testing, tracing, and observability


ElasticDash SDK

An AI-native test runner for ElasticDash workflow testing. Built for async AI pipelines — not a general-purpose test runner.

Quick Links

Quick Start Guide ← Start here to set up your first workflow
Test Writing Guidelines
Test Matchers
Tool Recording and Replay
Workflows Dashboard
Agent Mid-Trace Replay
Deno Support
Instrumentation Guide — how to write ed_tools.ts, ed_workflows.ts, ed_agents.ts
Langfuse Trace Structure — span structure required for dashboard replay

Features

🎯 Trace-first testing — every test gets a trace context to record and assert on LLM calls and tool invocations
🔍 Automatic AI interception — captures OpenAI, Anthropic, Gemini, Grok, Kimi, and AWS Bedrock calls without code changes
🧪 AI-specific matchers — semantic output matching, LLM-judged evaluations, prompt assertions
🛠️ Tool & LLM recording & replay — automatically trace tool and AI calls with checkpoint-based replay and mock support
📊 Interactive dashboard — browse workflows, debug traces, validate fixes visually
🤖 Agent mid-trace replay — resume long-running agents from any task without re-execution
🌐 HTTP workflow mode — run workflows against your live dev server for framework-heavy apps (Next.js, Remix, etc.) with full AI and tool call observability
🚀 CI/CD runner — fetch test groups from your ElasticDash project, execute tests, submit results, and fail the build on regressions

Installation

npm install elasticdash-sdk

Requirements: Node 20+. For Deno projects, see Using elasticdash-sdk in Deno.

Setup with a Coding Agent (required after install)

npm install alone wires nothing. The SDK only starts capturing traces once ed_tools.ts and ed_workflows.ts exist and the init call runs. Use a coding agent to do that wiring: counting the install above as Step 1, the smoothest path is two more steps:

Step 2 — bake the integration guide into your project so the coding agent always sees it:

npx elasticdash init-guide --target CLAUDE.md           # Claude Code
npx elasticdash init-guide --target AGENTS.md           # default — works for Codex, Windsurf, others
npx elasticdash init-guide --target .cursor/rules/elasticdash.md           # Cursor
npx elasticdash init-guide --target .github/copilot-instructions.md        # Copilot

If the target file already exists, the guide is appended (not overwritten). Use --force to replace the file entirely. Pick the target your agent actually reads; one file is enough.

Step 3 — tell your coding agent:

Complete the elasticdash-sdk integration following the guide that was just added to this project.

That's it. The agent reads the baked-in guide (which transcludes the same content as node_modules/elasticdash-sdk/docs/agent-coding-instructions.md and agent-integration-guide.md), then creates ed_tools.ts, ed_workflows.ts, calls edInitObservability from the entry point, updates source files to route tool calls through ed_tools, and validates the connection.

Do not shortcut this step. Without ed_tools.ts and ed_workflows.ts plus the init call, the SDK does not intercept tool or AI calls — your project will run without errors and produce zero traces. A vague prompt like "install elasticdash-sdk" lets the agent stop at npm install; the prompt above is explicit about completing integration.

Init must go through edInitObservability (the helper inside ed_workflows.ts), not import { initObservability } from 'elasticdash-sdk' in your entry file. Both files in the integration share one CJS module instance via createRequire(import.meta.url); importing initObservability directly hits a different ESM instance, leaving _ed.startTrace reading from an empty store. The symptom is [elasticdash] startTrace: observability not initialised at runtime. The integration guide's Step 3 explains why; the edInitObservability helper is the only correct path. For CLI scripts, also call edShutdownObservability() from a finally block at process exit — the SDK's auto-registered exit hooks are async and short-lived processes can terminate before the final batch flushes.

Important: do not use eval('require') to load the SDK in ed_tools.ts. The eval('require')(...) trick that older versions of this guide recommended works only in CJS — in any project with "type": "module" in package.json, it throws "require is not defined", the catch silently swallows the error, and the entire integration no-ops with zero logs and zero traces. Use createRequire(import.meta.url) from node:module instead; it works in both ESM and CJS.
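To see the difference in isolation, here is a minimal sketch of a loader that avoids both failure modes described above. `loadOptionalModule` is a hypothetical helper, not an elasticdash-sdk export: it builds a require function with `createRequire` and logs load errors instead of swallowing them. In an ESM ed_tools.ts you would pass `import.meta.url` as the base; the usage line passes an absolute path under `process.cwd()` only so the snippet also runs when compiled to CJS.

```typescript
import { createRequire } from 'node:module'
import { join } from 'node:path'

// Hypothetical helper (not part of elasticdash-sdk). `base` is an absolute
// path or file: URL; in an ESM module pass import.meta.url.
function loadOptionalModule<T = unknown>(base: string, specifier: string): T | null {
  try {
    // createRequire behaves the same way in ESM and CJS projects.
    const nodeRequire = createRequire(base)
    return nodeRequire(specifier) as T
  } catch (err) {
    // Log instead of silently swallowing, so a failed load is visible.
    console.error(`[ed_tools] failed to load ${specifier}:`, err)
    return null
  }
}

// Usage: resolve relative to the working directory (the file need not exist).
const pathModule = loadOptionalModule<{ sep: string }>(join(process.cwd(), 'noop.js'), 'node:path')
console.log(pathModule?.sep)
```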

Fallback — if you don't want to add a file to your repo, you can skip init-guide and use this prompt instead, which directs the agent at the docs inside node_modules/:

Integrate elasticdash-sdk into this project.
Read node_modules/elasticdash-sdk/docs/agent-coding-instructions.md for how to proceed,
and node_modules/elasticdash-sdk/docs/agent-integration-guide.md for technical reference.

This works but is more fragile: it relies on the agent following the doc-reading instruction literally, and it breaks if a different agent later picks up the project without the same prompt.

Cloud Setup

Add these to your .env (or CI secrets):

ELASTICDASH_API_URL=https://server.elasticdash.com
ELASTICDASH_API_KEY=ed_your_api_key_here

ELASTICDASH_API_URL — The ElasticDash cloud backend URL. For cloud users this is always https://server.elasticdash.com. For self-hosted instances, use your own backend URL.
ELASTICDASH_API_KEY — Your project API key. Find it in the ElasticDash dashboard under project settings.

Note: ELASTICDASH_SERVER is an alias for ELASTICDASH_API_URL. Both work — the SDK checks ELASTICDASH_API_URL first, then falls back to ELASTICDASH_SERVER.
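The lookup order can be expressed as a small function. This is a sketch of the documented precedence, not the SDK's own resolution code; `resolveApiUrl` is a hypothetical name, and treating an empty value as unset is an assumption.

```typescript
// Hypothetical sketch of the documented precedence; the SDK's real code may
// differ in details. ASSUMPTION: an empty string counts as "not set".
type Env = Record<string, string | undefined>

function resolveApiUrl(env: Env): string | undefined {
  // ELASTICDASH_API_URL wins; ELASTICDASH_SERVER is only the fallback alias.
  if (env.ELASTICDASH_API_URL) return env.ELASTICDASH_API_URL
  return env.ELASTICDASH_SERVER || undefined
}

// Usage: resolveApiUrl(process.env)
console.log(resolveApiUrl({ ELASTICDASH_SERVER: 'https://server.elasticdash.com' }))
```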

Git ignore: ElasticDash writes temporary runtime artifacts under .temp/. Add this to your .gitignore:

.temp/

Running CLI commands: Use npx to run commands with your locally installed version (recommended to avoid version drift):

npx elasticdash test
npx elasticdash dashboard

Alternatively, install globally if you prefer shorter commands:

npm install -g elasticdash-sdk
elasticdash test
elasticdash dashboard

Quick Start

1. Write a test file (my-flow.ai.test.ts):

import '../node_modules/elasticdash-sdk/dist/test-setup.js'
import { expect } from 'expect'

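// aiTest is a test global made available by the test-setup import above
aiTest('checkout flow', async (ctx) => {
  await runCheckout(ctx) // your own workflow function under test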

  expect(ctx.trace).toHaveLLMStep({ model: 'gpt-4', contains: 'order confirmed' })
  expect(ctx.trace).toCallTool('chargeCard')
})

2. Run it:

npx elasticdash test              # discover all **/*.ai.test.ts files
npx elasticdash test ./ai-tests   # discover in a specific directory
npx elasticdash run my-flow.ai.test.ts  # run a single file

3. Read the output:

  ✓ checkout flow (1.2s)
  ✗ refund flow (0.8s)
    → Expected tool "chargeCard" to be called, but no tool calls were recorded

2 passed
1 failed
Total: 3
Duration: 3.4s

Workflow export requirements (subprocess mode):

Export plain callable functions from ed_workflows.ts/js.
Use JSON-serializable inputs/outputs (object or array) so dashboard replay can pass args and read results.
Do not export framework-bound handlers directly (for example Next.js NextRequest/NextResponse route handlers) — use HTTP workflow mode instead.
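A minimal sketch of an export that satisfies these rules: a plain async function with a plain object in and out. `checkoutWorkflow` and its fields are hypothetical. `isWorkflowPayload` is an illustrative helper (not an SDK API) that rejects values JSON cannot carry faithfully, such as `Date`, `Map`, class instances, `undefined`, and non-finite numbers.

```typescript
// Hypothetical workflow; names and fields are illustrative only.
interface CheckoutInput { orderId: string; items: Array<{ sku: string; qty: number }> }
interface CheckoutOutput { orderId: string; totalItems: number; status: 'confirmed' | 'empty' }

// A plain callable export: no framework request/response objects involved.
async function checkoutWorkflow(input: CheckoutInput): Promise<CheckoutOutput> {
  const totalItems = input.items.reduce((sum, item) => sum + item.qty, 0)
  return { orderId: input.orderId, totalItems, status: totalItems > 0 ? 'confirmed' : 'empty' }
}

// True only for values built from null, booleans, finite numbers, strings,
// arrays, and plain objects, i.e. values that survive JSON unchanged.
function isJsonSafe(value: unknown): boolean {
  if (value === null) return true
  switch (typeof value) {
    case 'string':
    case 'boolean':
      return true
    case 'number':
      return Number.isFinite(value)
    case 'object': {
      if (Array.isArray(value)) return value.every(isJsonSafe)
      const proto = Object.getPrototypeOf(value)
      if (proto !== Object.prototype && proto !== null) return false // Date, Map, class instances
      return Object.values(value as Record<string, unknown>).every(isJsonSafe)
    }
    default:
      return false // undefined, functions, symbols, bigint
  }
}

// Workflow inputs/outputs should be an object or array at the top level.
function isWorkflowPayload(value: unknown): boolean {
  return typeof value === 'object' && value !== null && isJsonSafe(value)
}
```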

Documentation

Core Concepts

Test Writing Guidelines — comprehensive guide to writing AI workflow tests
Test Matchers — all available matchers with examples
Tool Recording & Replay — automatic tool tracing and checkpoint-based replay

Advanced Features

Workflows Dashboard — interactive workflow browser, debugger, and fetching traces from Langfuse
Agent Mid-Trace Replay — resume long-running agents from any task
Deno Support — using ElasticDash SDK in Deno projects

Integration & Reference

Instrumentation Guide — how to write ed_tools.ts, ed_workflows.ts, and ed_agents.ts to connect your production code to ElasticDash
Integration Guide — step-by-step SDK integration reference (templates, patterns, decision trees)
Agent Coding Instructions — behavioral instructions for AI coding agents performing the integration
Langfuse Trace Structure — Langfuse span structure required for dashboard replay and tool-level diffing

Quick Reference

Test Globals

| Global | Description |
|---|---|
| aiTest(name, fn) | Register a test |
| beforeAll(fn) | Run once before all tests in the file |
| beforeEach(fn) | Run before every test in the file |
| afterEach(fn) | Run after every test in the file (runs even if the test fails) |
| afterAll(fn) | Run once after all tests in the file |
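The ordering guarantees in this table can be modelled with a few lines of code. `runSuite` below is a hypothetical mini-runner written only to illustrate the documented semantics, in particular that afterEach still runs when a test throws; it is not the elasticdash-sdk runner.

```typescript
// Hypothetical mini-runner modelling the hook order above; not the SDK runner.
type Fn = () => Promise<void> | void

interface Suite {
  beforeAll: Fn[]; beforeEach: Fn[]; afterEach: Fn[]; afterAll: Fn[]
  tests: Array<{ name: string; fn: Fn }>
}

async function runSuite(suite: Suite): Promise<Array<{ name: string; passed: boolean }>> {
  const results: Array<{ name: string; passed: boolean }> = []
  for (const hook of suite.beforeAll) await hook()
  for (const test of suite.tests) {
    let passed = true
    try {
      for (const hook of suite.beforeEach) await hook()
      await test.fn()
    } catch {
      passed = false
    } finally {
      // afterEach runs whether the test passed or threw
      for (const hook of suite.afterEach) await hook()
    }
    results.push({ name: test.name, passed })
  }
  for (const hook of suite.afterAll) await hook()
  return results
}
```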

Recording Trace Data

Automatic (recommended): Workflow code making real API calls to OpenAI, Anthropic, Gemini, Grok, Kimi, or AWS Bedrock is automatically intercepted and recorded.

Manual (for custom providers or mocks):

ctx.trace.recordLLMStep({
  model: 'gpt-4',
  prompt: 'What is the order status?',
  completion: 'The order has been confirmed.',
})

ctx.trace.recordToolCall({
  name: 'chargeCard',
  args: { amount: 99.99 },
})

ctx.trace.recordCustomStep({
  kind: 'rag',
  name: 'pokemon-search',
  payload: { query: 'pikachu' },
  result: { ids: [25] },
})

Common Matchers

// Assert LLM calls
expect(ctx.trace).toHaveLLMStep({ model: 'gpt-4' })
expect(ctx.trace).toHaveLLMStep({ promptContains: 'order status' })

// Assert tool calls
expect(ctx.trace).toCallTool('chargeCard')

// Semantic output matching (LLM-judged)
expect(ctx.trace).toMatchSemanticOutput('order confirmed')

// Custom steps (RAG, code, fixed)
expect(ctx.trace).toHaveCustomStep({ kind: 'rag', name: 'pokemon-search' })

→ See Test Matchers for complete documentation

Automatic AI & Tool Tracing

AI Interception

The runner automatically intercepts and records calls to:

Anthropic (api.anthropic.com)
OpenAI (api.openai.com)
Gemini (generativelanguage.googleapis.com)
Grok/xAI (api.x.ai)
Kimi/Moonshot (api.moonshot.ai)
AWS Bedrock (bedrock-runtime.<region>.amazonaws.com) — both InvokeModel/InvokeModelWithResponseStream and Converse/ConverseStream

No code changes needed — just run your workflow and assertions work automatically. Because these providers are auto-captured, most workflows do not need to wrap LLM calls with wrapAI. See Picking a wrapper below.

Note on Bedrock: The interceptor sits on globalThis.fetch, so any code that reaches Bedrock through fetch is auto-captured (browsers, Workers, Deno, thin REST wrappers, and SDKs that use undici/fetch under the hood). @aws-sdk/client-bedrock-runtime on Node uses its own HTTP signer and bypasses globalThis.fetch — wrap those calls with wrapAI({ provider: 'bedrock', model }) so events still get tagged and mocked rerun can match them. See AWS Bedrock below.

Picking a wrapper

The SDK exposes three wrappers that look similar but solve different problems. Pick by what your function actually does:

| Your function is… | Use | Why |
|---|---|---|
| Deterministic (REST call, DB query, file IO; no LLM inside) | edTool | Records as a tool event AND registers in the global tool registry so CLI run-tool, MCP run_tool, and dashboard rerun can find it by name. |
| Exactly one LLM round-trip, AND you need prompt mocks, AI output mocks by name, OR the provider isn't auto-intercepted | wrapAI | Records as an ai event with token usage. Only wrapAI supports prompt rewriting (resolvePromptMock / resolveUserPromptMock) and named AI output mocks. |
| An agent loop (LLM + inner tools, multiple round-trips) | edTool on the outer boundary | The inner LLM calls are auto-captured by the AI interceptor. Wrapping the outer agent with wrapAI would hide the inner detail. |
| A direct single call to an auto-intercepted provider SDK (Anthropic / OpenAI / Gemini / Grok / Kimi / Bedrock via fetch) | No wrapper | The AI interceptor already records it as an ai event with token usage. |
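The table boils down to a short decision procedure. The sketch below encodes it as a function; `pickWrapper` and the `FunctionTraits` fields are hypothetical names used only for illustration, and the sketch deliberately ignores the softer "one labelled boundary per logical step" reason for using wrapAI.

```typescript
// Hypothetical encoding of the wrapper-selection table; not an SDK API.
type Wrapper = 'edTool' | 'wrapAI' | 'none'

interface FunctionTraits {
  llmCalls: number                  // LLM round-trips inside the function
  autoIntercepted: boolean          // provider traffic goes through globalThis.fetch
  needsPromptOrNamedMocks: boolean  // prompt rewriting or AI output mocks by name
}

function pickWrapper(t: FunctionTraits): Wrapper {
  if (t.llmCalls === 0) return 'edTool' // deterministic: REST, DB, file IO
  if (t.llmCalls > 1) return 'edTool'   // agent loop: wrap the outer boundary only
  if (t.needsPromptOrNamedMocks || !t.autoIntercepted) return 'wrapAI'
  return 'none'                         // single auto-intercepted call is already recorded
}
```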

wrapTool is the primitive that edTool builds on. Use wrapTool directly only when you specifically do not want registry registration — for example, wrapping an inline closure inside another function.

Tool Recording

Recommended: edTool wraps a tool function (recording its name, input, output, duration, and any streaming output) and registers it in a global tool registry so it can be invoked by name from the CLI (npx elasticdash run-tool <name>), the MCP run_tool, and dashboard rerun:

import { edTool } from 'elasticdash-sdk'
import { runSelectQuery } from './services/dataService'

export const dataService = edTool('dataService', async (input: { query: string }) => {
  return await runSelectQuery(input.query)
})

Same event shape as wrapTool (type: 'tool'), so the existing tool-mock pipeline (snapshot_mock_profile, mocked_tools_overrides, strict mode) works unchanged. defineTool is an exported alias of edTool.
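To make "registered by name" concrete, here is a self-contained model of a name-keyed tool registry that also records tool events. `registerTool`, `runToolByName`, and the event fields are hypothetical illustrations, not the SDK's edTool internals; the real event shape and registry behaviour may differ.

```typescript
// Hypothetical model of a tool registry plus event recording; not SDK code.
type ToolFn = (input: any) => Promise<unknown>

interface ToolEvent { type: 'tool'; name: string; input: unknown; output?: unknown; error?: string; durationMs: number }

const registry = new Map<string, ToolFn>()
const events: ToolEvent[] = []

function registerTool(name: string, fn: ToolFn): ToolFn {
  const wrapped: ToolFn = async (input) => {
    const start = Date.now()
    try {
      const output = await fn(input)
      events.push({ type: 'tool', name, input, output, durationMs: Date.now() - start })
      return output
    } catch (err) {
      events.push({ type: 'tool', name, input, error: String(err), durationMs: Date.now() - start })
      throw err
    }
  }
  registry.set(name, wrapped) // lets a CLI or dashboard look the tool up by name later
  return wrapped
}

// Roughly what a run-tool-by-name entry point needs: lookup, then call with JSON input.
async function runToolByName(name: string, input: unknown): Promise<unknown> {
  const tool = registry.get(name)
  if (!tool) throw new Error(`Unknown tool: ${name}`)
  return tool(input)
}
```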

Lower-level: wrapTool — same tracing behavior without the registry registration. Use this only when you have a specific reason to keep the tool unregistered (e.g., a closure created inside another function):

import { wrapTool } from 'elasticdash-sdk'

export const dataService = wrapTool('dataService', async (input: { query: string }) => {
  return await runSelectQuery(input.query)
})

Manual pattern (legacy): isolate tracing in the service .then/.catch path so tracing failures never block business logic:

import { runSelectQuery } from './services/dataService'

export const dataService = async (input: any) => {
  const { query } = input as { query: string }
  return await runSelectQuery(query)
    .then(async (res: any) => {
      try {
        const { recordToolCall } = await import('elasticdash-sdk')
        recordToolCall('dataService', input, res)
      } catch {
        // tracing must never block the main service path
      }
      return res
    })
    .catch(async (err: any) => {
      try {
        const { recordToolCall } = await import('elasticdash-sdk')
        recordToolCall('dataService', input, err)
      } catch {
        // tracing must never block the main service path
      }
      throw err
    })
}

In manual mode, always isolate tracing in a separate try/catch so trace logging errors cannot interrupt core service execution.

→ See Tool Recording & Replay for checkpoint-based replay and freezing

Agent-loop pattern

If your "tool" is actually an agent — a function that calls an LLM and may iterate through tool-use blocks — wrap the outer boundary with edTool, not wrapAI. The AI interceptor will auto-record each inner LLM call as a separate ai event nested under the trace:

import { edTool } from 'elasticdash-sdk'
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

async function runSearchAgent(input: { query: string }) {
  // Agent loop: each iteration produces its own auto-recorded `ai` event
  while (true) {
    const res = await client.messages.create({
      model: 'claude-sonnet-4-5-20250929',
      max_tokens: 1024,
      messages: [/* ... */],
    })
    if (res.stop_reason === 'end_turn') return res
    // ... handle tool_use blocks, append tool_result, loop
  }
}

export const search = edTool('search', runSearchAgent)

Wrapping runSearchAgent with wrapAI instead would record one ai event covering the whole loop and hide the per-iteration calls. edTool keeps the agent visible as a single named, rerunnable, mockable boundary while leaving inner LLM detail intact for assertions and replay.
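The difference is easiest to see with a scripted model. In this offline sketch, `fakeModel` is a hypothetical stand-in for the Anthropic client and `aiEvents` stands in for what the AI interceptor would record; a two-iteration loop yields two separate ai events. The `maxIterations` guard is an addition not present in the example above, included so a misbehaving model cannot loop forever.

```typescript
// Hypothetical stand-ins so the loop runs offline; not SDK or Anthropic APIs.
interface ModelReply { stopReason: 'tool_use' | 'end_turn'; text: string }

const aiEvents: Array<{ type: 'ai'; iteration: number; text: string }> = []

// Scripted replies: the first asks for a tool, the second finishes.
const script: ModelReply[] = [
  { stopReason: 'tool_use', text: 'call search("pikachu")' },
  { stopReason: 'end_turn', text: 'Pikachu is #25' },
]

let callCount = 0
async function fakeModel(): Promise<ModelReply> {
  const reply = script[Math.min(callCount, script.length - 1)]
  // One event per LLM round-trip, as the interceptor would record it.
  aiEvents.push({ type: 'ai', iteration: callCount, text: reply.text })
  callCount += 1
  return reply
}

async function runAgent(maxIterations = 10): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const reply = await fakeModel()
    if (reply.stopReason === 'end_turn') return reply.text
    // ...execute the requested tool and append its result before looping
  }
  throw new Error('agent did not finish within maxIterations')
}
```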

AI Call Recording

wrapAI wraps a single LLM call and records it as a type: 'ai' event with name, input, output, duration, and token usage (auto-detected for Anthropic, OpenAI, and Gemini SDK responses):

import { wrapAI } from 'elasticdash-sdk'
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export const callClaude = wrapAI('claude-sonnet-4-5', async (messages: Anthropic.MessageParam[]) => {
  return await client.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 1024,
    messages,
  })
})

AWS Bedrock

Bedrock is recognised by URL pattern (bedrock-runtime.<region>.amazonaws.com) and supports both API families: InvokeModel / InvokeModelWithResponseStream (including streaming via the binary application/vnd.amazon.eventstream format) and the unified Converse / ConverseStream. Model IDs, prompts, completions, and token usage are extracted automatically — including for cross-region inference profiles like us.anthropic.… or au.anthropic.….
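As an illustration of the URL recognition described above, the sketch below parses a Bedrock runtime URL into region, model ID, and vendor. Bedrock runtime REST paths take the form `/model/{modelId}/invoke`, `/invoke-with-response-stream`, `/converse`, or `/converse-stream`. `parseBedrockUrl`, `providerTag`, and the `GEO_PREFIXES` list are assumptions for illustration; the SDK's real matcher is not shown in this README.

```typescript
// Hypothetical helpers; the SDK's actual matcher may differ.
// GEO_PREFIXES is an assumed, non-exhaustive list of inference-profile prefixes.
const BEDROCK_HOST = /^bedrock-runtime\.([a-z0-9-]+)\.amazonaws\.com$/
const BEDROCK_PATH = /^\/model\/([^/]+)\/(invoke|invoke-with-response-stream|converse|converse-stream)$/
const GEO_PREFIXES = new Set(['us', 'eu', 'apac', 'au', 'jp', 'us-gov', 'global'])

interface BedrockCall { region: string; modelId: string; vendor: string }

function parseBedrockUrl(rawUrl: string): BedrockCall | null {
  const url = new URL(rawUrl)
  const host = BEDROCK_HOST.exec(url.hostname)
  const path = BEDROCK_PATH.exec(url.pathname)
  if (!host || !path) return null
  // The model ID is a path segment, so ":" may arrive percent-encoded as %3A.
  const modelId = decodeURIComponent(path[1])
  const parts = modelId.split('.')
  // Cross-region profiles prepend a geography, e.g. "us.anthropic.<model>".
  const vendor = parts.length > 2 && GEO_PREFIXES.has(parts[0]) ? parts[1] : parts[0]
  return { region: host[1], modelId, vendor }
}

// anthropic.* models are tagged 'claude'; other vendors pass through in this sketch.
function providerTag(vendor: string): string {
  return vendor === 'anthropic' ? 'claude' : vendor
}
```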

If your code reaches Bedrock through globalThis.fetch (browsers, Cloudflare Workers, Deno, undici-based clients, or a thin REST wrapper), nothing else is required. The interceptor captures the call, records it as an ai event with token usage, freezes it during rerun_step, and replays it during rerun_workflow_mocked.

If your code uses @aws-sdk/client-bedrock-runtime on Node, the AWS SDK runs through its own HTTP signer and bypasses globalThis.fetch. Wrap the call with wrapAI so events still get tagged and mocked rerun can match them — the Converse response's { usage: { inputTokens, outputTokens } } shape is auto-extracted, and tagging provider with the underlying vendor (e.g. 'claude' for anthropic.* model IDs) means existing matchers like expect(trace).toHaveLLMStep({ provider: 'claude' }) work unchanged:

import { wrapAI } from 'elasticdash-sdk'
import { BedrockRuntimeClient, ConverseCommand } from '@aws-sdk/client-bedrock-runtime'

const bedrock = new BedrockRuntimeClient({ region: process.env.AWS_REGION ?? 'us-east-1' })

// Pick a modelId currently available in your region — the Bedrock catalog drifts.
const MODEL_ID = 'anthropic.claude-haiku-4-6-20260101-v1:0'

export const callClaudeOnBedrock = wrapAI(
  'claude-haiku-bedrock',
  async (input: { system?: string; messages: Array<{ role: 'user' | 'assistant'; content: string }> }) => {
    return await bedrock.send(new ConverseCommand({
      modelId: MODEL_ID,
      system: input.system ? [{ text: input.system }] : undefined,
      messages: input.messages.map((m) => ({ role: m.role, content: [{ text: m.content }] })),
      inferenceConfig: { maxTokens: 1024 },
    }))
  },
  { provider: 'bedrock', model: MODEL_ID },
)

Notes:

Credentials come from the standard AWS provider chain (env vars, shared credentials file, IAM role) — the SDK does not manage them.
Other vendors on Bedrock (Llama, Titan, Mistral, Cohere, AI21) use the same pattern. For Converse the response shape is identical across vendors. For raw InvokeModel, Anthropic gets first-class extraction; other vendors fall back to a best-effort outputText / generation / choices lookup.
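Because the Converse response shape is the same across vendors, a single normalizer covers all of them. The sketch below is hypothetical (`normalizeConverse` is not an SDK export) and declares only the fields it reads; the full type is `ConverseCommandOutput` in @aws-sdk/client-bedrock-runtime.

```typescript
// Minimal local view of a Converse response; only the fields used here.
interface ConverseLike {
  output?: { message?: { content?: Array<{ text?: string }> } }
  usage?: { inputTokens?: number; outputTokens?: number }
}

interface NormalizedAIResult { completion: string; inputTokens: number; outputTokens: number }

// Hypothetical normalizer: joins text blocks and reads the vendor-neutral usage object.
function normalizeConverse(res: ConverseLike): NormalizedAIResult {
  const blocks = res.output?.message?.content ?? []
  return {
    completion: blocks.map((b) => b.text ?? '').join(''),
    inputTokens: res.usage?.inputTokens ?? 0,
    outputTokens: res.usage?.outputTokens ?? 0,
  }
}
```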

Use `wrapAI` when

The function body is essentially one LLM round-trip, AND at least one of the following applies:

The provider is not auto-intercepted (anything outside Anthropic / OpenAI / Gemini / Grok / Kimi / Bedrock — e.g., Mistral direct, Cohere direct, local Ollama), or the SDK bypasses globalThis.fetch (notably @aws-sdk/client-bedrock-runtime on Node).
You want prompt mocks — system or user prompt rewriting via resolvePromptMock / resolveUserPromptMock keyed by the name you pass to wrapAI. This is exclusive to wrapAI.
You want AI output mocks keyed by a named step — e.g., mock the "router" call without mocking every call to the same model. resolveAIMock keys off the name argument.
You want one labelled boundary per logical step in the trace (e.g., "router", "summarizer") with token usage attributed to that label, distinct from the raw provider-level event.

Do NOT use `wrapAI` when

The function is an agent loop (LLM + inner tool calls, multiple round-trips). Use edTool on the outer boundary and let the AI interceptor record each inner LLM call. See Agent-loop pattern above.
The function is a direct single-call use of an auto-intercepted provider's SDK. The interceptor already records it as a type: 'ai' event with token usage — adding wrapAI only adds a redundant labelled wrapper.
The function does not call an LLM. Use edTool.

AI mocking (subprocess / test runner mode): wrapAI also checks resolveAIMock at call time, so the dashboard can mock LLM responses the same way it mocks tool calls — without modifying your server code. Configure an AIMockConfig in the dashboard UI or pass it programmatically via the aiMockConfig option when running a workflow.
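Keying mocks by step name is what lets you mock the "router" call while the "summarizer" call still hits the model. The sketch below illustrates that idea only; `MockTable` and `withAIMock` are hypothetical and are not the SDK's AIMockConfig or resolveAIMock.

```typescript
// Hypothetical name-keyed mock lookup; not the SDK's AIMockConfig/resolveAIMock.
type MockTable = Record<string, unknown>

function withAIMock<I, O>(name: string, mocks: MockTable, call: (input: I) => Promise<O>) {
  return async (input: I): Promise<O> => {
    // A mock registered under this step name short-circuits the real LLM call.
    if (Object.prototype.hasOwnProperty.call(mocks, name)) return mocks[name] as O
    return call(input)
  }
}
```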

HTTP Streaming Capture and Replay

The HTTP interceptor also captures non-AI fetch responses that stream over HTTP (for example, SSE and NDJSON endpoints).

A response is currently treated as streaming when its content-type header includes one of:

text/event-stream
application/x-ndjson
application/stream+json
application/jsonl
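The check amounts to a substring match on the content-type header, so parameters such as `; charset=utf-8` do not matter. `isStreamingContentType` below is a hypothetical re-implementation of the documented rule, not the SDK's code; lowercasing the header first is an assumption.

```typescript
// Hypothetical re-implementation of the documented rule; not SDK code.
const STREAMING_TYPES = ['text/event-stream', 'application/x-ndjson', 'application/stream+json', 'application/jsonl']

function isStreamingContentType(contentType: string | null): boolean {
  if (!contentType) return false
  const lower = contentType.toLowerCase() // ASSUMPTION: case-insensitive match
  return STREAMING_TYPES.some((t) => lower.includes(t))
}

// Usage with fetch: isStreamingContentType(res.headers.get('content-type'))
```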

How it behaves today:

During live execution, ElasticDash tees the response stream and returns a real stream to your app code.
In parallel, ElasticDash buffers the recorder side of the stream as raw text for trace replay.
During replay, ElasticDash reconstructs a stream from that captured raw payload and restores status, status text, and response headers.
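The three behaviours above map onto standard Fetch and Streams APIs available in Node 18+. This sketch shows the idea with `ReadableStream.tee()`; `teeResponse`, `replayResponse`, and `CapturedStream` are hypothetical names, not the SDK's interceptor code. As the replay fidelity note in this section says, the replayed stream keeps the payload but not the original chunking or timing.

```typescript
// Hypothetical tee-and-replay sketch using standard Fetch/Streams APIs.
interface CapturedStream { status: number; statusText: string; headers: Record<string, string>; body: string }

// Live path: the app gets a real stream while a second branch is buffered as text.
function teeResponse(res: Response): { forApp: Response; captured: Promise<CapturedStream> } {
  if (!res.body) throw new Error('Expected a streaming response body')
  const [appBranch, recorderBranch] = res.body.tee()
  const headers: Record<string, string> = {}
  res.headers.forEach((value, key) => { headers[key] = value })
  const init = { status: res.status, statusText: res.statusText, headers }
  const forApp = new Response(appBranch, init)
  const captured = new Response(recorderBranch).text().then((body) => ({ ...init, body }))
  return { forApp, captured }
}

// Replay path: rebuild a single-chunk stream from the captured raw text.
function replayResponse(c: CapturedStream): Response {
  const bytes = new TextEncoder().encode(c.body)
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(bytes)
      controller.close()
    },
  })
  return new Response(stream, { status: c.status, statusText: c.statusText, headers: c.headers })
}
```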

Replay fidelity note:

Replay preserves stream payload content, but not original chunk boundaries or timing cadence.

Minimal stream consumption example:

const res = await fetch('https://example.com/events')
if (!res.body) throw new Error('Expected a streaming response body')

const reader = res.body.getReader()
const decoder = new TextDecoder()
let buffer = ''

for (;;) {
  const { done, value } = await reader.read()
  if (done) break
  buffer += decoder.decode(value, { stream: true })
}

buffer += decoder.decode()

→ See Quick Start Guide for end-to-end setup guidance

Workflow Tracing

wrapTool and wrapAI record individual steps, but to group them into a single named workflow trace you need the trace lifecycle: call edStartTrace at the start of your handler and edEndTrace when it finishes. Every wrapTool/wrapAI call in between is automatically associated with that trace.

1. Create `ed_workflows.ts`

This file exports the trace lifecycle functions. It holds a reference to the elasticdash-sdk module (set from ed_tools.ts) so it can call startTrace/endTrace:

// ed_workflows.ts
let _ed: any = null;

/** Called from ed_tools.ts to share the SDK module instance. */
export function setElasticDashModule(mod: any): void {
  _ed = mod;
}

/** Call at the start of your request handler to begin a named trace. */
export const edStartTrace = async (workflowName: string): Promise<void> => {
  if (!_ed) return;
  try {
    await _ed.tryAutoInitHttpContext();
    _ed.startTrace(workflowName);
  } catch (err) {
    console.error('[ed_workflows] edStartTrace error:', err);
  }
};

/** Call in a finally block to end the trace and flush captured events. */
export const edEndTrace = (): void => {
  if (!_ed) return;
  try {
    _ed.endTrace();
  } catch (err) {
    console.error('[ed_workflows] edEndTrace error:', err);
  }
};

2. Create `ed_tools.ts`

This file loads the SDK, shares the module instance with ed_workflows.ts, and exports your wrapped tools:

// ed_tools.ts
import { createRequire } from 'node:module';
import { setElasticDashModule } from './ed_workflows';

let edTool: <T extends (...args: any[]) => any>(name: string, fn: T) => T = (_name, fn) => fn;

// `createRequire(import.meta.url)` works in BOTH ESM (`"type": "module"`)
// and CJS projects. Do NOT use `eval('require')` — it silently throws in
// ESM and the whole integration produces zero traces with zero logs.
const nodeRequire = createRequire(import.meta.url);

try {
  const _edModule = nodeRequire('elasticdash-sdk');
  edTool = _edModule.edTool ?? _edModule.wrapTool ?? edTool;
  setElasticDashModule(_edModule);
} catch (err) {
  console.error('[ed_tools] failed to load elasticdash-sdk:', err);
}

export const myTool = edTool('myTool', async (input: { query: string }) => {
  // ... your tool logic
});

Why setElasticDashModule? The SDK uses Node.js AsyncLocalStorage (ALS) to correlate events. Both ed_tools.ts and ed_workflows.ts must share the same CJS module instance so wrapTool/wrapAI calls write to the same ALS store that startTrace/endTrace reads from.
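The failure mode is easy to reproduce with two AsyncLocalStorage instances: a store is visible only to the instance that started it. In this sketch `createTracer` is hypothetical, and it scopes traces with `run()` as a simplification of the SDK's separate startTrace/endTrace calls. The `duplicate` tracer plays the role of a second (ESM vs CJS) copy of the SDK.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Hypothetical model of one SDK module instance; not the SDK's implementation.
interface TraceStore { workflow: string; events: string[] }

function createTracer() {
  const als = new AsyncLocalStorage<TraceStore>()
  return {
    // Simplification: the trace lives for the duration of `fn`.
    runInTrace<T>(workflow: string, fn: () => T): T {
      return als.run({ workflow, events: [] }, fn)
    },
    record(event: string): boolean {
      const store = als.getStore()
      if (!store) return false // no active trace visible to *this* instance
      store.events.push(event)
      return true
    },
    current: () => als.getStore(),
  }
}

const shared = createTracer()    // the single instance both files should use
const duplicate = createTracer() // what a second module copy amounts to
```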

3. Instrument Your Route Handler

Call edStartTrace at handler entry and edEndTrace in a finally block. All wrapTool/wrapAI calls in between are grouped under the trace:

// app/api/chat/route.ts (Next.js example; ed_workflows.ts and ed_tools.ts live at the project root)
import { edStartTrace, edEndTrace } from '../../../ed_workflows';
import { myTool } from '../../../ed_tools';

export async function POST(req: Request) {
  await edStartTrace('chatHandler');
  try {
    const body = await req.json();
    const result = await myTool({ query: body.message });
    return Response.json(result);
  } finally {
    edEndTrace();
  }
}

4. Capture Traces to Disk

Set the ELASTICDASH_CAPTURE_TRACE environment variable to save each workflow trace as a JSON file under .ed_traces/:

ELASTICDASH_CAPTURE_TRACE=1 npm run dev

Captured traces can be used for offline replay, benchmark tests (defineTest in ed_tests.ts), and debugging in the dashboard.

HTTP Workflow Mode

For apps where subprocess import fails (Next.js, Remix, SvelteKit, etc.), configure workflows to call your running dev server directly instead of importing the handler:

// elasticdash.config.ts
export default {
  testMatch: ['**/*.ai.test.ts'],
  workflows: {
    runChat: {
      mode: 'http',
      url: 'http://localhost:3001/api/chat',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-user-id': '{{env.DEV_USER_ID}}',
      },
      bodyTemplate: {
        messages: [{ role: 'user', content: '{{input.message}}' }],
        selectedModel: 'claude-sonnet-4-5-20250929',
      },
      responseFormat: 'vercel-ai-stream',
    },
  },
}
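The `{{input.…}}` and `{{env.…}}` placeholders in headers and bodyTemplate suggest a simple substitution pass over the template. The renderer below is a hypothetical sketch of that idea, not the SDK's implementation: the SDK's rules for escaping, missing keys, and non-string values are not documented here, and this sketch substitutes string values and renders missing keys as empty strings.

```typescript
// Hypothetical placeholder renderer; the SDK's real rules may differ.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
type Scope = { input: Record<string, unknown>; env: Record<string, string | undefined> }

// Walks a dotted path such as "input.message" through nested objects.
function lookup(path: string, scope: Record<string, unknown>): unknown {
  return path.split('.').reduce<unknown>(
    (acc, key) => (acc !== null && typeof acc === 'object' ? (acc as Record<string, unknown>)[key] : undefined),
    scope,
  )
}

function renderTemplate(value: Json, scope: Scope): Json {
  if (typeof value === 'string') {
    return value.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
      const v = lookup(path, scope)
      return v === undefined || v === null ? '' : String(v)
    })
  }
  if (Array.isArray(value)) return value.map((v) => renderTemplate(v, scope))
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(Object.entries(value).map(([k, v]): [string, Json] => [k, renderTemplate(v, scope)]))
  }
  return value // numbers, booleans, null pass through unchanged
}
```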

To enable full AI and tool call observability in HTTP mode, install elasticdash-sdk in your app:

// app/api/chat/route.ts
import { initHttpRunContext, wrapTool, wrapAI } from 'elasticdash-sdk'

export async function POST(req: Request) {
  const runId = req.headers.get('x-elasticdash-run-id')
  const serverUrl = req.headers.get('x-elasticdash-server')
  if (runId && serverUrl) {
    await initHttpRunContext(runId, serverUrl)
  }
  // ... rest of handler
}

The dashboard injects x-elasticdash-run-id and x-elasticdash-server headers automatically when triggering a run. initHttpRunContext fetches any frozen steps from the dashboard before execution begins — this is what enables step freezing (replaying historical results for specific steps). Every wrapAI and wrapTool call downstream pushes telemetry events back to the dashboard in real time.

Note: Use setHttpRunContext (synchronous) if you only need observability and do not need step freezing. initHttpRunContext is required for the dashboard's breakpoint/replay functionality to work.

Dashboard Auto-Detection (env var mode)

As an alternative to calling initHttpRunContext in your request handler, you can set two environment variables before starting your server or script. Every wrapTool and wrapAI call will then connect to the dashboard automatically — no code changes needed:

# Required: URL of the running ElasticDash dashboard
ELASTICDASH_SERVER=http://localhost:4573

# Optional: pre-registered run ID to fetch frozen steps for
ELASTICDASH_RUN_ID=<run-id-from-dashboard>

If only ELASTICDASH_SERVER is set, a fresh run ID is generated and all calls push live telemetry to the dashboard (observability only, no step freezing).
If both variables are set, frozen steps are fetched from the dashboard at startup and replayed as configured.
If the dashboard is unreachable the SDK falls through to live execution silently.
The initialization runs once per process — subsequent wrapTool/wrapAI calls reuse the cached context.

This mode is intended for local development and testing scenarios. For production HTTP servers with concurrent requests, continue using initHttpRunContext inside your request handler.
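The rules above form a small decision table. `resolveDashboardMode` below is a hypothetical sketch of it, with `getDashboardMode` caching the result once per process; it does not model the "dashboard unreachable" fallback, which happens later when the SDK actually tries to connect.

```typescript
// Hypothetical sketch of the env-var rules above; names are illustrative.
type DashboardMode =
  | { kind: 'off' }
  | { kind: 'observe'; server: string; runId: string }
  | { kind: 'observe+freeze'; server: string; runId: string }

function resolveDashboardMode(env: Record<string, string | undefined>, newRunId: () => string): DashboardMode {
  const server = env.ELASTICDASH_SERVER
  if (!server) return { kind: 'off' }
  const runId = env.ELASTICDASH_RUN_ID
  // Only the server: fresh run ID, live telemetry, no step freezing.
  if (!runId) return { kind: 'observe', server, runId: newRunId() }
  // Both: frozen steps for this run are fetched at startup.
  return { kind: 'observe+freeze', server, runId }
}

// Resolved once per process; later calls reuse the cached value.
let cachedMode: DashboardMode | undefined
function getDashboardMode(env: Record<string, string | undefined>, newRunId: () => string): DashboardMode {
  if (!cachedMode) cachedMode = resolveDashboardMode(env, newRunId)
  return cachedMode
}
// e.g. getDashboardMode(process.env, randomUUID) with randomUUID from node:crypto
```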

Subprocess vs HTTP mode comparison:

| Aspect | Subprocess (default) | HTTP mode |
|---|---|---|
| Works with simple apps | Yes | Yes |
| Works with Next.js / Remix | No | Yes |
| Requires dev server running | No | Yes |
| App code changes needed | Extract handler to ed_workflows.ts | Add initHttpRunContext to request handler (or use env vars for auto-detect) |
| AI / tool call observability | Automatic via interceptors | Via wrapAI / wrapTool push |
| Step freezing / breakpoints | Yes | Yes (initHttpRunContext, or ELASTICDASH_SERVER + ELASTICDASH_RUN_ID env vars) |
| LLM response mocking | Yes (via aiMockConfig) | Yes (via frozen AI events) |

CI/CD Runner

Run your ElasticDash test groups directly from CI pipelines. The ci command fetches active test groups from your project via API key, executes each test locally, submits results back to the backend, and exits with code 1 if any test fails.

How It Works

┌──────────────┐      GET /testgroups/by-project      ┌──────────────────┐
│   CI Runner  │ ───────────────────────────────────→ │  ElasticDash API │
│  (SDK side)  │ ←─────────────────────────────────── │   (your backend) │
│              │  test groups + tests + expectations  │                  │
│              │                                      │                  │
│  execute     │      POST /testgroups/:id/runs       │                  │
│  each test   │ ───────────────────────────────────→ │  stores results  │
│  locally     │                                      │                  │
│              │      POST /testgroups/batches        │                  │
│              │ ───────────────────────────────────→ │  groups the runs │
└──────────────┘                                      └──────────────────┘

Fetch — Calls GET /testgroups/by-project with the API key (scoped to project). Returns all active test groups with their tests and expectations.
Execute — For each test, runs it locally using existing SDK infrastructure:
- Single-step tests — replays a specific tool or AI step with mock_input and frozen_events
- Full-flow tests — runs the entire workflow from ed_workflows.ts with workflow_input
Evaluate — Checks all expectations (token-budget, latency-budget, output-contains, output-schema, tool-called, determinism, llm-judge). Respects run_count and pass_threshold.
Submit — POSTs each result to POST /testgroups/:id/runs with single run data, expectation results, and git metadata.
Batch — Creates a batch grouping all run IDs for dashboard viewing.
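Repeated runs matter because LLM outputs vary between calls. The sketch below shows one plausible reading of run_count and pass_threshold. ASSUMPTION: pass_threshold is treated as the fraction (0 to 1) of runs that must pass; the backend's actual semantics are not specified in this README. `evaluateRepeated` is a hypothetical name.

```typescript
// Hypothetical evaluation of repeated runs. ASSUMPTION: passThreshold is the
// fraction of runs that must pass; the real backend semantics may differ.
interface RepeatedResult { runs: boolean[]; passRate: number; passed: boolean }

async function evaluateRepeated(
  runOnce: () => Promise<boolean>,
  runCount: number,
  passThreshold: number,
): Promise<RepeatedResult> {
  if (!Number.isInteger(runCount) || runCount < 1) throw new Error('runCount must be a positive integer')
  const runs: boolean[] = []
  // Sequential on purpose: parallel LLM calls can hit provider rate limits.
  for (let i = 0; i < runCount; i++) runs.push(await runOnce())
  const passRate = runs.filter(Boolean).length / runCount
  return { runs, passRate, passed: passRate >= passThreshold }
}

// Helper for tests: replays a fixed sequence of outcomes.
function scripted(outcomes: boolean[]): () => Promise<boolean> {
  let i = 0
  return async () => outcomes[i++]
}
```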

CLI Usage

# Basic — uses env vars (set in .env or CI secrets)
npx elasticdash ci

# Explicit flags (if not using env vars)
npx elasticdash ci --server https://server.elasticdash.com --api-key ed_xxx

# Filter by workflow or tags
npx elasticdash ci --server $ELASTICDASH_API_URL --api-key $ELASTICDASH_API_KEY \
  --workflow checkout --tags payment,critical

# Pass git metadata (auto-detected in GitHub Actions / GitLab CI)
npx elasticdash ci --server $ELASTICDASH_API_URL --api-key $ELASTICDASH_API_KEY \
  --git-branch main --git-commit abc123

All flags:

| Flag | Env Var | Description |
|------|---------|-------------|
| --server <url> | ELASTICDASH_API_URL | Backend API URL (required) |
| --api-key <key> | ELASTICDASH_API_KEY | Project API key (required) |
| --workflow <name> | — | Filter test groups by workflow name |
| --tags <t1,t2> | — | Filter test groups by tags (comma-separated) |
| --triggered-by <src> | — | Trigger source label (default: ci) |
| --git-branch <branch> | Auto-detected | Git branch name |
| --git-commit <sha> | Auto-detected | Git commit SHA |
| --git-commit-message <msg> | Auto-detected | Commit message |
| --git-pr-number <n> | Auto-detected | PR number |
| --git-pr-url <url> | Auto-detected | PR URL |

GitHub Actions Example

name: AI Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - run: npm ci

      - name: Run ElasticDash CI tests
        run: npx elasticdash ci
        env:
          ELASTICDASH_API_URL: ${{ secrets.ELASTICDASH_API_URL }}
          ELASTICDASH_API_KEY: ${{ secrets.ELASTICDASH_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}  # if tests use OpenAI

Git branch, commit SHA, PR number, and PR URL are auto-detected from GitHub Actions environment variables — no extra flags needed.
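The README does not list which variables the SDK reads. The sketch below shows one way this detection can be done with the standard GitHub Actions variables `GITHUB_ACTIONS`, `GITHUB_SHA`, `GITHUB_REF`, `GITHUB_REF_NAME`, `GITHUB_HEAD_REF`, `GITHUB_SERVER_URL`, and `GITHUB_REPOSITORY`. The `detectGitHubMetadata` helper is hypothetical and is not the SDK's implementation. It leaves out the commit message because GitHub exposes that only in the event payload file, not as a variable.

```typescript
// Hypothetical metadata shape, loosely matching the --git-* flags.
interface GitMetadata {
  branch?: string;
  commit?: string;
  prNumber?: number;
  prUrl?: string;
}

function detectGitHubMetadata(env: Record<string, string | undefined>): GitMetadata | undefined {
  // GitHub Actions sets GITHUB_ACTIONS=true inside every workflow run.
  if (env["GITHUB_ACTIONS"] !== "true") return undefined;

  // On pull_request events GITHUB_REF has the form "refs/pull/<n>/merge".
  const prMatch = /^refs\/pull\/(\d+)\/merge$/.exec(env["GITHUB_REF"] ?? "");
  const prNumber = prMatch ? Number(prMatch[1]) : undefined;
  const server = env["GITHUB_SERVER_URL"];
  const repo = env["GITHUB_REPOSITORY"];

  return {
    // GITHUB_HEAD_REF (the PR source branch) is only populated for pull_request events.
    branch: env["GITHUB_HEAD_REF"] || env["GITHUB_REF_NAME"],
    commit: env["GITHUB_SHA"],
    prNumber,
    prUrl: prNumber !== undefined && server && repo ? `${server}/${repo}/pull/${prNumber}` : undefined,
  };
}
```

In a real job you would pass `process.env`. Outside GitHub Actions the helper returns `undefined`, and explicit `--git-*` flags would be needed.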

GitLab CI Example

ai-tests:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx elasticdash ci
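  # Define these as CI/CD variables under Settings → CI/CD → Variables
  variables:
    ELASTICDASH_API_URL: $ELASTICDASH_API_URL
    ELASTICDASH_API_KEY: $ELASTICDASH_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY  # if tests use OpenAI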
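GitLab exposes the same information through its predefined CI/CD variables. The sketch below assumes detection built on `GITLAB_CI`, `CI_COMMIT_SHA`, `CI_COMMIT_REF_NAME`, `CI_COMMIT_MESSAGE`, and the `CI_MERGE_REQUEST_*` variables. Those variables are real GitLab names, but the README does not say which ones the SDK reads, and `detectGitLabMetadata` is a hypothetical helper.

```typescript
// Hypothetical metadata shape, loosely matching the --git-* flags.
interface GitMetadata {
  branch?: string;
  commit?: string;
  commitMessage?: string;
  prNumber?: number;
  prUrl?: string;
}

function detectGitLabMetadata(env: Record<string, string | undefined>): GitMetadata | undefined {
  // GitLab sets GITLAB_CI=true in every CI job.
  if (env["GITLAB_CI"] !== "true") return undefined;

  // CI_MERGE_REQUEST_* variables exist only in merge request pipelines.
  const iid = env["CI_MERGE_REQUEST_IID"];
  const projectUrl = env["CI_MERGE_REQUEST_PROJECT_URL"];
  const prNumber = iid ? Number(iid) : undefined;

  return {
    branch: env["CI_MERGE_REQUEST_SOURCE_BRANCH_NAME"] || env["CI_COMMIT_REF_NAME"],
    commit: env["CI_COMMIT_SHA"],
    commitMessage: env["CI_COMMIT_MESSAGE"],
    prNumber,
    prUrl: prNumber !== undefined && projectUrl ? `${projectUrl}/-/merge_requests/${prNumber}` : undefined,
  };
}
```

The job above runs in a branch pipeline, so the merge request fields would stay empty unless the job is configured to run in merge request pipelines.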

Programmatic Usage

import { runCI } from 'elasticdash-sdk'

const summary = await runCI({
  serverUrl: 'https://server.elasticdash.com',
  apiKey: 'ed_xxx',
  workflowName: 'checkout',       // optional filter
  tags: ['payment', 'critical'],  // optional filter
})

console.log(`${summary.passed}/${summary.total} passed`)
process.exit(summary.failed > 0 ? 1 : 0)

Output

[elasticdash ci] Fetching test groups...
[elasticdash ci] Found 2 test group(s), 5 test(s) total.

  Checkout Flow (3 tests)
    validate-input ... PASS (234ms)
    charge-card ... PASS (1823ms)
    send-confirmation ... FAIL (945ms)
      [output-contains] Output text check failed.

  Refund Flow (2 tests)
    check-eligibility ... PASS (412ms)
    process-refund ... PASS (1567ms)

──────────────────────────────────────────────────
Summary
──────────────────────────────────────────────────
  Total:    5
  Passed:   4
  Failed:   1
  Duration: 5.0s
  Batch ID: 42
──────────────────────────────────────────────────

[elasticdash ci] 1 test(s) failed.

Prerequisites

An ElasticDash project with an API key (create one in the dashboard under Settings → API Keys)
Active test groups with tests and expectations configured in the dashboard
ed_tools.ts and/or ed_workflows.ts in your project root (for the executor to discover tools and workflows)
AI provider API keys in the environment if tests use LLM calls (e.g., OPENAI_API_KEY)

Configuration

Optional elasticdash.config.ts at project root:

export default {
  testMatch: ['**/*.ai.test.ts'],
  traceMode: 'local' as const,
}
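`testMatch` takes glob patterns. `**/*.ai.test.ts` selects any file ending in `.ai.test.ts` at any depth, including the project root. The sketch below gives a rough picture of which files that pattern picks up. It uses a plain suffix check, not full glob semantics, and is not the SDK's discovery code. `findAITests` is hypothetical, and skipping `node_modules` is an assumption.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Recursively collect files whose names end with `suffix`.
// A plain suffix check: it does not implement general glob syntax.
function findAITests(dir: string, suffix = ".ai.test.ts"): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      // Assumption: dependency folders are skipped.
      if (entry.name !== "node_modules") found.push(...findAITests(full, suffix));
    } else if (entry.isFile() && entry.name.endsWith(suffix)) {
      found.push(full);
    }
  }
  return found.sort();
}

// Demo: build a throwaway project tree and list what the pattern selects.
const root = mkdtempSync(join(tmpdir(), "ed-testmatch-"));
mkdirSync(join(root, "tests", "flows"), { recursive: true });
mkdirSync(join(root, "node_modules", "pkg"), { recursive: true });
writeFileSync(join(root, "smoke.ai.test.ts"), "");
writeFileSync(join(root, "tests", "flows", "checkout.ai.test.ts"), "");
writeFileSync(join(root, "tests", "helpers.ts"), "");
writeFileSync(join(root, "node_modules", "pkg", "vendored.ai.test.ts"), "");

const matches = findAITests(root).map((p) => relative(root, p));
console.log(matches);
rmSync(root, { recursive: true, force: true }); // clean up the demo tree
```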

Dashboard port: defaults to 4573. Override via CLI flag or .env:

# .env
ELASTICDASH_PORT=5000

# or CLI flag
npx elasticdash dashboard --port 5000
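The README gives the default port and two ways to override it, but does not say which one wins when both are set. The sketch below assumes the `--port` flag takes precedence over `ELASTICDASH_PORT`. That ordering, the range check, and the `resolveDashboardPort` helper are all assumptions.

```typescript
const DEFAULT_DASHBOARD_PORT = 4573;

// Assumption: --port overrides ELASTICDASH_PORT, which overrides the default.
function resolveDashboardPort(flagValue: string | undefined, envValue: string | undefined): number {
  for (const raw of [flagValue, envValue]) {
    if (raw === undefined || raw.trim() === "") continue;
    const port = Number(raw);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`invalid port: ${raw}`);
    }
    return port;
  }
  return DEFAULT_DASHBOARD_PORT;
}

console.log(resolveDashboardPort(undefined, undefined)); // prints 4573
console.log(resolveDashboardPort(undefined, "5000")); // prints 5000
console.log(resolveDashboardPort("6000", "5000")); // prints 6000
```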

Optional project file: ed_workers.ts may be part of your app's architecture (for example, to export worker handlers), but the ElasticDash CLI and dashboard neither require nor discover it.

Debugging reruns

Workflow and tool reruns each run in an isolated subprocess. When a rerun hangs, runs unexpectedly slowly, or fails with an opaque error, set these environment variables to surface what the parent process and the worker are doing:

| Variable | Default | Effect |
|---|---|---|
| ELASTICDASH_DEBUG | unset | When 1, parent and worker emit stage breadcrumbs to stderr (stage=spawned, stage=payload-written, stage=first-stdout, stage=workflow-call-start/end, stage=closed, etc.) with pid and elapsedMs. |
| ELASTICDASH_HEARTBEAT_MS | 5000 | Interval (ms) for the parent to log still running pid=… elapsedMs=… while a subprocess is alive. Set 0 to disable. Only emitted when ELASTICDASH_DEBUG=1. |
| ELASTICDASH_TOOL_TIMEOUT_MS | unset (no timeout) | When set, the parent kills the tool subprocess after N ms (SIGTERM, then SIGKILL after a 2s grace) and surfaces Tool subprocess timed out after Nms with the child's exit code, signal, and last stderr. |
| ELASTICDASH_WORKFLOW_TIMEOUT_MS | unset (no timeout) | Same as above for the workflow subprocess. |
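The timeout behaviour in the table (SIGTERM, then SIGKILL after a 2 s grace period) is a common pattern for managing Node subprocesses. Below is a minimal sketch of that pattern using `child_process.spawn`. `runWithTimeout` and `ChildResult` are hypothetical names, and this is not the SDK's worker code.

```typescript
import { spawn } from "node:child_process";

interface ChildResult {
  exitCode: number | null;
  signal: NodeJS.Signals | null;
  timedOut: boolean;
  elapsedMs: number;
}

// Spawn a command. If timeoutMs is set, send SIGTERM once it elapses and
// SIGKILL if the child is still alive after a further grace period.
function runWithTimeout(command: string, args: string[], timeoutMs?: number, graceMs = 2000): Promise<ChildResult> {
  const start = Date.now();
  const child = spawn(command, args, { stdio: "ignore" });
  let timedOut = false;
  let termTimer: NodeJS.Timeout | undefined;
  let killTimer: NodeJS.Timeout | undefined;

  if (timeoutMs !== undefined && timeoutMs > 0) {
    termTimer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
      // Escalate only if SIGTERM did not stop the child in time.
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    }, timeoutMs);
  }

  return new Promise((resolve, reject) => {
    child.on("error", (err) => {
      clearTimeout(termTimer);
      clearTimeout(killTimer);
      reject(err);
    });
    child.on("close", (exitCode, signal) => {
      // Clear both timers so a finished child is never signalled.
      clearTimeout(termTimer);
      clearTimeout(killTimer);
      resolve({ exitCode, signal, timedOut, elapsedMs: Date.now() - start });
    });
  });
}
```

For example, `runWithTimeout(process.execPath, ["-e", "setTimeout(() => {}, 10000)"], 200)` resolves after about 200 ms with `timedOut: true` and the terminating signal. A child that exits on its own resolves with its exit code.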

On failure, the parent's error string always includes [exit=… signal=… elapsedMs=… pid=… stderrBytes=…] plus the last 1 KB of stderr, so an empty-output failure can be told apart from a crash or a signal kill.
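Below is a minimal sketch of building an error string like that. The bracketed field order follows the format quoted above. `formatFailure` and `tailBytes` are hypothetical helpers, not SDK exports.

```typescript
// Keep only the last maxBytes of captured stderr. A cut in the middle of a
// multi-byte UTF-8 character decodes as U+FFFD instead of throwing.
function tailBytes(buf: Uint8Array, maxBytes = 1024): string {
  const start = Math.max(0, buf.length - maxBytes);
  return new TextDecoder().decode(buf.subarray(start));
}

interface FailureInfo {
  exitCode: number | null;
  signal: string | null;
  elapsedMs: number;
  pid: number | undefined;
  stderr: Uint8Array;
}

// Append process metadata and the stderr tail to a failure message.
function formatFailure(message: string, info: FailureInfo): string {
  const meta =
    `[exit=${info.exitCode} signal=${info.signal} elapsedMs=${info.elapsedMs} ` +
    `pid=${info.pid} stderrBytes=${info.stderr.length}]`;
  const tail = tailBytes(info.stderr);
  return tail ? `${message} ${meta}\n${tail}` : `${message} ${meta}`;
}
```

Reporting the total `stderrBytes` alongside a bounded tail keeps the message short while still showing how much output the child produced.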

Example:

ELASTICDASH_DEBUG=1 ELASTICDASH_HEARTBEAT_MS=2000 ELASTICDASH_TOOL_TIMEOUT_MS=30000 \
  npx elasticdash dashboard

TypeScript Setup

For typed globals and matchers, add a tsconfig.json to your test directory that extends the root config and includes both source and test files:

{
  "extends": "../tsconfig.json",
  "include": ["../src/**/*", "./**/*"]
}

Programmatic API

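import { runFiles, reportResults, registerMatchers, installAIInterceptor } from 'elasticdash-sdk'

// Register the AI-specific matchers (semantic output, LLM-judged evaluations, prompt assertions)
registerMatchers()
// Intercept supported AI provider calls so they are captured in traces
installAIInterceptor()

// Run the listed test files, then report their results
const results = await runFiles(['./tests/flow.ai.test.ts'])
reportResults(results)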

HTTP mode context (call inside your request handler):

import { initHttpRunContext, setHttpRunContext } from 'elasticdash-sdk'

// Async — fetches frozen steps from dashboard to enable step freezing/breakpoints
await initHttpRunContext(runId, dashboardUrl)

// Synchronous alternative — observability only, no step freezing
setHttpRunContext(runId, dashboardUrl)

Dashboard auto-detection (env var mode — no code changes needed):

# Set before starting your server or script
ELASTICDASH_API_URL=https://server.elasticdash.com  # cloud (or http://localhost:4573 for local dashboard)
ELASTICDASH_API_KEY=ed_your_api_key_here             # your project API key
ELASTICDASH_RUN_ID=<run-id-from-dashboard>           # optional, enables step freezing

wrapTool and wrapAI will auto-connect on their first call. See Dashboard Auto-Detection for details.
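The three variables above imply three outcomes: no dashboard connection, observability only, or observability with step freezing when a run ID is present. Below is a minimal sketch of that classification. It assumes the API URL alone is enough to connect, for example to a local dashboard. That assumption and the `detectDashboard` helper are hypothetical and do not describe how `wrapTool` or `wrapAI` actually connect.

```typescript
type DashboardMode = "off" | "observability" | "step-freezing";

interface DashboardConfig {
  mode: DashboardMode;
  apiUrl?: string;
  apiKey?: string;
  runId?: string;
}

// Hypothetical classification of the auto-detection env vars.
function detectDashboard(env: Record<string, string | undefined>): DashboardConfig {
  const apiUrl = env["ELASTICDASH_API_URL"];
  const apiKey = env["ELASTICDASH_API_KEY"];
  const runId = env["ELASTICDASH_RUN_ID"];
  // Assumption: without a URL there is nothing to connect to.
  if (!apiUrl) return { mode: "off" };
  // ELASTICDASH_RUN_ID is optional; when present it enables step freezing.
  return { mode: runId ? "step-freezing" : "observability", apiUrl, apiKey, runId };
}

console.log(detectDashboard({ ELASTICDASH_API_URL: "http://localhost:4573" }).mode); // prints observability
```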

CI runner (execute test groups from your project):

import { runCI } from 'elasticdash-sdk'

const summary = await runCI({ serverUrl: 'https://server.elasticdash.com', apiKey: 'ed_xxx' })
// summary.total, summary.passed, summary.failed, summary.batchId, summary.results

License

MIT