@elench/testkit

v0.1.143

Published

14 hours ago

Assistant-first CLI for running, inspecting, and debugging local testkit suites

Downloads

13,928


harrydb

gfdlr


@elench/testkit discovers *.testkit.ts files, infers suite ownership from the filesystem, starts local services, provisions Docker-managed local Postgres databases, and runs test suites.

The package is now driven by testkit.config.ts, not testkit.config.json.

Usage

cd my-product

# Launch the interactive assistant
npx @elench/testkit
npx @elench/testkit assistant --provider codex --model gpt-5.4
npx @elench/testkit assistant --provider claude --model sonnet --effort high

# Ask for one assistant turn non-interactively
npx @elench/testkit assistant --message "Show the latest Testkit status"
npx @elench/testkit assistant --message "Run the e2e tests for the api service"
npx @elench/testkit assistant --message "Why did the latest failure happen?"

# Run every testkit-managed suite in batch mode
npx @elench/testkit run

# Inspect discovered tests without running them
npx @elench/testkit discover
npx @elench/testkit discover --output-mode verbose
npx @elench/testkit discover --json > .testkit/discovery.json
npx @elench/testkit discover --output .testkit/discovery.json

# Filter by type
npx @elench/testkit --type int
npx @elench/testkit --type dal
npx @elench/testkit --type e2e
npx @elench/testkit --type int,e2e,dal
npx @elench/testkit --type ui

# Parallel file execution
npx @elench/testkit --workers 8

# One file-level wall clock budget for every suite file
npx @elench/testkit --file-timeout-seconds 60

# Specific service / suite
npx @elench/testkit --service frontend --type ui -s navigation
npx @elench/testkit --service api --type int -s health
npx @elench/testkit --type int,e2e,dal -s dal:queries

# Exact file
npx @elench/testkit --type int --file __testkit__/health/health.int.testkit.ts

# Temporarily ignore repo-declared skip rules
npx @elench/testkit --ignore-skip-rules --file __testkit__/billing/billing.int.testkit.ts

# Deterministic git-trackable status snapshot
npx @elench/testkit --type int --write-status

# Lifecycle
npx @elench/testkit status
npx @elench/testkit destroy
npx @elench/testkit cleanup

# Local production environment
npx @elench/testkit local up
npx @elench/testkit local status
npx @elench/testkit local env --service frontend
npx @elench/testkit local logs --service frontend
npx @elench/testkit local down

# Inspect the latest run artifact through the assistant
npx @elench/testkit assistant --message '/inspect "__testkit__/health/health.int.testkit.ts"'
npx @elench/testkit assistant --message '/artifacts "__testkit__/health/health.int.testkit.ts"'
npx @elench/testkit assistant --message "/logs api"

# Automatic regression intelligence
# Configure testkit.regressions.json and testkit classifies new vs known regressions automatically during runs

# Diagnostics: refresh or verify the source schema cache
npx @elench/testkit db schema refresh --service api
npx @elench/testkit db schema verify --service api

testkit is assistant-first in an interactive TTY. The interactive assistant opens with a repo-aware landing panel: provider/model, current directory, latest run result, focused file/service, regression counts, and suggested next prompts. The bottom composer is the primary interaction surface, and the status line shows approximate context remaining when the active provider/model window is known, for example [~96% remaining].

Natural-language turns still go through Codex or Claude, but testkit owns the transcript, command-observation state, context files under .testkit/assistant/, and rendering around testkit, npm, and npx commands. Providers own command choice, edits, retries, and recovery. When the provider runs testkit-managed commands, the assistant observes real Testkit command executions through command sidecars and run artifacts, then refreshes the latest run state so follow-up questions can use the new result immediately.

Assistant transcript content has an explicit rendering contract. Provider and user prose is stored as raw Markdown and parsed only by the TUI renderer. System notices stay plain text. Observed commands and run sessions stay structured, so Testkit renders real command state and run artifacts without parsing provider prose for control flow.

Assistant provider coverage is tested against the real codex and claude CLIs. The test suite assumes both are installed and authenticated; provider adapter, assistant shell, command-observation, and real testkit-run coverage do not use provider stand-in binaries or simulated provider sessions.

Assistant runtime settings are repo-local. Use /provider, /model, /effort, and /settings inside the assistant to inspect or change the active provider runtime; changes are persisted to .testkit/assistant/settings.json. CLI flags such as --provider, --model, --effort, and repeatable --provider-arg override those settings for the current launch. The composer has an always-visible cursor and supports arrow keys, Home/End, Ctrl+A/Ctrl+E, Backspace, Delete, Ctrl+D, and Ctrl+L to clear the visible transcript. Ctrl+C quits the assistant.
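
The precedence between persisted settings and launch flags can be pictured as a simple merge. This is only a sketch: `AssistantSettings`, `resolveSettings`, and the field names are hypothetical and do not describe the actual schema of .testkit/assistant/settings.json.

```typescript
// Hypothetical shape; the real settings.json schema is not documented here.
interface AssistantSettings {
  provider?: string;
  model?: string;
  effort?: string;
}

// CLI flags win for the current launch; persisted settings fill the gaps.
function resolveSettings(
  persisted: AssistantSettings,
  cliFlags: AssistantSettings,
): AssistantSettings {
  const resolved: AssistantSettings = { ...persisted };
  for (const key of ["provider", "model", "effort"] as const) {
    const flag = cliFlags[key];
    if (flag !== undefined) resolved[key] = flag;
  }
  return resolved;
}

// Persisted provider is overridden, persisted model is kept, effort is added.
const launchSettings = resolveSettings(
  { provider: "codex", model: "gpt-5.4" },
  { provider: "claude", effort: "high" },
);
console.log(launchSettings);
```

The merge does not write anything back, which mirrors the statement that flags apply to the current launch only.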

The non-interactive assistant --message ... mode uses the same provider command-observation path for one hosted turn at a time. It is useful in scripts and tests, but it is not the primary interactive UX.

Batch run output stays intentionally short: one line per completed file, a concise failure block, and a final summary. Service logs, captured runtime output, emitted artifacts, and assistant-visible run state are persisted under .testkit/results/.

testkit discover also maintains a small durable per-test history index at .testkit/history/tests.json. The index tracks first/last seen timestamps, run counts, pass/fail/skip counts, average duration, and last observed status, and those summaries are exposed in compact, verbose, and JSON discovery output.

Test execution also maintains a scheduler cache at .testkit/timings.json. Completed file-level task durations are used to rank future runs with a longest-estimated-duration-first policy, so slow files start earlier when workers are available. Run artifacts include compact scheduler metadata under planning so ordering decisions are inspectable.
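
To illustrate why a longest-estimated-duration-first policy helps, here is a minimal sketch. The cache shape, function names, and the greedy worker assignment are assumptions made for illustration; they are not testkit's actual scheduler or the real layout of .testkit/timings.json.

```typescript
// Hypothetical cache shape: file path -> last observed duration in ms.
type TimingCache = Record<string, number>;

// Order files so the longest estimated durations start first.
// Files without a recorded timing use a fallback estimate (an assumption of this sketch).
function orderLongestFirst(files: string[], timings: TimingCache, fallbackMs = 0): string[] {
  return [...files].sort((a, b) => {
    const diff = (timings[b] ?? fallbackMs) - (timings[a] ?? fallbackMs);
    if (diff !== 0) return diff;
    return a < b ? -1 : a > b ? 1 : 0; // deterministic tie-break by path
  });
}

// Greedy assignment: each file goes to the worker that is currently least loaded.
// Returns the estimated wall-clock time of the whole run.
function estimateMakespan(ordered: string[], timings: TimingCache, workers: number): number {
  const loads = new Array<number>(workers).fill(0);
  for (const file of ordered) {
    const idx = loads.indexOf(Math.min(...loads));
    loads[idx] = (loads[idx] ?? 0) + (timings[file] ?? 0);
  }
  return Math.max(...loads);
}

const sampleTimings: TimingCache = { a: 8000, b: 3000, c: 3000, d: 2000 };
const discoveryOrder = ["d", "b", "a", "c"];
const rankedOrder = orderLongestFirst(discoveryOrder, sampleTimings);
console.log(rankedOrder, estimateMakespan(rankedOrder, sampleTimings, 2));
console.log(discoveryOrder, estimateMakespan(discoveryOrder, sampleTimings, 2));
```

With two workers, starting the 8-second file first lets the shorter files fill the other worker, whereas the unranked order leaves the slow file stacked behind earlier work.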

Automatic Regression Diagnosis

If regressions.file is configured, every run automatically classifies observed results without any separate follow-up maintenance command.

testkit distinguishes four user-facing outcomes:

new regressions
known regressions
fixed known regressions
catalog stale

The default CLI keeps those signals lightweight:

failed files print inline diagnosis immediately under the file line
the final summary box reports aggregate regression counts only
machine-readable artifacts gain per-file diagnosis plus top-level regressions.summary, regressions.catalog, and prepared regressions.drafts

catalog stale is repo hygiene, not a test failure. It means the regression catalog or linked issue tracker metadata needs attention, for example because a linked issue is closed but the regression still reproduces.

Tooling Adapters

testkit also ships tool-specific config helpers so consumer repos do not need repo-local runtime policy files just to cooperate with managed runs.

// drizzle.config.ts
import { defineConfig } from "@elench/testkit/drizzle";

export default defineConfig({
  schema: "./src/db/schema/index.ts",
  out: "./src/db/migrations",
  dialect: "postgresql",
  dbCredentials: {
    url: process.env.DATABASE_URL!,
  },
});

// vitest.config.ts
import { defineConfig } from "@elench/testkit/vitest";

export default defineConfig({
  test: {
    include: ["src/**/*.test.ts"],
  },
});

// ui.config.ts
import { defineConfig, devices } from "@elench/testkit/ui";

export default defineConfig({
  testDir: "./__testkit__",
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
}, {
  dotenvFiles: [".env.local"],
});

For scripts and app-runtime code, @elench/testkit/env provides shared helpers for managed runtime detection, dotenv loading, and local-database safety:

import {
  assertLocalDatabaseUrl,
  loadDotenvFiles,
  shouldLoadDotenv,
} from "@elench/testkit/env";

if (shouldLoadDotenv()) {
  loadDotenvFiles({ files: [".env", ".env.local"] });
}

assertLocalDatabaseUrl(process.env, "seed.ts");

Setup

Create testkit.config.ts at repo root:

import {
  app,
  database,
  defineConfig,
  environment,
  toolchain,
} from "@elench/testkit/config";

export default defineConfig({
  execution: {
    workers: 8,
    fileTimeoutSeconds: 60,
  },
  regressions: {
    file: "testkit.regressions.json",
    sync: {
      provider: "github",
      mode: "warn",
      cacheTtlSeconds: 900,
    },
  },
  fingerprints: {
    exclude: ["next-env.d.ts"],
  },
  toolchains: {
    frontendNode: toolchain.node({
      cwd: "frontend",
      detect: "auto",
      install: "download",
    }),
  },
  environments: {
    local: environment.local({
      target: "frontend",
      data: "reuse",
    }),
  },
  services: {
    api: app.node({
      cwd: ".",
      entry: "src/index.ts",
      port: 3004,
      envFiles: [".env.testkit"],
      database: database.postgres({
        sourceSchema: database.schema.fromEnv("PRODUCTION_DATABASE_URL"),
        template: {
          inputs: ["db/schema.sql", "scripts/seed.ts"],
          migrate: [{ kind: "sql-file", path: "db/schema.sql" }],
          seed: [{ kind: "command", run: "npm run db:seed" }],
          verify: [{ kind: "module", target: "src/testkit/verify-seed.ts#verifySeed" }],
        },
      }),
      runtime: {
        instances: 1,
        maxConcurrentTasks: 4,
      },
    }),
    frontend: app.next({
      cwd: "frontend",
      mode: "start",
      port: 3000,
      dependsOn: ["api"],
      envFiles: ["frontend/.env.testkit"],
      env: {
        values: {
          NEXT_PUBLIC_API_URL: "{baseUrl:api}",
        },
      },
      runtime: {
        instances: 1,
        maxConcurrentTasks: 2,
        toolchain: "frontendNode",
      },
    }),
  },
});

File-local execution metadata now lives next to the test when possible:

import { defineFile } from "@elench/testkit/config";

export const testkit = defineFile({
  skip: "Billing is currently unavailable locally",
  locks: ["global-worker-loop"],
});

testkit.config.ts is optional for simple repos, but it is the primary escape hatch for:

worker count
per-file wall-clock timeout budget
multi-service graphs
local runtime instance counts
per-runtime concurrent task caps
repo-managed Node toolchains for prepare/start commands
one-time runtime preparation steps for stable shared servers
local DB binding configuration
source-backed schema verification
template database migrate / seed / verify stages
explicit per-file or per-suite locks
named HTTP suite profiles
automatic regression classification for new vs known failures
optional GitHub-backed regression issue sync
repo-level fingerprint include/exclude policy
repo-declared suite/file skip policies with explicit reasons
telemetry upload configuration

runtime.prepare is the generic build-once hook for shared runtimes. It runs once per runtime generation before local services start, fingerprints declared inputs, and writes cache state under the service runtime directory. This is the right way to move expensive browser targets from next dev / watch mode to stable build-and-start flows.

testkit local starts the same service graph as a persistent local production environment instead of a test run. It provisions local databases, runs template setup, runs runtime.prepare, starts dependent services, and records state under .testkit/environments/<name>/ rather than .testkit/_runs. Local environment processes receive TESTKIT_ACTIVE=1, TESTKIT_MODE=local, and TESTKIT_LOCAL_ENV=<name>. Use data: "reuse" for fast restarts against the existing local runtime database, data: "reset" to refresh runtime databases from their templates on each launch, or --rebuild to destroy and recreate the environment state. Testkit only supports local environments here; it does not copy production data and it refuses managed runtime database URLs that are not loopback PostgreSQL URLs.
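
Application code can branch on the environment variables named above, and the loopback rule can be approximated with a small check. The sketch below is illustrative: `describeTestkitRuntime` and `isLoopbackPostgresUrl` are hypothetical helpers, not the implementation behind `assertLocalDatabaseUrl` in @elench/testkit/env.

```typescript
type Env = Record<string, string | undefined>;

// Reads the variables that local environment processes are documented to receive.
function describeTestkitRuntime(env: Env): {
  active: boolean;
  mode: string | undefined;
  localEnv: string | undefined;
} {
  return {
    active: env.TESTKIT_ACTIVE === "1",
    mode: env.TESTKIT_MODE,
    localEnv: env.TESTKIT_LOCAL_ENV,
  };
}

// Hypothetical helper: true only for postgres:// or postgresql:// URLs whose host is loopback.
function isLoopbackPostgresUrl(rawUrl: string): boolean {
  // Capture the host after optional userinfo; bracketed IPv6 hosts are kept with brackets.
  const match = /^postgres(?:ql)?:\/\/(?:[^@\/]*@)?(\[[^\]]+\]|[^:\/?#]+)/i.exec(rawUrl);
  const host = (match?.[1] ?? "").toLowerCase();
  if (!host) return false;
  return host === "localhost" || host === "[::1]" || /^127(\.\d{1,3}){3}$/.test(host);
}

console.log(describeTestkitRuntime({ TESTKIT_ACTIVE: "1", TESTKIT_MODE: "local", TESTKIT_LOCAL_ENV: "local" }));
console.log(isLoopbackPostgresUrl("postgresql://user:pw@127.0.0.1:5432/app"));
console.log(isLoopbackPostgresUrl("postgresql://user:pw@db.example.com:5432/app"));
```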

database.template is the database-side equivalent for reusable template DB state. When database.sourceSchema is configured, Testkit treats the configured source database as the schema source of truth. A normal testkit run resolves a commit-aware source schema cache under .testkit/db/<service>/source-schemas/, applies that cached schema to the local template DB, runs local template setup, and verifies that the replayed local schema still matches the source dump. If local replay differs, Testkit refreshes from the source once for the current cache key and retries. If it still differs, the run fails with schema diagnostics under .testkit/results/schema.

Source schema cache keys are derived automatically from repo state:

clean git worktrees use commits/<sha>
dirty git worktrees use dirty/<sha>-<fingerprint>
non-git directories use nogit/<fingerprint>
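
The three key shapes above map onto a small function. This is a sketch only: the `RepoState` type and `sourceSchemaCacheKey` name are hypothetical, and how testkit computes the SHA and fingerprint is not shown.

```typescript
// Hypothetical description of repo state; testkit derives this internally.
type RepoState =
  | { kind: "clean"; sha: string }
  | { kind: "dirty"; sha: string; fingerprint: string }
  | { kind: "nogit"; fingerprint: string };

// Builds the cache key prefix and suffix following the documented patterns.
function sourceSchemaCacheKey(state: RepoState): string {
  switch (state.kind) {
    case "clean":
      return `commits/${state.sha}`;
    case "dirty":
      return `dirty/${state.sha}-${state.fingerprint}`;
    case "nogit":
      return `nogit/${state.fingerprint}`;
  }
}

console.log(sourceSchemaCacheKey({ kind: "clean", sha: "abc123" }));
console.log(sourceSchemaCacheKey({ kind: "dirty", sha: "abc123", fingerprint: "f00d" }));
```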

Dirty worktree fingerprints use Git's own ignore engine for untracked files, so normal .gitignore rules are respected. Testkit-owned outputs are always excluded from fingerprints: .testkit/, .next-testkit/, and testkit.status.json. testkit.config.ts is normal repo configuration and is not excluded automatically. For user-managed generated files, add repo-level fingerprints.exclude patterns; fingerprints.include can opt specific paths back in below an excluded directory.

Branch names and worktree paths are recorded as metadata but do not affect clean commit cache keys, so branch renames and clean worktrees at the same commit reuse the same source schema. Dirty worktrees are isolated by content fingerprint so local experiments cannot overwrite a clean commit baseline.

Template setup executes in three explicit phases:

migrate
seed
verify

Schema drift is checked after each successful migrate and seed step, and verify only runs once local replay matches the source schema. Source refreshes use pg_dump --schema-only --no-owner --no-privileges, so seed/reference data is never written into the baseline. Keep schema-changing setup in its own step where possible; a single command that changes schema and then fails before exiting cannot be refreshed at the midpoint.
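
The phase ordering and drift checks described above can be sketched as follows. All names here are hypothetical, `schemaMatchesSource` stands in for the comparison against the cached source schema, and the sketch simply fails on drift rather than modelling the refresh-and-retry behavior.

```typescript
type Step = () => void;

interface TemplatePlan {
  migrate: Step[];
  seed: Step[];
  verify: Step[];
}

// Runs migrate, then seed, then verify. Drift is checked after every
// successful migrate and seed step, so verify only runs on a matching schema.
function runTemplateSetup(plan: TemplatePlan, schemaMatchesSource: () => boolean): string[] {
  const log: string[] = [];
  for (const phase of ["migrate", "seed"] as const) {
    plan[phase].forEach((step, index) => {
      step();
      log.push(`${phase}[${index}] ok`);
      if (!schemaMatchesSource()) {
        throw new Error(`schema drift after ${phase}[${index}]`);
      }
    });
  }
  plan.verify.forEach((step, index) => {
    step();
    log.push(`verify[${index}] ok`);
  });
  return log;
}

const noop: Step = () => {};
console.log(runTemplateSetup({ migrate: [noop], seed: [noop], verify: [noop] }, () => true));
```

Because the check runs between steps, a single step that changes the schema and then fails midway never reaches a checkpoint, which is why the text recommends keeping schema-changing setup in its own step.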

Source schema refreshes are intentionally single-connection and pooler-safe. If a Neon pooled source URL is configured, Testkit rewrites it to the matching direct Neon endpoint before running pg_dump and records the original/resolved host classifications beside the resolved cache entry. Unknown PgBouncer/pooler URLs fail closed; configure a direct source URL for those providers. Concurrent refreshes for the same service and cache key are serialized with a cache-local lock so multiple Testkit processes do not stampede the source database. Testkit also maintains .testkit/db/<service>/source-schemas/index.json and prunes old inactive cache entries automatically.
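
As a rough illustration of the Neon rewrite, the sketch below relies on the convention that Neon pooled hostnames carry a `-pooler` suffix on the endpoint label. `toDirectNeonHost` is a hypothetical helper, and testkit's actual detection and classification logic may differ.

```typescript
// Assumption: a Neon pooled host looks like "<endpoint-id>-pooler.<region>...neon.tech",
// and the direct host is the same name without the "-pooler" suffix.
// Returns null when the host is not recognised as a Neon pooled host.
function toDirectNeonHost(host: string): string | null {
  const labels = host.toLowerCase().split(".");
  const first = labels[0] ?? "";
  const suffix = "-pooler";
  if (!host.toLowerCase().endsWith(".neon.tech") || !first.endsWith(suffix)) {
    return null;
  }
  return [first.slice(0, -suffix.length), ...labels.slice(1)].join(".");
}

console.log(toDirectNeonHost("ep-cool-darkness-123456-pooler.us-east-2.aws.neon.tech"));
console.log(toDirectNeonHost("db.example.com"));
```

Returning null for anything unrecognised leaves the caller free to fail closed, in line with the handling of unknown poolers described above.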

For most repos, prefer declarative step objects directly inside database.postgres({ template: ... }) and runtime.prepare.steps. The supported shapes are:

{ kind: "command", run: "..." }
{ kind: "sql-file", path: "..." }
{ kind: "module", target: "file.ts#exportName" }

runtime.toolchain is the first-class way to make those prepare/start commands run under the correct Node toolchain instead of whatever node/npm happened to launch testkit. Node toolchains support:

host verification mode: install: "require-host"
cached repo-local provisioning mode: install: "download"
auto-detection from:
- package.json#volta.node
- .nvmrc
- .node-version
- .tool-versions (nodejs)
- package.json#engines.node
- package.json#volta.npm
- package.json#packageManager
- package.json#engines.npm

Example:

toolchains: {
  frontendNode: toolchain.node({
    cwd: "frontend",
    detect: "auto",
    install: "download",
  }),
},
services: {
  frontend: app.next({
    cwd: "frontend",
    port: 3000,
    runtime: {
      toolchain: "frontendNode",
    },
  }),
}
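
The Node version sources listed for auto-detection can be thought of as an ordered lookup. The sketch below assumes the priority follows the order of that list, which the text does not state explicitly; `detectNodeVersion` and the in-memory `RepoFiles` map are hypothetical and only cover the Node version sources, not the npm ones.

```typescript
// In-memory stand-in for repo files: relative path -> file contents.
type RepoFiles = Record<string, string>;

// Returns the first Node version found, together with the source it came from.
function detectNodeVersion(files: RepoFiles): { version: string; source: string } | null {
  const pkgText = files["package.json"];
  const pkg = pkgText ? JSON.parse(pkgText) : {};
  // .tool-versions holds "<tool> <version>" pairs, one per line.
  const toolVersions = (files[".tool-versions"] ?? "")
    .split("\n")
    .map((line) => line.trim().split(/\s+/))
    .find(([tool]) => tool === "nodejs");
  const candidates: Array<[string, string | undefined]> = [
    ["package.json#volta.node", pkg.volta?.node],
    [".nvmrc", files[".nvmrc"]?.trim()],
    [".node-version", files[".node-version"]?.trim()],
    [".tool-versions (nodejs)", toolVersions?.[1]],
    ["package.json#engines.node", pkg.engines?.node],
  ];
  for (const [source, value] of candidates) {
    if (value) return { version: value.replace(/^v/, ""), source };
  }
  return null;
}

console.log(detectNodeVersion({
  ".nvmrc": "v20.11.1\n",
  "package.json": JSON.stringify({ engines: { node: ">=18" } }),
}));
```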

If regressions.file is configured, testkit enriches .testkit/results/latest.json and testkit.status.json with:

per-file failureDetails
per-file diagnosis metadata (new regression, known regression, fixed known regression)
top-level regressions summary and prepared draft updates

Regression-catalog entry authoring uses this contract:

summary
- concise local statement of the regression slice
cause
- underlying technical cause of the failure
fingerprints
- selectors that let testkit automatically recognize the regression in future runs

If regressions.sync is also configured, testkit syncs linked GitHub issues and adds top-level regression catalog health to the run/status artifacts. The most important catalog-staleness signal is:

a known regression still fails, but the linked GitHub issue is closed

In mode: "error", catalog health can also fail the run for problems such as:

closed issues that still reproduce
missing issue refs
validation unavailability

Reproduction warnings are execution-aware:

failed means the known regression reproduced
passed means the matched test executed and did not reproduce
skipped and not_run do not count as reproduction evidence
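
The diagnosis and staleness rules can be summarised in a small decision function. This is a sketch under stated assumptions: the `CatalogEntry` shape, the `diagnose` name, and the "no evidence" and "none" labels are hypothetical, and only the closed-issue staleness signal is modelled.

```typescript
type Status = "passed" | "failed" | "skipped" | "not_run";
type Diagnosis =
  | "new regression"
  | "known regression"
  | "fixed known regression"
  | "no evidence" // hypothetical label for skipped / not_run files with a catalog entry
  | "none";

// Hypothetical catalog entry: only the linked issue state matters for this sketch.
interface CatalogEntry {
  issueState?: "open" | "closed";
}

function diagnose(
  status: Status,
  known: CatalogEntry | undefined,
): { diagnosis: Diagnosis; catalogStale: boolean } {
  // Skipped and not_run results are not reproduction evidence either way.
  if (status === "skipped" || status === "not_run") {
    return { diagnosis: known ? "no evidence" : "none", catalogStale: false };
  }
  if (status === "failed") {
    if (!known) return { diagnosis: "new regression", catalogStale: false };
    // A known regression that still fails while its issue is closed is catalog-stale.
    return { diagnosis: "known regression", catalogStale: known.issueState === "closed" };
  }
  // Passed: a catalogued regression that executed and did not reproduce.
  return { diagnosis: known ? "fixed known regression" : "none", catalogStale: false };
}

console.log(diagnose("failed", undefined));
console.log(diagnose("failed", { issueState: "closed" }));
console.log(diagnose("passed", { issueState: "open" }));
```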

Authoring

HTTP suites:

import { defineHttpSuite } from "@elench/testkit";
import { expect } from "@elench/testkit/runtime";

const suite = defineHttpSuite(({ rawReq }) => {
  const response = rawReq.get("/health");
  expect.status(response, 200, "health returns 200");
});

export default suite;

testkit suite files should default-export the suite object returned by defineHttpSuite(...) or defineDalSuite(...).

Named HTTP profiles live in testkit.config.ts and can be referenced by name:

// testkit.config.ts
import { auth, defineConfig } from "@elench/testkit/config";

const appAuth = auth.fixture({
  contract: auth.contracts.jsonSession({
    authCookie: "session",
    organizationIdPath: "data.organizations[0].id",
  }),
  topology: auth.topologies.crossOrg({
    namespace: "example-app",
    actors: {
      primary: { org: "primary" },
      reviewer: { org: "primary" },
      outsider: { org: "secondary" },
    },
  }),
});

export default defineConfig({
  profiles: {
    http: appAuth.profiles({
      defaultAuth: auth.profile.actor("primary"),
      reviewers: auth.profile.actors({
        actors: ["reviewer", "outsider"],
        primaryActor: "reviewer",
      }),
      raw: auth.profile.raw(),
    }),
  },
});

// In a suite file, reference the profile by name
import { defineHttpSuite } from "@elench/testkit";

const suite = defineHttpSuite({ profile: "defaultAuth" }, ({ actor, actors, req }) => {
  req.get("/api/auth/session");
  actor?.req.get("/api/auth/session");
  req.as("outsider").get("/api/auth/session");
  actors.get("reviewer").rawReq.get("/api/auth/session");
});

export default suite;

DAL suites:

import { defineDalFixtures, defineDalSuite } from "@elench/testkit";

const fixtures = defineDalFixtures(({ db, fixtureScope }) => ({
  widget() {
    return fixtureScope.seed("widget", "primary", { name: "Primary Widget" }, () => {
      const widgetId = fixtureScope.id("widget");
      db.exec(`
        INSERT INTO widgets (id, name)
        VALUES ('${widgetId}', 'Primary Widget')
      `);
      return { widgetId };
    });
  },
}));

const suite = defineDalSuite({ fixtures }, ({ db, fixtureScope, fixtures }) => {
  const widget = fixtures.widget();
  db.query(`select id from widgets where id = '${widget.widgetId}'`);
  fixtureScope.records();
});

export default suite;

defineDalFixtures(...) is the package-owned DAL seeding model. It gives every suite a deterministic fixtureScope with:

fixtureScope.id(label) / uuid(label)
fixtureScope.slug(label)
fixtureScope.email(label)
fixtureScope.string(label, options)
fixtureScope.token(label)
fixtureScope.seed(kind, key, signature, create)
fixtureScope.records()

testkit enforces strict fixture behavior:

one logical fixture per kind + key
identical reseeds reuse the same seeded value
conflicting reseeds fail immediately
dependency cycles fail immediately
seeded fixture records are persisted as a testkit.dal-fixtures artifact
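
The first three rules can be illustrated with a small registry. This is not testkit's implementation: `FixtureRegistry` is a hypothetical class, signatures are compared with `JSON.stringify` (so key order matters in this sketch), and dependency-cycle detection is left out.

```typescript
class FixtureRegistry {
  private readonly seeded = new Map<string, { signature: string; value: unknown }>();

  // One logical fixture per kind + key. Identical reseeds reuse the stored
  // value; a reseed with a different signature fails immediately.
  seed<T>(kind: string, key: string, signature: unknown, create: () => T): T {
    const id = `${kind}:${key}`;
    const serialized = JSON.stringify(signature);
    const existing = this.seeded.get(id);
    if (existing) {
      if (existing.signature !== serialized) {
        throw new Error(`conflicting reseed for ${id}`);
      }
      return existing.value as T;
    }
    const value = create();
    this.seeded.set(id, { signature: serialized, value });
    return value;
  }

  // Lists the kind:key identifiers seeded so far.
  records(): string[] {
    return Array.from(this.seeded.keys());
  }
}

const registry = new FixtureRegistry();
const firstWidget = registry.seed("widget", "primary", { name: "Primary Widget" }, () => ({ widgetId: "w-1" }));
const sameWidget = registry.seed("widget", "primary", { name: "Primary Widget" }, () => ({ widgetId: "w-2" }));
console.log(firstWidget === sameWidget, registry.records());
```

The second call never runs its `create` callback, which is what makes repeated fixture requests inside one suite safe.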

Scenario suites:

import { defineScenarioSuite } from "@elench/testkit";

const suite = defineScenarioSuite(({ rawReq, scenario }) => {
  const plan = scenario.choose("journey", {
    endpoint: scenario.pick("endpoint", ["/health", "/message"]),
    includeHealthCheck: scenario.maybe("includeHealthCheck", 1),
  });

  const selected = scenario.resource("selected-endpoint", () => rawReq.get(plan.endpoint));

  scenario.step("fetch selected endpoint", () => {
    selected.get();
  });
});

export default suite;

First-class runtime namespaces:

import { defineHttpSuite } from "@elench/testkit";
import { checks, expect, parse, waitFor } from "@elench/testkit/runtime";

const suite = defineHttpSuite(({ actor, rawReq, req }) => {
  const me = req.get("/api/v1/me");
  expect.status(me, 200, "authenticated session loads");
  expect.json(me, (body) => typeof body.data?.id === "string", "response includes user id");

  checks.authGate(rawReq, "sessions", {
    get: ["/api/v1/sessions"],
  });

  actor?.rawHeaders();

  const upload = req.multipart.post("/api/v1/uploads", {
    fields: { name: "fixture" },
    files: [{ field: "file", data: "hello", filename: "fixture.txt", contentType: "text/plain" }],
  });

  expect.status(upload, 201, "upload is accepted");
  parse.safeJson(upload);
});

export default suite;

Low-level runtime primitives still exist when you genuinely need them:

import { check, group, http } from "@elench/testkit/runtime";

waitFor() consumes the file budget configured by execution.fileTimeoutSeconds. Consumers should not set local timeout values in test files.

import { parse, waitFor } from "@elench/testkit/runtime";

// Inside a suite callback, where req is in scope
const response = waitFor(
  () => req.get("/api/v1/jobs/123"),
  (res) => parse.json(res).data?.status === "completed",
  { description: "job 123 to complete" }
);

Discovery

testkit discovers suites from __testkit__/ directories.

Example layouts:

src/api/routes/__testkit__/auth/me.int.testkit.ts
src/db/__testkit__/sessions/count-type.dal.testkit.ts
frontend/__testkit__/navigation/navigation.ui.testkit.ts
src/internal/handler/__testkit__/repos/crud.int.testkit.ts

testkit uses these suffixes automatically:

*.int.testkit.ts
*.e2e.testkit.ts
*.scenario.testkit.ts
*.dal.testkit.ts
*.load.testkit.ts
*.ui.testkit.ts

See docs/test-types.md for the canonical type model.

Ownership is inferred from:

the deepest matching service root from services.<name>.local.cwd
optional services.<name>.discovery.roots overrides for shared-root edge cases

Suite names are inferred from the colocated path:

auth/__testkit__/*.int.testkit.ts => auth
routes/__testkit__/auth/*.int.testkit.ts => auth
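
The two naming patterns and the suffix list above suggest a rule along these lines. This sketch is a guess at the rule, not testkit's discovery code: `inferSuite` is hypothetical, and it assumes the first directory below __testkit__ wins when one exists.

```typescript
const TEST_TYPES = ["int", "e2e", "scenario", "dal", "load", "ui"] as const;
type TestType = (typeof TEST_TYPES)[number];

// Infers the suite name and test type from a forward-slash relative path.
function inferSuite(path: string): { suite: string; type: TestType } | null {
  const parts = path.split("/");
  const file = parts[parts.length - 1] ?? "";
  // The type is the segment just before ".testkit.ts".
  const match = /\.([a-z0-9]+)\.testkit\.ts$/.exec(file);
  const type = TEST_TYPES.find((candidate) => candidate === match?.[1]);
  const marker = parts.indexOf("__testkit__");
  if (!type || marker === -1) return null;
  // Prefer a directory nested under __testkit__, else the directory that contains it.
  const dirsAfter = parts.slice(marker + 1, -1);
  const suite = dirsAfter.length > 0 ? dirsAfter[0] : parts[marker - 1];
  return suite ? { suite, type } : null;
}

console.log(inferSuite("auth/__testkit__/me.int.testkit.ts"));
console.log(inferSuite("src/api/routes/__testkit__/auth/me.int.testkit.ts"));
console.log(inferSuite("frontend/__testkit__/navigation/navigation.ui.testkit.ts"));
```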

Discovery is also a first-class CLI/API surface:

testkit discover
- human-first compact output with service -> type -> suite -> file hierarchy
testkit discover --output-mode verbose
- explicit paths, IDs, locks, dependencies, skip reasons, and history detail
testkit discover --json
- machine-readable output with stable enums, canonical paths, and summary data
testkit discover --output .testkit/discovery.json
- writes the same machine-readable JSON document to a file artifact

Compact mode prefers derived human labels such as Agent Configs Auth Gate instead of printing long file paths as the primary row label. Exact paths remain available in verbose and JSON output.

The public API is exported from @elench/testkit/discovery:

import { discoverTests } from "@elench/testkit/discovery";

const result = await discoverTests({
  dir: process.cwd(),
  runnableOnly: true,
  diagnostics: "report",
});

JSON and file output are machine-first. Each discovered file carries a stable identifier plus canonical metadata such as:

id
path
service
suiteName
type
internalType
skipped
skipReason
locks
dependsOn
displayName
history

Discovery history is generic and local to testkit. firstSeenAt is derived from the first time a file appears in the history index, not from filesystem or Git metadata.
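
A per-test history entry of this kind can be maintained with an incremental update. The field names and the `recordRun` function below are hypothetical and are not the actual schema of .testkit/history/tests.json; the sketch only shows how first-seen time and a running average can be kept without storing every run.

```typescript
type Outcome = "pass" | "fail" | "skip";

// Hypothetical entry shape, loosely following the summaries the index is said to track.
interface HistoryEntry {
  firstSeenAt: string;
  lastSeenAt: string;
  runCount: number;
  passCount: number;
  failCount: number;
  skipCount: number;
  averageDurationMs: number;
  lastStatus: Outcome;
}

// Returns a new entry; firstSeenAt is fixed the first time the file is recorded.
function recordRun(
  prev: HistoryEntry | undefined,
  outcome: Outcome,
  durationMs: number,
  now: string,
): HistoryEntry {
  const runCount = (prev?.runCount ?? 0) + 1;
  const prevAverage = prev?.averageDurationMs ?? 0;
  return {
    firstSeenAt: prev?.firstSeenAt ?? now,
    lastSeenAt: now,
    runCount,
    passCount: (prev?.passCount ?? 0) + (outcome === "pass" ? 1 : 0),
    failCount: (prev?.failCount ?? 0) + (outcome === "fail" ? 1 : 0),
    skipCount: (prev?.skipCount ?? 0) + (outcome === "skip" ? 1 : 0),
    // Incremental mean: no need to keep individual durations.
    averageDurationMs: prevAverage + (durationMs - prevAverage) / runCount,
    lastStatus: outcome,
  };
}

const afterFirstRun = recordRun(undefined, "pass", 100, "2024-01-01T00:00:00Z");
const afterSecondRun = recordRun(afterFirstRun, "fail", 300, "2024-01-02T00:00:00Z");
console.log(afterSecondRun);
```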

Local Databases

@elench/testkit provisions Docker-managed local Postgres automatically for services that define database: database.postgres(...).

template databases are cached
runtime databases are cloned from templates when binding is per-runtime
shared databases are reused when binding is shared
source schema caches are refreshed only from the configured source database
clean commits, dirty worktrees, and non-git directories get separate source schema cache entries automatically
template fingerprints are derived automatically from env files, source schema cache, migrate/seed config, and repo contents

db schema refresh forces a source database dump into the .testkit source schema cache. db schema verify prepares local templates and verifies local replay against the cached/refreshed source schema. --skip-schema-source-verify is available as a narrow escape hatch when users need to run tests while schema verification is temporarily blocked.

Development Tests

npm test
npm run test:unit
npm run test:integration
npm run test:system
npm run test:live:github
npm run test:live:neon
npm run test:database-version:compat
