@elench/testkit
v0.1.143
Assistant-first CLI for running, inspecting, and debugging local testkit suites
@elench/testkit
@elench/testkit discovers *.testkit.ts files, infers suite ownership from the
filesystem, starts local services, provisions Docker-managed local Postgres
databases, and runs test suites.
The package is now driven by testkit.config.ts, not testkit.config.json.
Usage
cd my-product
# Launch the interactive assistant
npx @elench/testkit
npx @elench/testkit assistant --provider codex --model gpt-5.4
npx @elench/testkit assistant --provider claude --model sonnet --effort high
# Ask for one assistant turn non-interactively
npx @elench/testkit assistant --message "Show the latest Testkit status"
npx @elench/testkit assistant --message "Run the e2e tests for the api service"
npx @elench/testkit assistant --message "Why did the latest failure happen?"
# Run every testkit-managed suite in batch mode
npx @elench/testkit run
# Inspect discovered tests without running them
npx @elench/testkit discover
npx @elench/testkit discover --output-mode verbose
npx @elench/testkit discover --json > .testkit/discovery.json
npx @elench/testkit discover --output .testkit/discovery.json
# Filter by type
npx @elench/testkit --type int
npx @elench/testkit --type dal
npx @elench/testkit --type e2e
npx @elench/testkit --type int,e2e,dal
npx @elench/testkit --type ui
# Parallel file execution
npx @elench/testkit --workers 8
# One file-level wall clock budget for every suite file
npx @elench/testkit --file-timeout-seconds 60
# Specific service / suite
npx @elench/testkit --service frontend --type ui -s navigation
npx @elench/testkit --service api --type int -s health
npx @elench/testkit --type int,e2e,dal -s dal:queries
# Exact file
npx @elench/testkit --type int --file __testkit__/health/health.int.testkit.ts
# Temporarily ignore repo-declared skip rules
npx @elench/testkit --ignore-skip-rules --file __testkit__/billing/billing.int.testkit.ts
# Deterministic git-trackable status snapshot
npx @elench/testkit --type int --write-status
# Lifecycle
npx @elench/testkit status
npx @elench/testkit destroy
npx @elench/testkit cleanup
# Local production environment
npx @elench/testkit local up
npx @elench/testkit local status
npx @elench/testkit local env --service frontend
npx @elench/testkit local logs --service frontend
npx @elench/testkit local down
# Inspect the latest run artifact through the assistant
npx @elench/testkit assistant --message '/inspect "__testkit__/health/health.int.testkit.ts"'
npx @elench/testkit assistant --message '/artifacts "__testkit__/health/health.int.testkit.ts"'
npx @elench/testkit assistant --message "/logs api"
# Automatic regression intelligence
# Configure testkit.regressions.json and testkit classifies new vs known regressions automatically during runs
# Diagnostics: refresh or verify the source schema cache
npx @elench/testkit db schema refresh --service api
npx @elench/testkit db schema verify --service api
testkit is assistant-first in an interactive TTY. The interactive assistant
opens with a repo-aware landing panel: provider/model, current directory,
latest run result, focused file/service, regression counts, and suggested next
prompts. The bottom composer is the primary interaction surface, and the status
line shows approximate context remaining when the active provider/model window
is known, for example [~96% remaining].
Natural-language turns still go through Codex or Claude, but testkit owns the
transcript, command-observation state, context files under .testkit/assistant/,
and rendering around testkit, npm, and npx commands. Providers own command
choice, edits, retries, and recovery. When the provider runs testkit-managed
commands, the assistant observes real Testkit command executions through command
sidecars and run artifacts, then refreshes the latest run state so follow-up
questions can use the new result immediately.
Assistant transcript content has an explicit rendering contract. Provider and user prose is stored as raw Markdown and parsed only by the TUI renderer. System notices stay plain text. Observed commands and run sessions stay structured, so Testkit renders real command state and run artifacts without parsing provider prose for control flow.
Assistant provider coverage is tested against the real codex and claude
CLIs. The test suite assumes both are installed and authenticated; provider
adapter, assistant shell, command-observation, and real testkit-run coverage do
not use provider stand-in binaries or simulated provider sessions.
Assistant runtime settings are repo-local. Use /provider, /model,
/effort, and /settings inside the assistant to inspect or change the active
provider runtime; changes are persisted to .testkit/assistant/settings.json.
CLI flags such as --provider, --model, --effort, and repeatable
--provider-arg override those settings for the current launch. The composer
has an always-visible cursor and supports arrow keys, Home/End, Ctrl+A/Ctrl+E,
Backspace, Delete, Ctrl+D, and Ctrl+L to clear the visible transcript. Ctrl+C
quits the assistant.
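The precedence described above can be sketched as a simple merge in which launch flags win over persisted values. The settings shape and the function name are assumptions for illustration; this README does not document the actual layout of .testkit/assistant/settings.json.

```typescript
// Hypothetical settings shape; the real settings.json schema may differ.
interface AssistantRuntimeSettings {
  provider?: string;
  model?: string;
  effort?: string;
}

const SETTING_KEYS = ["provider", "model", "effort"] as const;

// Launch flags override persisted values for the current launch only;
// the persisted object is copied, never mutated.
function resolveLaunchSettings(
  persisted: AssistantRuntimeSettings,
  launchFlags: AssistantRuntimeSettings,
): AssistantRuntimeSettings {
  const effective: AssistantRuntimeSettings = { ...persisted };
  for (const key of SETTING_KEYS) {
    const flagValue = launchFlags[key];
    if (flagValue !== undefined) {
      effective[key] = flagValue;
    }
  }
  return effective;
}
```

Launching with --provider claude --model sonnet on top of a persisted codex configuration would, under this sketch, use claude/sonnet for that launch and leave the stored settings untouched.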
The non-interactive assistant --message ... mode uses the same provider
command-observation path for one hosted turn at a time. It is useful in scripts
and tests, but it is not the primary interactive UX.
Batch run output stays intentionally short: one line per completed file, a
concise failure block, and a final summary. Service logs, captured runtime
output, emitted artifacts, and assistant-visible run state are persisted under
.testkit/results/.
testkit discover also maintains a small durable per-test history index at
.testkit/history/tests.json. The index tracks first/last seen timestamps,
run counts, pass/fail/skip counts, average duration, and last observed status,
and those summaries are exposed in compact, verbose, and JSON discovery output.
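As a rough illustration of what such a history record might hold, the sketch below folds one observation into a per-test summary. The field names, the record shape, and the choice to include every observation in the running average are assumptions; the real tests.json schema is not shown in this README.

```typescript
// Hypothetical record shape for one test in the history index.
type ObservedStatus = "passed" | "failed" | "skipped";

interface TestHistoryRecord {
  firstSeenAt: string;
  lastSeenAt: string;
  runCount: number;
  passCount: number;
  failCount: number;
  skipCount: number;
  averageDurationMs: number;
  lastStatus: ObservedStatus;
}

// Fold one observation into the record, creating it on first sight.
function recordObservation(
  previous: TestHistoryRecord | undefined,
  observedStatus: ObservedStatus,
  durationMs: number,
  observedAt: string,
): TestHistoryRecord {
  const base: TestHistoryRecord = previous ?? {
    firstSeenAt: observedAt,
    lastSeenAt: observedAt,
    runCount: 0,
    passCount: 0,
    failCount: 0,
    skipCount: 0,
    averageDurationMs: 0,
    lastStatus: observedStatus,
  };
  const runCount = base.runCount + 1;
  return {
    ...base,
    lastSeenAt: observedAt,
    runCount,
    passCount: base.passCount + (observedStatus === "passed" ? 1 : 0),
    failCount: base.failCount + (observedStatus === "failed" ? 1 : 0),
    skipCount: base.skipCount + (observedStatus === "skipped" ? 1 : 0),
    // Incremental mean: avoids storing every past duration.
    averageDurationMs: base.averageDurationMs + (durationMs - base.averageDurationMs) / runCount,
    lastStatus: observedStatus,
  };
}
```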
Test execution also maintains a scheduler cache at .testkit/timings.json.
Completed file-level task durations are used to rank future runs with a
longest-estimated-duration-first policy, so slow files start earlier when
workers are available. Run artifacts include compact scheduler metadata under
planning so ordering decisions are inspectable.
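A longest-estimated-duration-first ordering can be sketched in a few lines. The function name, the flat timings map, and the fallback estimate for files without recorded timings are assumptions made for this example, not the documented format of .testkit/timings.json.

```typescript
// Order files so the slowest known files start first.
// `timingsMs` maps a file path to its last recorded duration (assumed shape).
// Files without a recorded duration get `fallbackMs` as their estimate.
function orderLongestFirst(
  files: string[],
  timingsMs: Record<string, number>,
  fallbackMs: number,
): string[] {
  const estimate = (file: string): number =>
    Object.prototype.hasOwnProperty.call(timingsMs, file) ? timingsMs[file] : fallbackMs;
  // Copy before sorting so the caller's array is left untouched.
  // Ties are broken by path to keep the order deterministic.
  return [...files].sort((a, b) => estimate(b) - estimate(a) || a.localeCompare(b));
}
```

Starting the longest files first tends to shorten the overall wall clock time when several workers pull from the same queue, because a slow file is less likely to be the last one still running.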
Automatic Regression Diagnosis
If regressions.file is configured, every run automatically classifies observed
results without any separate follow-up maintenance command.
testkit distinguishes four user-facing outcomes:
- new regressions
- known regressions
- fixed known regressions
- catalog stale
The default CLI keeps those signals lightweight:
- failed files print inline diagnosis immediately under the file line
- the final summary box reports aggregate regression counts only
- machine-readable artifacts gain per-file diagnosis plus top-level regressions.summary, regressions.catalog, and prepared regressions.drafts
catalog stale is repo hygiene, not a test failure. It means the regression
catalog or linked issue tracker metadata needs attention, for example because a
linked issue is closed but the regression still reproduces.
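One way to picture the three per-file outcomes is as a function of the file result and whether the file matches a catalog entry. This is a simplified sketch: the names are hypothetical, and the real matching against catalog fingerprints is reduced here to a boolean.

```typescript
type FileResult = "passed" | "failed";
type Diagnosis = "new regression" | "known regression" | "fixed known regression" | "none";

// `matchesCatalogEntry` stands in for fingerprint matching against the
// regression catalog, which this sketch does not model.
function diagnoseFile(result: FileResult, matchesCatalogEntry: boolean): Diagnosis {
  if (result === "failed") {
    return matchesCatalogEntry ? "known regression" : "new regression";
  }
  // A passing file that matches a catalog entry suggests the regression is fixed.
  return matchesCatalogEntry ? "fixed known regression" : "none";
}
```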
Tooling Adapters
testkit also ships tool-specific config helpers so consumer repos do not need
repo-local runtime policy files just to cooperate with managed runs.
// drizzle.config.ts
import { defineConfig } from "@elench/testkit/drizzle";
export default defineConfig({
schema: "./src/db/schema/index.ts",
out: "./src/db/migrations",
dialect: "postgresql",
dbCredentials: {
url: process.env.DATABASE_URL!,
},
});
// vitest.config.ts
import { defineConfig } from "@elench/testkit/vitest";
export default defineConfig({
test: {
include: ["src/**/*.test.ts"],
},
});
// ui.config.ts
import { defineConfig, devices } from "@elench/testkit/ui";
export default defineConfig({
testDir: "./__testkit__",
projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
}, {
dotenvFiles: [".env.local"],
});
For scripts and app-runtime code, @elench/testkit/env provides shared helpers
for managed runtime detection, dotenv loading, and local-database safety:
import {
assertLocalDatabaseUrl,
loadDotenvFiles,
shouldLoadDotenv,
} from "@elench/testkit/env";
if (shouldLoadDotenv()) {
loadDotenvFiles({ files: [".env", ".env.local"] });
}
assertLocalDatabaseUrl(process.env, "seed.ts");
Setup
Create testkit.config.ts at repo root:
import {
app,
database,
defineConfig,
defineFile,
environment,
toolchain,
} from "@elench/testkit/config";
export default defineConfig({
execution: {
workers: 8,
fileTimeoutSeconds: 60,
},
regressions: {
file: "testkit.regressions.json",
sync: {
provider: "github",
mode: "warn",
cacheTtlSeconds: 900,
},
},
fingerprints: {
exclude: ["next-env.d.ts"],
},
toolchains: {
frontendNode: toolchain.node({
cwd: "frontend",
detect: "auto",
install: "download",
}),
},
environments: {
local: environment.local({
target: "frontend",
data: "reuse",
}),
},
services: {
api: app.node({
cwd: ".",
entry: "src/index.ts",
port: 3004,
envFiles: [".env.testkit"],
database: database.postgres({
sourceSchema: database.schema.fromEnv("PRODUCTION_DATABASE_URL"),
template: {
inputs: ["db/schema.sql", "scripts/seed.ts"],
migrate: [{ kind: "sql-file", path: "db/schema.sql" }],
seed: [{ kind: "command", run: "npm run db:seed" }],
verify: [{ kind: "module", target: "src/testkit/verify-seed.ts#verifySeed" }],
},
}),
runtime: {
instances: 1,
maxConcurrentTasks: 4,
},
}),
frontend: app.next({
cwd: "frontend",
mode: "start",
port: 3000,
dependsOn: ["api"],
envFiles: ["frontend/.env.testkit"],
env: {
values: {
NEXT_PUBLIC_API_URL: "{baseUrl:api}",
},
},
runtime: {
instances: 1,
maxConcurrentTasks: 2,
toolchain: "frontendNode",
},
}),
},
});
File-local execution metadata now lives next to the test when possible:
import { defineFile } from "@elench/testkit/config";
export const testkit = defineFile({
skip: "Billing is currently unavailable locally",
locks: ["global-worker-loop"],
});
testkit.config.ts is optional for simple repos, but it is the primary escape hatch
for:
- worker count
- per-file wall clock timeout budget
- multi-service graphs
- local runtime instance counts
- per-runtime concurrent task caps
- repo-managed Node toolchains for prepare/start commands
- one-time runtime preparation steps for stable shared servers
- local DB binding configuration
- source-backed schema verification
- template database migrate / seed / verify stages
- explicit per-file or per-suite locks
- named HTTP suite profiles
- automatic regression classification for new vs known failures
- optional GitHub-backed regression issue sync
- repo-level fingerprint include/exclude policy
- repo-declared suite/file skip policies with explicit reasons
- telemetry upload configuration
runtime.prepare is the generic build-once hook for shared runtimes. It runs
once per runtime generation before local services start, fingerprints declared
inputs, and writes cache state under the service runtime directory. This is the
right way to move expensive browser targets from next dev / watch mode to
stable build-and-start flows.
testkit local starts the same service graph as a persistent local production
environment instead of a test run. It provisions local databases, runs template
setup, runs runtime.prepare, starts dependent services, and records state
under .testkit/environments/<name>/ rather than .testkit/_runs. Local
environment processes receive TESTKIT_ACTIVE=1, TESTKIT_MODE=local, and
TESTKIT_LOCAL_ENV=<name>. Use data: "reuse" for fast restarts against the
existing local runtime database, data: "reset" to refresh runtime databases
from their templates on each launch, or --rebuild to destroy and recreate the
environment state. Testkit only supports local environments here; it does not
copy production data and it refuses managed runtime database URLs that are not
loopback PostgreSQL URLs.
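The loopback restriction can be illustrated with a small URL check. This is a sketch of the idea only, not the implementation behind assertLocalDatabaseUrl; the function name and the exact set of accepted hosts are assumptions.

```typescript
// Returns true only for postgres:// or postgresql:// URLs whose host is a
// loopback address (localhost, 127.0.0.0/8, or ::1).
function isLoopbackPostgresUrl(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch (_err) {
    return false; // not a parseable URL at all
  }
  if (parsed.protocol !== "postgres:" && parsed.protocol !== "postgresql:") {
    return false;
  }
  const host = parsed.hostname.toLowerCase();
  return (
    host === "localhost" ||
    host === "[::1]" || // WHATWG URL keeps the brackets on IPv6 hostnames
    host === "::1" ||
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host)
  );
}
```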
database.template is the database-side equivalent for reusable template DB
state. When database.sourceSchema is configured, Testkit treats the configured
source database as the schema source of truth. A normal testkit run resolves a
commit-aware source schema cache under
.testkit/db/<service>/source-schemas/, applies that cached schema to the local
template DB, runs local template setup, and verifies that the replayed local
schema still matches the source dump. If local replay differs, Testkit refreshes
from the source once for the current cache key and retries. If it still differs,
the run fails with schema diagnostics under .testkit/results/schema.
Source schema cache keys are derived automatically from repo state:
- clean git worktrees use commits/<sha>
- dirty git worktrees use dirty/<sha>-<fingerprint>
- non-git directories use nogit/<fingerprint>
Dirty worktree fingerprints use Git's own ignore engine for untracked files, so
normal .gitignore rules are respected. Testkit-owned outputs are always
excluded from fingerprints: .testkit/, .next-testkit/, and
testkit.status.json. testkit.config.ts is normal repo configuration and is
not excluded automatically. For user-managed generated files, add repo-level
fingerprints.exclude patterns; fingerprints.include can opt specific paths
back in below an excluded directory.
Branch names and worktree paths are recorded as metadata but do not affect clean commit cache keys, so branch renames and clean worktrees at the same commit reuse the same source schema. Dirty worktrees are isolated by content fingerprint so local experiments cannot overwrite a clean commit baseline.
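The three cache key forms can be modelled as a function of repo state. The RepoState type and function name are hypothetical; only the key formats come from the text above. Note that neither branch name nor worktree path appears in the input, which is why clean worktrees at the same commit share a key.

```typescript
// Hypothetical description of the repo state that feeds the cache key.
type RepoState =
  | { kind: "clean"; sha: string }
  | { kind: "dirty"; sha: string; fingerprint: string }
  | { kind: "nogit"; fingerprint: string };

function sourceSchemaCacheKey(state: RepoState): string {
  switch (state.kind) {
    case "clean":
      return `commits/${state.sha}`;
    case "dirty":
      // The content fingerprint isolates local experiments from the clean baseline.
      return `dirty/${state.sha}-${state.fingerprint}`;
    case "nogit":
      return `nogit/${state.fingerprint}`;
  }
}
```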
Template setup executes in three explicit phases:
- migrate
- seed
- verify
Schema drift is checked after each successful migrate and seed step, and
verify only runs once local replay matches the source schema. Source refreshes
use pg_dump --schema-only --no-owner --no-privileges, so seed/reference data
is never written into the baseline. Keep schema-changing setup in its own step
where possible; a single command that changes schema and then fails before
exiting cannot be refreshed at the midpoint.
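The phase ordering and the refresh-once-then-fail behaviour can be sketched as a small control loop. Everything here is a simplified assumption: the hook names are hypothetical, each phase is collapsed into one callback, and the retry is modelled as replaying the whole template, which may not match what Testkit actually retries.

```typescript
// Hypothetical hooks standing in for the real template machinery.
interface TemplateHooks {
  migrate: () => void;
  seed: () => void;
  verify: () => void;
  schemaMatchesSource: () => boolean; // compares local replay with the cached source dump
  refreshFromSource: () => void;
}

function runTemplateSetup(hooks: TemplateHooks): void {
  for (let attempt = 1; attempt <= 2; attempt++) {
    let drifted = false;
    // Drift is checked after each successful migrate and seed step.
    for (const step of [hooks.migrate, hooks.seed]) {
      step();
      if (!hooks.schemaMatchesSource()) {
        drifted = true;
        break;
      }
    }
    if (!drifted) {
      hooks.verify(); // verify only runs once the replay matches the source
      return;
    }
    if (attempt === 1) {
      hooks.refreshFromSource(); // refresh once, then retry
    }
  }
  throw new Error("local schema replay still differs from the source schema after one refresh");
}
```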
Source schema refreshes are intentionally single-connection and pooler-safe.
If a Neon pooled source URL is configured, Testkit rewrites it to the matching
direct Neon endpoint before running pg_dump and records the original/resolved
host classifications beside the resolved cache entry. Unknown PgBouncer/pooler
URLs fail closed; configure a direct source URL for those providers. Concurrent
refreshes for the same service and cache key are serialized with a cache-local
lock so multiple Testkit processes do not stampede the source database. Testkit
also maintains .testkit/db/<service>/source-schemas/index.json and prunes old
inactive cache entries automatically.
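The pooled-to-direct rewrite can be illustrated as follows. This sketch relies on Neon's convention of marking pooled hosts with a -pooler suffix on the endpoint ID. The heuristic used to flag unknown poolers (the word "pooler" in the host, or port 6432) is purely an assumption for this example, as are the function name and the sample hostnames; Testkit's actual detection rules are not described here.

```typescript
// Rewrites a Neon pooled URL to its direct endpoint; fails closed on hosts
// that look like an unrecognised pooler (assumed heuristic, see above).
function resolveDirectSourceUrl(rawUrl: string): string {
  const parsed = new URL(rawUrl);
  const hostname = parsed.hostname;
  const dot = hostname.indexOf(".");
  const firstLabel = dot === -1 ? hostname : hostname.slice(0, dot);
  const rest = dot === -1 ? "" : hostname.slice(dot);
  const isNeon = hostname.toLowerCase().endsWith(".neon.tech");
  const suffix = "-pooler";

  if (isNeon && firstLabel.endsWith(suffix)) {
    const directHost = firstLabel.slice(0, -suffix.length) + rest;
    // Rebuild the URL by hand so credentials, path, and query are preserved.
    const hasAuth = parsed.username !== "" || parsed.password !== "";
    const auth = hasAuth
      ? parsed.username + (parsed.password ? ":" + parsed.password : "") + "@"
      : "";
    const port = parsed.port ? ":" + parsed.port : "";
    return `${parsed.protocol}//${auth}${directHost}${port}${parsed.pathname}${parsed.search}${parsed.hash}`;
  }
  if (!isNeon && (hostname.toLowerCase().includes("pooler") || parsed.port === "6432")) {
    throw new Error("unrecognised pooler host; configure a direct source URL");
  }
  return rawUrl; // already direct
}
```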
For most repos, prefer declarative step objects directly inside
database.postgres({ template: ... }) and runtime.prepare.steps.
The supported shapes are:
- { kind: "command", run: "..." }
- { kind: "sql-file", path: "..." }
- { kind: "module", target: "file.ts#exportName" }
runtime.toolchain is the first-class way to make those prepare/start commands
run under the correct Node toolchain instead of whatever node/npm happened
to launch testkit. Node toolchains support:
- host verification mode: install: "require-host"
- cached repo-local provisioning mode: install: "download"
- auto-detection from:
  - package.json#volta.node
  - .nvmrc
  - .node-version
  - .tool-versions (nodejs)
  - package.json#engines.node
  - package.json#volta.npm
  - package.json#packageManager
  - package.json#engines.npm
Example:
toolchains: {
frontendNode: toolchain.node({
cwd: "frontend",
detect: "auto",
install: "download",
}),
},
services: {
frontend: app.next({
cwd: "frontend",
port: 3000,
runtime: {
toolchain: "frontendNode",
},
}),
}
If regressions.file is configured, testkit enriches
.testkit/results/latest.json and testkit.status.json with:
- per-file failureDetails
- per-file diagnosis metadata (new regression, known regression, fixed known regression)
- top-level regressions summary and prepared draft updates
Regression-catalog entry authoring uses this contract:
- summary: concise local statement of the regression slice
- cause: underlying technical cause of the failure
- fingerprints: selectors that let testkit automatically recognize the regression in future runs
If regressions.sync is also configured, testkit syncs linked GitHub issues and
adds top-level regression catalog health to the run/status artifacts. The most
important catalog-staleness signal is:
- a known regression still fails, but the linked GitHub issue is closed
In mode: "error", catalog health can also fail the run for problems such as:
- closed issues that still reproduce
- missing issue refs
- validation unavailability
Reproduction warnings are execution-aware:
- failed means the known regression reproduced
- passed means the matched test executed and did not reproduce
- skipped and not_run do not count as reproduction evidence
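The execution-aware rules above combine with issue state to produce the main catalog-staleness signal. A minimal sketch, with hypothetical type and function names:

```typescript
type ExecutionStatus = "failed" | "passed" | "skipped" | "not_run";
type ReproductionEvidence = "reproduced" | "did_not_reproduce" | "no_evidence";

function reproductionEvidence(executionStatus: ExecutionStatus): ReproductionEvidence {
  switch (executionStatus) {
    case "failed":
      return "reproduced";
    case "passed":
      return "did_not_reproduce";
    case "skipped":
    case "not_run":
      return "no_evidence"; // the test never executed, so nothing was learned
  }
}

// A closed issue whose regression still reproduces marks the catalog as stale.
function isClosedIssueStale(issueState: "open" | "closed", executionStatus: ExecutionStatus): boolean {
  return issueState === "closed" && reproductionEvidence(executionStatus) === "reproduced";
}
```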
Authoring
HTTP suites:
import { defineHttpSuite } from "@elench/testkit";
import { expect } from "@elench/testkit/runtime";
const suite = defineHttpSuite(({ rawReq }) => {
const response = rawReq.get("/health");
expect.status(response, 200, "health returns 200");
});
export default suite;
testkit suite files should default-export the suite object returned by
defineHttpSuite(...) or defineDalSuite(...).
Named HTTP profiles live in testkit.config.ts and can be referenced by name:
import { defineHttpSuite } from "@elench/testkit";
import { auth, defineConfig } from "@elench/testkit/config";
const appAuth = auth.fixture({
contract: auth.contracts.jsonSession({
authCookie: "session",
organizationIdPath: "data.organizations[0].id",
}),
topology: auth.topologies.crossOrg({
namespace: "example-app",
actors: {
primary: { org: "primary" },
reviewer: { org: "primary" },
outsider: { org: "secondary" },
},
}),
});
export default defineConfig({
profiles: {
http: appAuth.profiles({
defaultAuth: auth.profile.actor("primary"),
reviewers: auth.profile.actors({
actors: ["reviewer", "outsider"],
primaryActor: "reviewer",
}),
raw: auth.profile.raw(),
}),
},
});
const suite = defineHttpSuite({ profile: "defaultAuth" }, ({ actor, actors, req }) => {
req.get("/api/auth/session");
actor?.req.get("/api/auth/session");
req.as("outsider").get("/api/auth/session");
actors.get("reviewer").rawReq.get("/api/auth/session");
});
DAL suites:
import { defineDalFixtures, defineDalSuite } from "@elench/testkit";
const fixtures = defineDalFixtures(({ db, fixtureScope }) => ({
widget() {
return fixtureScope.seed("widget", "primary", { name: "Primary Widget" }, () => {
const widgetId = fixtureScope.id("widget");
db.exec(`
INSERT INTO widgets (id, name)
VALUES ('${widgetId}', 'Primary Widget')
`);
return { widgetId };
});
},
}));
const suite = defineDalSuite({ fixtures }, ({ db, fixtureScope, fixtures }) => {
const widget = fixtures.widget();
db.query(`select id from widgets where id = '${widget.widgetId}'`);
fixtureScope.records();
});
export default suite;
defineDalFixtures(...) is the package-owned DAL seeding model. It gives every
suite a deterministic fixtureScope with:
- fixtureScope.id(label) / uuid(label)
- fixtureScope.slug(label)
- fixtureScope.email(label)
- fixtureScope.string(label, options)
- fixtureScope.token(label)
- fixtureScope.seed(kind, key, signature, create)
- fixtureScope.records()
testkit enforces strict fixture behavior:
- one logical fixture per kind + key
- identical reseeds reuse the same seeded value
- conflicting reseeds fail immediately
- dependency cycles fail immediately
- seeded fixture records are persisted as a testkit.dal-fixtures artifact
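The reuse and conflict rules can be illustrated with a small registry. This is a standalone sketch rather than the package's fixtureScope: the class name is hypothetical, signatures are compared by their JSON serialization (so property order matters), and dependency-cycle detection is left out.

```typescript
class SeedRegistry {
  private readonly entries = new Map<string, { signature: string; value: unknown }>();

  seed<T>(kind: string, key: string, signature: unknown, create: () => T): T {
    const id = `${kind}:${key}`; // one logical fixture per kind + key
    const serialized = JSON.stringify(signature);
    const existing = this.entries.get(id);
    if (existing) {
      if (existing.signature !== serialized) {
        // Same kind + key with a different signature is a conflicting reseed.
        throw new Error(`conflicting reseed for ${id}`);
      }
      return existing.value as T; // identical reseed: reuse, do not recreate
    }
    const value = create();
    this.entries.set(id, { signature: serialized, value });
    return value;
  }

  records(): string[] {
    return Array.from(this.entries.keys());
  }
}
```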
Scenario suites:
import { defineScenarioSuite } from "@elench/testkit";
const suite = defineScenarioSuite(({ rawReq, scenario }) => {
const plan = scenario.choose("journey", {
endpoint: scenario.pick("endpoint", ["/health", "/message"]),
includeHealthCheck: scenario.maybe("includeHealthCheck", 1),
});
const selected = scenario.resource("selected-endpoint", () => rawReq.get(plan.endpoint));
scenario.step("fetch selected endpoint", () => {
selected.get();
});
});
export default suite;
First-class runtime namespaces:
import { defineHttpSuite } from "@elench/testkit";
import { checks, expect, parse, waitFor } from "@elench/testkit/runtime";
const suite = defineHttpSuite(({ actor, rawReq, req }) => {
const me = req.get("/api/v1/me");
expect.status(me, 200, "authenticated session loads");
expect.json(me, (body) => typeof body.data?.id === "string", "response includes user id");
checks.authGate(rawReq, "sessions", {
get: ["/api/v1/sessions"],
});
actor?.rawHeaders();
const upload = req.multipart.post("/api/v1/uploads", {
fields: { name: "fixture" },
files: [{ field: "file", data: "hello", filename: "fixture.txt", contentType: "text/plain" }],
});
expect.status(upload, 201, "upload is accepted");
parse.safeJson(upload);
});
Low-level runtime primitives still exist when you genuinely need them:
import { check, group, http } from "@elench/testkit/runtime";
waitFor() consumes the file budget configured by execution.fileTimeoutSeconds.
Consumers should not set local timeout values in test files.
import { parse, waitFor } from "@elench/testkit/runtime";
const response = waitFor(
() => req.get("/api/v1/jobs/123"),
(res) => parse.json(res).data?.status === "completed",
{ description: "job 123 to complete" }
);
Discovery
testkit discovers suites from __testkit__/ directories.
Example layouts:
- src/api/routes/__testkit__/auth/me.int.testkit.ts
- src/db/__testkit__/sessions/count-type.dal.testkit.ts
- frontend/__testkit__/navigation/navigation.ui.testkit.ts
- src/internal/handler/__testkit__/repos/crud.int.testkit.ts
testkit uses these suffixes automatically:
- *.int.testkit.ts
- *.e2e.testkit.ts
- *.scenario.testkit.ts
- *.dal.testkit.ts
- *.load.testkit.ts
- *.ui.testkit.ts
See docs/test-types.md for the canonical type model.
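Mapping a file name to its type from these suffixes might look like the following. The function and constant names are hypothetical, and the canonical type model in docs/test-types.md may include details this sketch omits.

```typescript
const KNOWN_TYPES = ["int", "e2e", "scenario", "dal", "load", "ui"] as const;
type TestType = (typeof KNOWN_TYPES)[number];

// Extracts the segment before ".testkit.ts" and accepts it only if it is a
// known type suffix.
function typeFromFileName(fileName: string): TestType | undefined {
  const match = /\.([a-z0-9]+)\.testkit\.ts$/.exec(fileName);
  if (!match) {
    return undefined;
  }
  const candidate = match[1];
  const known = KNOWN_TYPES as readonly string[];
  return known.indexOf(candidate) !== -1 ? (candidate as TestType) : undefined;
}
```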
Ownership is inferred from:
- the deepest matching service root from services.<name>.local.cwd
- optional services.<name>.discovery.roots overrides for shared-root edge cases
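The "deepest matching service root" rule can be sketched as a longest-prefix match over service roots. The function name and the flat service-to-root map are assumptions for illustration, and discovery.roots overrides are not modelled.

```typescript
// Picks the service whose root is the deepest directory containing the file.
// A root of "." matches every file at depth 0.
function owningService(
  filePath: string,
  serviceRoots: Record<string, string>,
): string | undefined {
  let best: { service: string; depth: number } | undefined;
  for (const service of Object.keys(serviceRoots)) {
    const root = serviceRoots[service];
    const normalized = root === "." ? "" : root.replace(/\/+$/, "");
    // Match on whole path segments so "frontend" does not claim "frontendx/".
    const matches = normalized === "" || filePath.startsWith(normalized + "/");
    if (!matches) {
      continue;
    }
    const depth = normalized === "" ? 0 : normalized.split("/").length;
    if (best === undefined || depth > best.depth) {
      best = { service, depth };
    }
  }
  return best === undefined ? undefined : best.service;
}
```

With the roots from the Setup example (api at "." and frontend at "frontend"), a file under frontend/__testkit__/ would resolve to frontend, and everything else would fall back to api.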
Suite names are inferred from the colocated path:
- auth/__testkit__/*.int.testkit.ts => auth
- routes/__testkit__/auth/*.int.testkit.ts => auth
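Both examples above are consistent with a simple rule: use the first directory below __testkit__ when there is one, otherwise the directory that contains __testkit__. The sketch below encodes that reading; the function name is hypothetical and the behaviour for deeper nesting is an assumption.

```typescript
function inferSuiteName(filePath: string): string | undefined {
  const segments = filePath.split("/");
  const marker = segments.lastIndexOf("__testkit__");
  if (marker === -1) {
    return undefined; // not inside a __testkit__ directory
  }
  // Directories between __testkit__ and the file itself.
  const below = segments.slice(marker + 1, -1);
  if (below.length > 0) {
    return below[0];
  }
  // The file sits directly in __testkit__: fall back to the parent directory.
  return marker > 0 ? segments[marker - 1] : undefined;
}
```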
Discovery is also a first-class CLI/API surface:
- testkit discover: human-first compact output with service -> type -> suite -> file hierarchy
- testkit discover --output-mode verbose: explicit paths, IDs, locks, dependencies, skip reasons, and history detail
- testkit discover --json: machine-readable output with stable enums, canonical paths, and summary data
- testkit discover --output .testkit/discovery.json: writes the same machine-readable JSON document to a file artifact
Compact mode prefers derived human labels such as Agent Configs Auth Gate
instead of printing long file paths as the primary row label. Exact paths remain
available in verbose and JSON output.
The public API is exported from @elench/testkit/discovery:
import { discoverTests } from "@elench/testkit/discovery";
const result = await discoverTests({
dir: process.cwd(),
runnableOnly: true,
diagnostics: "report",
});
JSON and file output are machine-first. Each discovered file carries a stable identifier plus canonical metadata such as:
- id
- path
- service
- suiteName
- type
- internalType
- skipped
- skipReason
- locks
- dependsOn
- displayName
- history
Discovery history is generic and local to testkit. firstSeenAt is derived
from the first time a file appears in the history index, not from filesystem or
Git metadata.
Local Databases
@elench/testkit provisions Docker-managed local Postgres automatically for
services that define database: database.postgres(...).
- template databases are cached
- runtime databases are cloned from templates when binding is per-runtime
- shared databases are reused when binding is shared
- source schema caches are refreshed only from the configured source database
- clean commits, dirty worktrees, and non-git directories get separate source schema cache entries automatically
- template fingerprints are derived automatically from env files, source schema cache, migrate/seed config, and repo contents
db schema refresh forces a source database dump into the .testkit source
schema cache. db schema verify prepares local templates and verifies local
replay against the cached/refreshed source schema. --skip-schema-source-verify
is available as a narrow escape hatch when users need to run tests while schema
verification is temporarily blocked.
Development Tests
npm test
npm run test:unit
npm run test:integration
npm run test:system
npm run test:live:github
npm run test:live:neon
npm run test:database-version:compat