kmp-test-runner
v0.14.0
Parallel test runner for Kotlin Multiplatform and Android Gradle projects with an agent-friendly single-line JSON envelope. CLI + Gradle plugin.
Parallel test runner for Kotlin Multiplatform and Android Gradle projects. CLI and Gradle plugin that fan out unit, instrumented, coverage, and benchmark tasks across modules in parallel and emit a single-line JSON envelope optimized for AI coding agents.
Why this exists — token cost per agent test-run iteration
For an AI coding agent re-running a workflow on every change, the cheapest path matters. Token-cost reduction scales with project size — small KMP libraries see ~1–100× reductions on parallel, medium projects ~90×, large projects ~123× (NowInAndroid sample), and coverage on a ~70-module composite crosses 30,000× across every measured Claude tokenizer and overflows Anthropic's count_tokens payload limit on the raw ./gradlew capture (chunked counting recovers the number even there). Three observation strategies, every cell measured (methodology):
- 🔻 A. Baseline. Raw `./gradlew` + reading every generated report file: what an agent does without `kmp-test`. The cost we're competing against.
- 🟢 B. Ours: `kmp-test <feature>`. Markdown-summarised stdout. Drop-in replacement.
- 🟢 C. Ours: `kmp-test <feature> --json`. Single-line JSON envelope. Recommended for agents.
Sampled two ways: (a) a 6-project OSS matrix for the per-bucket comparison, and (b) a single anonymized private composite (private-large-A, ~70 KMP modules + Kover + kotlinx-benchmark) for the per-tokenizer drill-down where Kover and @Benchmark functions are actually configured. Buckets: small (1–5 modules), medium (6–20 modules), large (21+ modules). Every cell is a real messages.countTokens API count (Claude columns) or cl100k_base offline count via js-tiktoken; per-bucket aggregates: median + min/max range + spread.
Provenance. OSS multi-project re-measurement landed 2026-05-12; re-run 2026-05-18 with a recursive module walker that picks up deeply-nested layouts (NowInAndroid grew from 5 → 36 captured modules, Confetti from 13 → 16). `private-large-A` per-feature drill-down re-measured 2026-05-19 (v0.10.1); this closes a cross-project mix-up in the v0.10.0 headline ratio, which combined `private-large-A`'s A baseline with NowInAndroid's C envelope. OSS sample: KaMPKit, kotlinconf-app, kmp-production-sample (small) · PeopleInSpace, Confetti (medium) · NowInAndroid (large). Drill-down reference: `private-large-A` (~70 KMP modules + Kover, anonymized). Per-project OSS captures are gitignored under `tools/runs/multi-project-token-cost-<date>/per-project/<label>/<feature>/`; the committed aggregate lives at `tools/runs/multi-project-token-cost-<date>/aggregate-<date>.md`. `private-large-A` captures are gitignored under `tools/runs/<feature>/`; committed cross-model evidence is at `tools/runs/cross-model-results-<feature>.txt`. Reproducible floor in `tests/fixtures/kmp-cross-platform-e2e/`: a single-module synthetic KMP fixture covering all 8 targets (jvm + js + wasmJs + 3 iOS archs + macosArm64 + androidLibrary).
A→C reduction by project size — parallel median across the OSS sample
cl100k_base only (the bucket scaling is most visible against a single tokenizer). Per-project numbers in tools/runs/multi-project-token-cost-2026-05-18/aggregate-2026-05-18.md. Anthropic-side counts within ±20% of cl100k per the cross-model evidence (tools/runs/cross-model-results-parallel.txt).
| Bucket | Sample (n) | Projects | 🔻 A median | 🟢 C median | A→C median | A→C range |
|--------------------|-----------:|-----------------------------------------------------|------------:|------------:|-----------:|-----------------|
| 🟦 small (1–5) | 3 | KaMPKit (2), kotlinconf-app (5), kmp-production-sample (2) | 24,454 | 338 | 56.6× | 1.3× – 102.2× |
| 🟨 medium (6–20) | 2 | PeopleInSpace (7), Confetti (16) | 427,586 | 4,499 | 90.0× | 84.4× – 95.6× |
| 🟥 large (21+) | 1 | NowInAndroid (36) | 226,291 | 1,839 | 123.1× | (single sample) |
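The A→C median column appears to be a median of per-project ratios rather than the ratio of the two median columns (for the small bucket, 24,454 / 338 ≈ 72, not 56.6). A minimal sketch of that aggregation follows. The function names are illustrative, and the per-project ratios are the ones implied by the table's median and range columns, not values read from the aggregate file.

```javascript
// Bucket boundaries as stated in the text: small 1-5, medium 6-20, large 21+.
function bucketFor(moduleCount) {
  if (moduleCount <= 5) return 'small';
  if (moduleCount <= 20) return 'medium';
  return 'large';
}

// Median of a list of numbers (mean of the two middle values for even n).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Per-project A→C ratios implied by the table (median + min/max per bucket).
const ratios = {
  small: [1.3, 56.6, 102.2],
  medium: [84.4, 95.6],
  large: [123.1],
};

for (const [bucket, values] of Object.entries(ratios)) {
  console.log(bucket, median(values).toFixed(1) + '×');
}
```

With n = 3 the median is simply the middle project, which is why the small bucket's median sits far from either end of its 1.3× to 102.2× range.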
Note on NowInAndroid. The 2026-05-12 measurement undercounted NIA's deeply-nested layout (`feature/<name>/<api|impl>/`, `core/<name>/`): the walker was one-level-deep and captured only the 5 top-level modules. The recursive walker shipped in v0.10 #7 (`tools/measure-token-cost.js#filterModulesByGlob`) now honours nested grouping dirs, surfacing all 36 modules. The 123× ratio above reflects that fix. `private-large-A` (~70 KMP modules, the v0.10.1 per-feature drill-down reference) is anonymised and tracked separately in the per-feature drill-down section below; it is not part of the OSS bucket sample above.

Reproducible floor. `tests/fixtures/kmp-cross-platform-e2e/` is a 1-module synthetic KMP project covering all 8 supported targets (jvm + js(IR) + wasmJs + 3 iOS archs + macosArm64 + androidLibrary). Re-run `node tools/measure-token-cost.js --project-root tests/fixtures/kmp-cross-platform-e2e --feature parallel` to see the floor case yourself; no SDKs are required beyond JDK 21.

Large-project ceiling: the `coverage` outlier. On the `private-large-A` composite (~70 KMP modules + Kover, anonymised), `coverage` Approach A produces 74 MB of kover HTML/XML (28.7 M cl100k / 36.6 M opus / 28.5 M sonnet & haiku tokens) and overflows Anthropic's `count_tokens` endpoint (413 too_large) in a single HTTP request. Chunked counting at file-record boundaries (23 chunks @ ~3.1 MiB UTF-8 each; sum per-chunk `input_tokens`) recovers the Anthropic-side count. Approach C (`kmp-test coverage --json`) renders the same signal in 734 cl100k tokens; the within-project A:C reduction is 39,175× cl100k / 30,075× opus / 30,350× sonnet & haiku. The agent's working memory stays focused on the code instead of log noise.
Two observations carry across every bucket:
- Tokenizer transition. `claude-sonnet-4-6` and `claude-haiku-4-5` share a tokenizer (identical counts to the unit on every cell). `claude-opus-4-8` uses a different tokenizer that produces 30–100% more tokens for the same input, most visibly on heavy XML/HTML payloads (🔻 baseline A).
- C stays small regardless of bucket: under ~500 tokens for `parallel` on small projects, growing to ~9K on medium when test reports are dense, and back to compact on large projects when summarisation kicks in via `kmp-test`'s aggregation logic. The `--json` envelope strips the workload to `{exit_code, tests, modules, errors[]}` no matter how much output the underlying Gradle run produced.
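As an illustration of how an agent might consume that envelope, here is a minimal sketch. Only the four key names come from the text above. The `parseEnvelope` helper, the choice to read the last non-empty stdout line, and the value shapes of `tests` and `modules` in the sample are assumptions, not the documented schema.

```javascript
// Keys named in the text: {exit_code, tests, modules, errors[]}.
const REQUIRED_KEYS = ['exit_code', 'tests', 'modules', 'errors'];

// Hypothetical helper: parse and sanity-check a --json envelope.
function parseEnvelope(stdout) {
  // Assumption: the envelope is the last non-empty line of stdout.
  const lines = stdout
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '');
  if (lines.length === 0) throw new Error('empty stdout');

  const envelope = JSON.parse(lines[lines.length - 1]);
  if (typeof envelope !== 'object' || envelope === null) {
    throw new Error('envelope is not a JSON object');
  }
  for (const key of REQUIRED_KEYS) {
    if (!(key in envelope)) throw new Error(`envelope is missing "${key}"`);
  }
  if (!Array.isArray(envelope.errors)) {
    throw new Error('"errors" should be an array');
  }
  return envelope;
}

// Hypothetical sample; the shapes of "tests" and "modules" are made up.
const sample =
  '{"exit_code":0,"tests":{"passed":12,"failed":0},"modules":["core"],"errors":[]}\n';
const envelope = parseEnvelope(sample);
console.log(envelope.exit_code, envelope.errors.length);
```

Validating the keys up front lets the agent fail fast on truncated output instead of reasoning over a partial result.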
Per-feature drill-down — private-large-A reference composite (cross-tokenizer detail)
The bucketed table above shows per-bucket medians for parallel across the OSS sample. The four drill-down tables below show per-tokenizer detail (cl100k + 3 Claude families) on the private-large-A reference composite (~70 KMP modules + Kover + kotlinx-benchmark, anonymised). Every cell is a fresh measurement against today's kmp-test --json envelope shape — every ratio is honest within-project (A and C both come from private-large-A). The 🔻 A column is the baseline an agent runs without kmp-test; 🟢 columns are our approaches (B = kmp-test markdown, C = kmp-test --json). The gap between the heavy 🔻 A baseline and the tiny 🟢 C — read straight off the A:C column — is the savings story.
parallel — full test suite (private-large-A reference)
| Model | 🔻 A. baseline (raw ./gradlew) | 🟢 B. ours · kmp-test | 🟢 C. ours · --json | A:C |
|------------------|---------------------------------:|------------------------:|----------------------:|-----:|
| 🟦 cl100k_base | 1,456,399 | 19,604 | 4,039 | 361× |
| 🟥 opus-4-8 | 2,384,531 | 35,953 | 7,099 | 336× |
| 🟩 sonnet-4-6 | 1,941,373 | 25,284 | 4,980 | 390× |
| 🟧 haiku-4-5 | 1,941,373 | 25,284 | 4,980 | 390× |
Captures: tools/runs/parallel/ · evidence: tools/runs/cross-model-results-parallel.txt.
coverage — Kover XML + HTML reports (private-large-A reference)
The single largest data point in the whole measurement — and the cell that motivated PR #13's chunked-counting recovery. A koverXmlReport + koverHtmlReport invocation against private-large-A generates 74 MB of kover HTML/XML under build/reports/kover/**. cl100k_base scores it at 28.7 M tokens; Anthropic's count_tokens returns 413 request_too_large on every Claude family in a single HTTP request. PR #13's chunking path splits the capture at \n=== <file> ===\n file-record boundaries (23 chunks @ ~3.1 MiB UTF-8 each) and sums per-chunk input_tokens — see the methodology section for the activation rules. The same signal renders in 734 cl100k tokens through kmp-test coverage --json.
| Model | 🔻 A. baseline (raw ./gradlew) | 🟢 B. ours · kmp-test | 🟢 C. ours · --json | A:C |
|------------------|---------------------------------:|------------------------:|----------------------:|------------:|
| 🟦 cl100k_base | 28,754,177 | 803 | 734 | 39,175× |
| 🟥 opus-4-8 | 36,571,742 | 1,394 | 1,216 | 30,075× |
| 🟩 sonnet-4-6 | 28,468,274 | 1,055 | 938 | 30,350× |
| 🟧 haiku-4-5 | 28,468,274 | 1,055 | 938 | 30,350× |
What this means in practice: an agent that follows the canonical "run gradle and read the reports" pattern produces a payload it cannot even fit into a single count_tokens API call to measure its own size, let alone fit into a 200 K context window. Coverage on private-large-A is ~144× a single context window per iteration in cl100k tokens (~183× in opus tokens). Captures: tools/runs/coverage/ · evidence: tools/runs/cross-model-results-coverage.txt.
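The A:C ratios and context-window multiples quoted for `coverage` can be re-derived from the table with a few lines of arithmetic. This is a minimal sketch: the token counts are copied from the table above, and the helper names are illustrative.

```javascript
// Token counts copied from the coverage table (A = baseline, C = --json).
const coverage = {
  cl100k_base: { a: 28754177, c: 734 },
  'opus-4-8': { a: 36571742, c: 1216 },
  'sonnet-4-6': { a: 28468274, c: 938 },
};

// The 200 K context window the text uses as its yardstick.
const CONTEXT_WINDOW = 200000;

// A:C reduction, rounded to the nearest integer as in the table.
function reduction(a, c) {
  return Math.round(a / c);
}

// How many full context windows a payload of `tokens` would occupy.
function contextWindows(tokens) {
  return Math.round(tokens / CONTEXT_WINDOW);
}

for (const [model, { a, c }] of Object.entries(coverage)) {
  console.log(
    `${model}: A:C = ${reduction(a, c)}x, A spans ~${contextWindows(a)} context windows`
  );
}
```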
changed — tests for modules touched since HEAD~1 (private-large-A reference)
| Model | 🔻 A. baseline (raw ./gradlew) | 🟢 B. ours · kmp-test | 🟢 C. ours · --json | A:C |
|------------------|---------------------------------:|------------------------:|----------------------:|-----:|
| 🟦 cl100k_base | 41,626 | 125 | 173 | 241× |
| 🟥 opus-4-8 | 69,678 | 236 | 321 | 217× |
| 🟩 sonnet-4-6 | 55,181 | 159 | 222 | 249× |
| 🟧 haiku-4-5 | 55,181 | 159 | 222 | 249× |
Ratios scale with the size of the diff: this v0.10.1 measurement was taken against a HEAD~1 that touched a single test file on a single KMP module, so A captures that one module's test report. A bigger diff (multi-module refactor, dependency bump) would scale A linearly while C stays compact — the v0.9 measurement of the same project against a wider commit reported A=1.1 M cl100k / C=144 → 7,766× cl100k (committed history: see tools/runs/cross-model-results-changed.txt git log). B/C dispatch through the full parallel coverage suite (broader scope than A's per-module :module:test), so wall-clock time isn't apples-to-apples — token count is. Captures: tools/runs/changed/.
benchmark — kotlinx-benchmark suites (private-large-A reference, single module)
v0.10.1 measurement: one benchmark module with three @Benchmark classes actively running (encryption / password-encryption / stream-encryption microbenchmarks). The v0.9 measurement on the same project covered a state with the plugin applied but no @Benchmark functions, so its A was almost entirely the gradle config banner; today's number reflects real benchmark JSON output written to build/reports/benchmarks/.
| Model | 🔻 A. baseline (raw ./gradlew) | 🟢 B. ours · kmp-test | 🟢 C. ours · --json | A:C |
|------------------|---------------------------------:|------------------------:|----------------------:|-----:|
| 🟦 cl100k_base | 52,638 | 171 | 273 | 193× |
| 🟥 opus-4-8 | 72,459 | 314 | 494 | 147× |
| 🟩 sonnet-4-6 | 61,856 | 205 | 322 | 192× |
| 🟧 haiku-4-5 | 61,856 | 205 | 322 | 192× |
Single-module scope. The composite ships seven benchmark modules; running the full suite would scale A roughly linearly with benchmark count × iterations, while C stays compact (the JSON envelope keeps the exit-status / counts shape regardless of how many benchmarks ran). B grows because the markdown report inlines per-run scores by design. Captures: tools/runs/benchmark/ · evidence: tools/runs/cross-model-results-benchmark.txt.
How the numbers are produced
For each project × feature, the script captures one A/B/C triplet — for A, gradle stdout (./gradlew :module:<task> --console=plain) plus every generated report file matched by the feature's predicate (test HTML/XML for parallel/changed, kover HTML/XML for coverage, kotlinx-benchmark JSON for benchmark); for B and C, the corresponding kmp-test <feature> [--json] stdout. The same byte-for-byte text is then re-tokenized two ways: offline via js-tiktoken using cl100k_base (the baseline column), and online via Anthropic's messages.countTokens API per Claude 4.x model. When a payload exceeds Anthropic's count_tokens single-request limit (~4 MB UTF-8; observed 413 too_large on 74 MB kover XML), the chunked path splits at \n=== <file> ===\n file-record boundaries (falling back to ~3.5 MiB byte windows) and sums per-chunk input_tokens — BPE tokenisers are approximately additive across chunks (<0.001% boundary error at measurement scale). Reproduce against your own KMP project sample with:
```bash
# Multi-project bucketed measurement (PR #13). Project list lives in
# tools/.measurement-projects.json (gitignored; paths only stay local) OR in
# the KMP_MEASUREMENT_PROJECTS env var (newline-separated `path|label|bucket`).
node tools/measure-token-cost.js                                  # all 6 features × all projects
node tools/measure-token-cost.js --features parallel,coverage     # subset of features
node tools/measure-token-cost.js --projects-config /custom/path.json --features changed

# Single-project mode (the v0.9 shape) still works:
node tools/measure-token-cost.js --feature parallel \
  --project-root /path/to/your/kmp/project --module-filter "<module-glob>" --runs 1

# Cross-model re-tokenize via Anthropic count_tokens (chunked path activates
# automatically for >~3.5 MiB payloads; set --anthropic-chunk-bytes <n> to
# override). Multi-account fallback supported via ANTHROPIC_API_KEY_FALLBACK.
ANTHROPIC_API_KEY=sk-ant-... node tools/measure-token-cost.js --feature <name> \
  --anthropic-models claude-opus-4-8,claude-sonnet-4-6,claude-haiku-4-5

# Multi-account workflows: set both keys, the tool auto-falls-back on 401:
export ANTHROPIC_API_KEY=sk-ant-account-A...
export ANTHROPIC_API_KEY_FALLBACK=sk-ant-account-B...
node tools/measure-token-cost.js --feature <name> \
  --anthropic-models claude-opus-4-8
```

Practical impact across buckets. A 5-iteration agent loop reading raw gradle output burns roughly 150K tokens on small projects, 2M tokens on medium projects, and 7M+ tokens on the `private-large-A` composite for `parallel` alone, plus ~144M cl100k / ~183M opus tokens for `coverage` on the same composite (the headline outlier: 144× a single 200K context window per iteration in cl100k, 183× in opus). The same loops on `--json` burn 1.5K–9K tokens regardless of bucket. Without PR #13's chunked counting, Anthropic's `count_tokens` endpoint cannot even tokenise a single raw `coverage` capture in one HTTP request on the large composite (28.7M cl100k / 36.6M opus tokens / 74MB of kover HTML/XML). The agent's working memory stays focused on the code instead of log noise.
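The chunked counting path described in this section (split at `\n=== <file> ===\n` record boundaries, count each chunk, sum the results) can be sketched as follows. This is not the code in `tools/measure-token-cost.js`: the function names are hypothetical, the byte-window fallback for a single oversized record is omitted, and `countTokens` is an injected callback standing in for whatever API or offline tokenizer is used.

```javascript
// Split a capture at "\n=== <file> ===\n" record boundaries and pack whole
// records greedily into chunks of at most maxBytes (UTF-8).
// Sketch only: a single record larger than maxBytes becomes its own
// oversized chunk (the byte-window fallback is not implemented here).
function chunkAtRecordBoundaries(text, maxBytes) {
  const encoder = new TextEncoder();
  // Zero-width split: each record keeps its own "=== <file> ===" header.
  const records = text.split(/(?=\n=== [^\n]+ ===\n)/);
  const chunks = [];
  let current = '';
  let currentBytes = 0;
  for (const record of records) {
    const size = encoder.encode(record).length;
    if (current !== '' && currentBytes + size > maxBytes) {
      chunks.push(current);
      current = '';
      currentBytes = 0;
    }
    current += record;
    currentBytes += size;
  }
  if (current !== '') chunks.push(current);
  return chunks;
}

// Sum per-chunk counts. countTokens is a caller-supplied async function
// (hypothetical here) that returns the token count for one chunk.
async function countChunked(text, maxBytes, countTokens) {
  let total = 0;
  for (const chunk of chunkAtRecordBoundaries(text, maxBytes)) {
    total += await countTokens(chunk);
  }
  return total;
}

// Demo with a stand-in counter (string length, NOT a real tokenizer).
const capture =
  '\n=== a.xml ===\nAAAA\n=== b.xml ===\nBBBB\n=== c.xml ===\nCCCC';
countChunked(capture, 40, async (chunk) => chunk.length).then((total) =>
  console.log('chunks:', chunkAtRecordBoundaries(capture, 40).length, 'total:', total)
);
```

Because the split is zero-width, concatenating the chunks reproduces the original capture byte for byte, so no content is dropped or counted twice.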
Per-version detail and migration notes are in CHANGELOG.md.
Quick Start
Linux / macOS
```bash
curl -fsSL https://raw.githubusercontent.com/oscardlfr/kmp-test-runner/main/scripts/install.sh | bash
```
Windows (PowerShell)
```powershell
iwr -useb https://raw.githubusercontent.com/oscardlfr/kmp-test-runner/main/scripts/install.ps1 | iex
```
Or install via npm:
```bash
npm install -g kmp-test-runner
```
Then run:
```bash
kmp-test parallel --project-root /path/to/your/project
```
Why kmp-test-runner
KMP projects mix JVM, Android, and native targets — each with its own Gradle task graph. Running them sequentially on CI blows past time budgets; running them naively in parallel hits file-lock contention on Windows and socket conflicts on emulators. kmp-test-runner wraps the right maxParallelForks and task-isolation defaults so your suite runs safely in parallel without custom scripting, whether you call it from npm, Gradle, or a shell one-liner.
It's also the testing piece that's missing from Google's official android CLI for AI agents. That CLI (v0.7.x) covers project create/describe/deploy/emulator but ships no test subcommand — Google delegated test execution back to Gradle. kmp-test --json fills that gap with a single-line, parseable response that drops the agent-context cost from ~13 K tokens (raw Gradle + reports) to ~100 tokens. See "Agentic usage" below for the measurement.
Multi-agent safe (v0.3.8+). When two kmp-test runs target the same project root — common with parallel agents or CI matrix shards — an advisory lockfile (.kmp-test-runner.lock) coordinates them and per-run-id-suffixed report files prevent clobber. The second arrival exits with a clear lock_held error (--json surfaces errors[].code = "lock_held") instead of corrupting reports. Pass --force to override deliberately. See docs/concurrency.md for the full collision matrix.
Installation
Requirements
- Node.js 18+
- bash (Linux/macOS) or PowerShell 5.1+ (Windows)
- JDK 17+ and Gradle 8+ (Gradle plugin shape only)
Multi-JDK hosts. Since v0.6.1 `kmp-test` auto-detects JDKs from `Adoptium / Zulu / Microsoft / Semeru / BellSoft` on Windows, `/Library/Java/JavaVirtualMachines/` on macOS, and `/usr/lib/jvm` + `/opt/{java,jdk}` on Linux. If your project requires a JDK version different from the host default, the matching install is selected automatically, with no manual `JAVA_HOME` dance between projects. See the JDK toolchain section for the precedence chain and override flags.
Option 1 — Shell installer (recommended)
Linux / macOS
```bash
curl -fsSL https://raw.githubusercontent.com/oscardlfr/kmp-test-runner/main/scripts/install.sh | bash
```
Windows (PowerShell)
```powershell
iwr -useb https://raw.githubusercontent.com/oscardlfr/kmp-test-runner/main/scripts/install.ps1 | iex
```
To uninstall:
```
# Linux/macOS
curl -fsSL https://raw.githubusercontent.com/oscardlfr/kmp-test-runner/main/scripts/uninstall.sh | bash
# Windows (PowerShell)
iwr -useb https://raw.githubusercontent.com/oscardlfr/kmp-test-runner/main/scripts/uninstall.ps1 | iex
```
Option 2 — npm
```bash
npm install -g kmp-test-runner
```
Requires Node.js 18+. The npm package includes the CLI entry point and all platform scripts.
Option 3 — Gradle plugin
Available on GitHub Packages. See the Gradle Plugin section for setup.
Usage
--project-root defaults to the current working directory, so the simplest invocation is:
```bash
cd /path/to/your/gradle/project
kmp-test parallel
```
Pass `--project-root <path>` explicitly when scripting from a different directory.
Platforms supported
| Target | Default --test-type | Underlying gradle task | Where it runs |
|--------|---------------------|------------------------|---------------|
| JVM / Desktop | common / desktop (auto-detect) | :module:desktopTest | host (Linux / macOS / Windows) |
| Android (unit) | androidUnit (auto-detect) | :module:testDebugUnitTest — or test${Flavor}DebugUnitTest with --flavor; the umbrella :module:test for flavored projects without --flavor | host JVM |
| Android (instrumented) | androidInstrumented (or kmp-test android) | :module:connectedDebugAndroidTest | connected device or emulator |
| iOS | ios | :module:iosSimulatorArm64Test (Apple-silicon), iosX64Test (Intel/CI), iosArm64Test (device) — picked per-module from the project model | macOS host with Xcode + simulator (Gradle handles simulator boot since AGP/KMP 1.9+) |
| macOS | macos | :module:macosArm64Test / macosX64Test / macosTest — picked per-module | macOS host (host-native; no simulator) |
| JS / Wasm | model-only (webTestTask field) | :module:jsTest / :module:wasmJsTest | host Node — wrapper-side --test-type js/wasm dispatch deferred (project model surfaces the task; pass it via --gradle-args when needed) |
kmp-test auto-detects the project type (kmp-desktop → common, otherwise androidUnit) when --test-type is omitted. iOS / macOS / androidInstrumented are opt-in — the wrapper does not switch to them implicitly because they require platform-specific runners (simulator / connected device). The auto-detected unit leg skips modules whose only tests are instrumented (see Choosing a test type).
Choosing a test type
Match the command to what you want to run:
| Your tests | Command | Runs on |
|---|---|---|
| Unit tests (JVM / desktop, Android host) | kmp-test parallel (auto) — or --test-type common / desktop / androidUnit | host JVM |
| Android instrumented / Compose UI tests | kmp-test android — or kmp-test parallel --test-type androidInstrumented | connected device / emulator |
| iOS / macOS | kmp-test parallel --test-type ios / --test-type macos | macOS host |
| All of the above in one run | kmp-test parallel --test-type all | host + device (per leg) |
Compose UI tests are instrumented tests: they live in `androidInstrumentedTest` / `androidTest`, not the unit source set. The default `kmp-test parallel` auto-detects the unit leg, which skips instrumented-only modules, so a project whose only tests are Compose UI tests produces no reports under a bare `kmp-test parallel`. Run those with `kmp-test android` (or `--test-type androidInstrumented`). When the unit leg skips a module for this reason, `kmp-test` flags it with a `[SKIP] … instrumented-only …` line on stderr and a `warnings[].code: "instrumented_only_skipped"` entry under `--json`, so the right flag is one hop away.
Subcommands
| Subcommand | Description |
|-----------|-------------|
| parallel | Run all test targets in parallel with coverage |
| changed | Run tests only for modules changed since last commit |
| android | Run Android instrumented tests — Compose UI included; requires a connected device/emulator. For host unit tests use parallel |
| benchmark | Run benchmark suites with Dispatchers.Default for real contention |
| coverage | Generate coverage report only (skips test execution) |
| doctor | Diagnose the local environment (Node, bash/pwsh, gradlew, JDK, ADB) |
| info | Print environment paths and versions — lighter doctor with a flat JSON-friendly envelope (no PASS/WARN/FAIL judgments) |
| describe | Print project metadata as JSON — modules, test tasks, coverage detection, dependency graph |
| update | Update kmp-test to the latest GitHub release (idempotent; passes through to install scripts) |
Each subcommand has its own --help:
```bash
kmp-test parallel --help     # parallel-specific flags + 1 example
kmp-test changed --help
kmp-test android --help
kmp-test benchmark --help
kmp-test coverage --help
kmp-test doctor --help
kmp-test info --help
kmp-test describe --help
kmp-test update --help
```
Examples
```bash
# Run all tests in parallel with coverage (uses cwd as project root)
kmp-test parallel

# Same, against an explicit path
kmp-test parallel --project-root /path/to/project

# Run only changed modules (fast CI re-run)
kmp-test changed

# Run Android instrumented tests
kmp-test android --device emulator-5554

# Run benchmarks
kmp-test benchmark --config smoke

# Generate coverage report only (skip test run)
kmp-test coverage

# Skip api / build-logic modules explicitly (or just let auto-skip handle them; see below)
kmp-test parallel --exclude-modules "*:api,build-logic"

# Agentic mode: emit a single JSON object on stdout (see "Agentic usage" below)
kmp-test parallel --json

# Run iOS tests against KMP modules with iosX64() / iosSimulatorArm64() targets (v0.7.0)
kmp-test parallel --test-type ios --module-filter ":mySharedKmp"

# macOS host-native, no simulator (v0.7.0)
kmp-test parallel --test-type macos
```
Multi-platform test dispatch
When --test-type ios is set (v0.7.0), kmp-test consults the project model per module to pick the right gradle task. The model's iosTestTask field is the candidate-ordered output of:
```text
iosSimulatorArm64Test → iosX64Test   → iosArm64Test → iosTest
(Apple silicon)         (Intel / CI)   (device run)   (umbrella fallback)
```
The first entry that's actually present in the gradle task graph wins. macOS (`--test-type macos`) follows the same shape:
```text
macosArm64Test → macosX64Test → macosTest
```
Per-platform notes:
- iOS dispatches `:module:iosSimulatorArm64Test` (or whatever the model picked). On macos-latest CI runners this typically boots a pre-installed simulator automatically; no `xcrun simctl` orchestration is required at the wrapper level since KMP 1.9+ / AGP 9. On Intel hosts the model returns `iosX64Test` instead. Real-device runs (`iosArm64Test`) need a connected iPhone, which is out of scope for the wrapper because it doesn't manage devices.
- macOS dispatches host-natively (no simulator). On Apple-silicon you get `macosArm64Test`; on Intel, `macosX64Test`. macOS is not auto-detected; `--test-type macos` is opt-in.
- Fallback when the model is absent: the wrapper picks `iosSimulatorArm64Test` / `macosArm64Test` (most-portable defaults). Pre-build the model with any prior `kmp-test parallel` invocation against the project so the content-keyed cache is populated.
- Skip env vars: `SKIP_IOS_MODULES="composeApp,iosApp"` excludes specific modules from iOS dispatch (mirrors the existing `SKIP_DESKTOP_MODULES` / `SKIP_ANDROID_MODULES` shape). Same for `SKIP_MACOS_MODULES`.
The unitTestTask field stays separate — KMP modules with both jvmTest and iosSimulatorArm64Test continue to pick jvmTest for --test-type common / auto-detect, while --test-type ios opts into the explicit iOS path.
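The "first candidate present in the task graph wins" rule can be sketched in a few lines. The candidate orders are the ones listed in this section; `pickTestTask` and the shape of its inputs are hypothetical, not the wrapper's actual API.

```javascript
// Candidate orders as listed in the text above.
const IOS_CANDIDATES = [
  'iosSimulatorArm64Test', // Apple silicon
  'iosX64Test',            // Intel / CI
  'iosArm64Test',          // device run
  'iosTest',               // umbrella fallback
];
const MACOS_CANDIDATES = ['macosArm64Test', 'macosX64Test', 'macosTest'];

// Hypothetical helper: return the first candidate that the module actually
// exposes in its Gradle task graph, or null when none is present.
function pickTestTask(candidates, moduleTasks) {
  const available = new Set(moduleTasks);
  return candidates.find((task) => available.has(task)) ?? null;
}

// A module that only declares iosX64() plus a JVM target.
console.log(pickTestTask(IOS_CANDIDATES, ['jvmTest', 'iosX64Test', 'iosTest']));
console.log(pickTestTask(MACOS_CANDIDATES, ['jvmTest']));
```

Keeping the order in a plain array makes the precedence explicit and easy to compare against the diagram.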
Coverage tools
kmp-test supports both Kover (Kotlin's official, KMP-native) and JaCoCo (the JVM standard). Pick one with --coverage-tool / -CoverageTool:
| Value | Behavior |
|-------|----------|
| auto (default for parallel/coverage paths, via the gradle-tasks probe) | Per-module detection — picks koverXmlReport / jacocoTestReport from the module's actual Gradle task graph. Modules with no coverage task emit [SKIP coverage] and tests still run. |
| kover | Force Kover; assumes org.jetbrains.kotlinx.kover is applied per-module (or via convention plugin). Generates koverXmlReportDesktop / koverXmlReportDebug. |
| jacoco | Force JaCoCo; assumes the jacoco plugin is applied. Generates jacocoTestReport. |
| none | Skip coverage entirely — run tests only. Useful on heterogeneous projects where coverage isn't configured everywhere. |
Heterogeneous projects (some modules with kover, some with jacoco, some with neither) are first-class — the auto mode + per-module probe will pick the right task per module and skip cleanly when none is applied. The aggregated report still works across mixed tools.
kmp-test parallel runs the resolved coverage report task (jacocoTestReport / koverXmlReport*) automatically after the test legs — the XML is generated and aggregated in one command, with no separate ./gradlew jacocoTestReport step. The standalone coverage subcommand only re-aggregates the reports a prior parallel run already wrote (it does not run gradle itself).
JaCoCo XML is enabled automatically. Gradle's built-in `jacocoTestReport` leaves `xml.required = false` by default, so a module using the standard `jacoco` plugin emits an HTML report only, which `kmp-test` can't parse, leaving it in the `no_xml` bucket even though its tests ran. `kmp-test parallel` therefore injects a small Gradle init-script on the coverage-report leg that forces `reports { xml.required = true }` on every JaCoCo report task, so coverage works out of the box with no build change. It's a genuine no-op for Kover (already emits XML) and for projects that already enable it. Opt out with `--no-coverage-xml-autofix`; when opted out, a module that ran but produced HTML only surfaces a `coverage_xml_disabled` warning instead of a bare `no_xml`.

Coverage detection is task-graph-backed. A module is classified `coveragePlugin: 'kover' | 'jacoco'` from the project's actual Gradle task graph (the `gradlew tasks` probe), so detection works regardless of how the plugin is applied: a per-module `plugins {}` block, a `build-logic/` convention plugin, or a root `subprojects {}` / `allprojects {}` block. Only modules that actually expose a `koverXmlReport*` / `jacocoTestReport` task are reported (e.g. nowinandroid reports the 13 modules that apply jacoco, not all 35). When the probe can't run (offline / `--skip-probe`), a static fallback scans each module's own build file and `build-logic/` convention plugins (heuristic on the convention class / filename, `/Jacoco|Kover/i`); a root `subprojects {}` / `allprojects {}` convention is only detected when the probe runs.
Heterogeneous projects (modules without tests)
Many real-world KMP/Android projects have modules that by convention contain no tests — :api interface modules, :build-logic convention plugins, parent aggregator modules, etc. kmp-test handles these automatically:
- Auto-skip (default): any module whose filesystem path has no `src/test`, `src/commonTest`, `src/jvmTest`, `src/desktopTest`, `src/androidUnitTest`, `src/androidInstrumentedTest`, `src/androidTest`, `src/iosTest`, or `src/nativeTest` directory is filtered out before gradle is invoked. Each skip prints `[SKIP] <module> (no test source set — pass --include-untested to override)` to stderr so the "Modules found" tally stays accurate.
- Explicit exclusion: `--exclude-modules "*:api,build-logic"` (sh) / `-ExcludeModules` (ps1) accepts comma-separated globs (same syntax as `--module-filter`). Self-documenting in CI commands.
- Opt-out: `--include-untested` / `-IncludeUntested` re-includes modules with no test source set (useful when a module exists but tests are still being added).
Both flags work on parallel and changed. Without them, untested modules historically caused Task 'jacocoTestReport' not found in project ':api' errors followed by misleading [OK] Full coverage report generated! with 0% coverage — a v0.5.0 fix.
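The auto-skip rule can be illustrated with a small filter. This is a sketch under stated assumptions: the directory list is the one given above, but `partitionModules`, the `{ name, path }` module shape, and the injected `dirExists` predicate are hypothetical (in a real script, `dirExists` could wrap `fs.existsSync`).

```javascript
// Test source-set directories listed in the auto-skip rule above.
const TEST_SOURCE_SETS = [
  'src/test', 'src/commonTest', 'src/jvmTest', 'src/desktopTest',
  'src/androidUnitTest', 'src/androidInstrumentedTest', 'src/androidTest',
  'src/iosTest', 'src/nativeTest',
];

// Hypothetical helper: split modules into those to run and those to skip.
// dirExists(path) is injected so the sketch stays free of filesystem access.
function partitionModules(modules, dirExists, includeUntested = false) {
  const run = [];
  const skipped = [];
  for (const mod of modules) {
    const tested = TEST_SOURCE_SETS.some((dir) => dirExists(`${mod.path}/${dir}`));
    if (tested || includeUntested) run.push(mod.name);
    else skipped.push(mod.name);
  }
  return { run, skipped };
}

// Fake filesystem for the demo: only :core has a test source set.
const existing = new Set(['core/src/commonTest']);
const modules = [
  { name: ':core', path: 'core' },
  { name: ':api', path: 'api' },
];
console.log(partitionModules(modules, (p) => existing.has(p)));
console.log(partitionModules(modules, (p) => existing.has(p), true));
```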
JDK toolchain mismatch (auto-resolved when possible since v0.6.1)
kmp-test reads the project's required JDK from jvmToolchain(N) / JvmTarget.JVM_N / JavaVersion.VERSION_N (taking the MAX of all signals). When that differs from java -version, the resolution follows this precedence chain:
1. `--java-home <path>` (explicit CLI override): wins over everything; skips the catalogue and the gate.
2. `gradle.properties` `org.gradle.java.home=<path>`: gradle's explicit override; bypasses the gate.
3. JDK catalogue auto-select (v0.6.1+): if a system-wide JDK matching the required version is installed in a known location (`Adoptium / Zulu / Microsoft / Semeru / BellSoft` on Windows, `/Library/Java/JavaVirtualMachines/` on macOS, `/usr/lib/jvm` + `/opt/{java,jdk}` on Linux), `kmp-test` injects `JAVA_HOME` and a prepended `PATH` into the gradle subprocess and proceeds. Disable with `--no-jdk-autoselect`.
4. `--ignore-jdk-mismatch` (or `-IgnoreJdkMismatch`): downgrades the block to a `WARN` line; tests then run under the host default.
5. Host default `java`: if none of the above resolves a matching JDK, the gate fires and `kmp-test` exits 3 with a per-OS `JAVA_HOME` hint.
When the catalogue auto-selects, you'll see a [NOTICE] line on stderr:
```text
[NOTICE] auto-selecting JDK 17 from C:\Program Files\Eclipse Adoptium\jdk-17.0.18.8-hotspot (Eclipse Adoptium; host default is JDK 21)
```
When the gate fires (step 5), the human-readable error looks like:
```text
kmp-test: JDK mismatch — project requires JDK 17 but current is JDK 23
Tests will fail with UnsupportedClassVersionError if we proceed.
Fix: set JAVA_HOME to a JDK 17 install, or install one and let
--no-jdk-autoselect off (default) pick it up. Example:
JAVA_HOME=$(/usr/libexec/java_home -v 17) kmp-test parallel
Bypass (not recommended): pass --ignore-jdk-mismatch
```
In `--json` mode, the envelope carries `errors[0].code = "jdk_mismatch"` plus `required_jdk` / `current_jdk` integer fields so agents can branch on the specific failure. `--dry-run` skips this gate entirely (since v0.6.0), so plan inspection works on misconfigured hosts.
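The precedence chain in this section can be read as a short decision function. The sketch below is one interpretation, not the tool's implementation: every name (`requiredJdk`, `resolveJdk`, the option keys, the `source` labels) is hypothetical, and treating a host JDK that already matches as "nothing to resolve" after steps 1 and 2 is an assumption.

```javascript
// The required JDK is the MAX of all signals found in the build files
// (jvmToolchain(N), JvmTarget.JVM_N, JavaVersion.VERSION_N).
function requiredJdk(signals) {
  return Math.max(...signals);
}

// Hypothetical resolver mirroring steps 1-5 of the precedence chain.
function resolveJdk({
  required,
  hostDefault,
  javaHomeFlag = null,     // --java-home <path>
  gradleJavaHome = null,   // org.gradle.java.home from gradle.properties
  catalogue = {},          // { [majorVersion]: installPath }
  noAutoselect = false,    // --no-jdk-autoselect
  ignoreMismatch = false,  // --ignore-jdk-mismatch
}) {
  if (javaHomeFlag) return { source: 'java-home-flag', javaHome: javaHomeFlag };
  if (gradleJavaHome) return { source: 'gradle-properties', javaHome: gradleJavaHome };
  // Assumption: no mismatch means the host default is used as-is.
  if (required === hostDefault) return { source: 'host-default', javaHome: null };
  if (!noAutoselect && catalogue[required]) {
    return { source: 'catalogue', javaHome: catalogue[required] };
  }
  if (ignoreMismatch) {
    return { source: 'host-default', javaHome: null, warning: 'jdk_mismatch' };
  }
  // Step 5: the gate fires and the run exits with code 3.
  return { source: 'gate', exitCode: 3, error: 'jdk_mismatch' };
}

const required = requiredJdk([11, 17]);
console.log(resolveJdk({ required, hostDefault: 21, catalogue: { 17: '/jdks/17' } }));
console.log(resolveJdk({ required, hostDefault: 21, noAutoselect: true }));
```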
Exit codes
| Code | Meaning |
|------|---------|
| 0 | Success — all tests passed |
| 1 | Test failure — script ran, tests failed |
| 2 | Config error — bad CLI usage (unknown subcommand, missing arg) |
| 3 | Environment error — gradlew not found in --project-root, bash/pwsh missing on PATH, JDK absent, JDK toolchain mismatch (errors[].code: jdk_mismatch — bypass with --ignore-jdk-mismatch), or another kmp-test already running on the same project root (errors[].code: lock_held — bypass with --force; stale locks from crashed runs auto-reclaim when the PID is dead, predates the host boot, or exceeds a 4 h age threshold) |
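An agent can combine the exit code with `errors[].code` to decide what to do next. The sketch below is illustrative: the exit codes and the two error codes come from the table above, while `classifyRun` and the returned labels are hypothetical.

```javascript
// Hypothetical helper: map a --json envelope to a next-step label.
function classifyRun(envelope) {
  const codes = (envelope.errors || []).map((error) => error.code);
  switch (envelope.exit_code) {
    case 0:
      return 'success';
    case 1:
      return 'test_failure'; // script ran, tests failed
    case 2:
      return 'config_error'; // bad CLI usage
    case 3:
      // Environment errors carry a more specific code where available.
      if (codes.includes('jdk_mismatch')) return 'fix_jdk';
      if (codes.includes('lock_held')) return 'wait_for_lock';
      return 'environment_error';
    default:
      return 'unknown';
  }
}

console.log(classifyRun({ exit_code: 3, errors: [{ code: 'lock_held' }] }));
console.log(classifyRun({ exit_code: 1, errors: [] }));
```

Branching on `lock_held` separately matters for parallel agents: waiting and retrying is usually safer than passing `--force`.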
Flag reference
| Flag | Default | Description |
|------|---------|-------------|
| --project-root | $PWD | Path to the Gradle project root |
| --max-workers | 4 | Maximum parallel Gradle workers |
| --test-type <type> | (auto-detect) | common/desktop (host JVM) \| androidUnit (host JVM) \| androidInstrumented (device; Compose UI) \| ios \| macos \| all. Omitted = auto-detect runs the unit leg (androidUnit or common), which skips instrumented-only modules with an instrumented_only_skipped warning. iOS / macOS pick the per-module task from the project model. See Choosing a test type and Multi-platform test dispatch |
| --coverage-tool | auto (on parallel/coverage/info) · jacoco (on changed) | auto \| kover \| jacoco \| none. Defaults differ per subcommand: auto reads the project's Gradle task graph (catches per-module, convention, and root subprojects {} application); changed defaults to jacoco for historical compatibility |
| --coverage-modules | (all) | Comma-separated module list for coverage aggregation |
| --min-missed-lines | 0 | Fail if missed lines exceed this threshold |
| --exclude-modules | (none) | Comma-separated module globs to skip entirely (e.g. "*:api,build-logic"). See "Heterogeneous projects" above |
| --include-untested | (off) | Re-include modules with no src/*Test* directory (auto-skipped by default) |
| --ignore-jdk-mismatch | (off) | Bypass the project-vs-JAVA_HOME JDK toolchain check. Default behavior is BLOCK with exit 3 — see "JDK toolchain mismatch" above |
| --java-home <path> | (none) | Explicit JDK install to use; wins over catalogue auto-select and gradle.properties org.gradle.java.home. See "JDK toolchain mismatch" |
| --no-jdk-autoselect | (off) | Disable catalogue auto-select; fall through directly to the gate (pre-v0.6.1 behavior) |
| --no-coverage | (off) | Alias for --coverage-tool none; runs tests only without generating coverage |
| --json / --format json | (off) | Emit a single JSON object on stdout (see "Agentic usage" below). Suppresses human-readable output |
| --include-shared | (off) | Include the shared KMP project (configured via .kmp-test-runner.json sharedProject.name) in the module set. Defaults to off — the runner stays scoped to the consumer project |
| --exclude-coverage | (none) | Comma-separated module globs to skip from coverage aggregation (separate from --exclude-modules, which skips test execution too) |
| --no-coverage-xml-autofix | (off) | Disable the auto-injected init-script that forces jacoco xml.required=true. By default kmp-test enables jacoco XML so standard-jacoco modules (HTML-only out of the box) still produce parseable coverage XML. No-op for Kover and for projects that already enable XML. See "Coverage tools" above |
| --fresh-daemon | (off) | Stop existing Gradle daemons before launching — useful when memory pressure or stale config-cache entries from prior runs cause flakes. Adds ~5 s of cold-start overhead |
| --skip-tests | (off) | Skip test execution; still runs coverage aggregation if the report files already exist. Equivalent to kmp-test coverage (the coverage subcommand sets this internally) |
| --output-file <path> | coverage-full-report.md | Filename for the aggregated coverage / parallel report. The per-run-id-suffixed copy uses this as the base name; the stable mirror (last writer wins) takes the literal value. See docs/concurrency.md |
| --coverage-only | (off) | Generate only the coverage report; implies --skip-tests and skips test discovery. Faster than the coverage subcommand when the Gradle reports are already on disk |
| --benchmark | (off) | Run benchmark suites instead of tests. The benchmark subcommand sets this internally; pass directly to parallel only if you're composing the orchestrator |
| --benchmark-config <smoke\|main\|stress> | smoke | Benchmark profile. smoke = ~5 min/module outer timeout (single warmup + 3 measurement iters). main = ~30 min/module (full warmup + 10 iters). stress = ~60 min/module (max warmup + 20 iters). Applies to the benchmark subcommand |
| --no-configuration-cache | (off — implicit on benchmark) | Pass --no-configuration-cache to the gradle subprocess. kmp-test benchmark injects this by default (kotlinx-benchmark caches %TEMP% inside the config cache, producing silent FAIL on stale paths). Override via --gradle-args "--configuration-cache" (gradle's last-wins). Applies to all script-backed subs |
| --ignore-gradle-timeout | (off) | (benchmark only) Disable the per-task gradle watchdog entirely. Risky on suites that hang — kmp-test will wait until gradle exits on its own |
| --no-adb | (off) | Skip the ADB probe (equivalent to KMP_TEST_SKIP_ADB=1). On kmp-test android, implies --list-only and emits warnings[].code: "no_adb_implies_list_only". Applies to info / android |
| --variant / --android-variant <auto\|debug\|release\|all> | auto | Android build-variant selector — global, accepted on parallel / changed / android / benchmark (and coverage reads it per-module). auto picks Debug if its task exists, falls back to Release (handles testBuildType = "release" projects). all dispatches both variants in the same gradle invocation |
| --module-filter <regex> | (all) | Glob, comma-separated. Selects which modules to dispatch. Applies to parallel / changed / android / benchmark / describe |
| --device <serial> | (none) | (androidInstrumented only) Pin the ADB device serial. Validated against adb devices; pins ANDROID_SERIAL for AGP. Mismatched serial → errors[].code: instrumented_setup_failed (exit 3). Applies to parallel --test-type androidInstrumented / android |
| --device-task <name> | (none) | (androidInstrumented only) Force an explicit gradle task on the instrumented leg. Preempts every other resolution (project-model probe, kmpAndroidLibrary androidConnectedCheck, AGP connected{Variant}AndroidTest). Applies to parallel --test-type androidInstrumented / android |
| --auto-retry | (off) | (androidInstrumented only) Re-dispatch instrumented tasks that ran but failed at runtime. One retry per task; mutually exclusive with cascade-isolation. Surfaces parallel.legs[i].retries[]. Applies to parallel --test-type androidInstrumented / android |
| --clear-data | (off) | (androidInstrumented only) adb shell pm clear <package> between failed dispatch + retry. Implies --auto-retry to fire. Reads package from AndroidManifest.xml. Applies to parallel --test-type androidInstrumented / android |
| --capture-on-fail | (off) | (androidInstrumented only) On instrumented-test failure, capture a device screenshot + UI-hierarchy dump via adb (best-effort, forensic-only — never changes the exit code). Paths surface on errors[].screenshot_file / .ui_hierarchy_file; errors[].capture_error is set when adb can't oblige. Post-hoc: shows the device state at task-end (high value for crashes / ANRs / hangs), not the exact assertion frame — see Capture on failure. Captures sit beside the per-module log/logcat/errors artifacts. Applies to parallel --test-type androidInstrumented / android |
| --capture-dir <path> | (per-run log dir) | (androidInstrumented only) Override where --capture-on-fail artifacts are written (default: .kmp-test-runner/logs/android/<runId>/). Implies --capture-on-fail. Relative paths resolve against --project-root. Applies to parallel --test-type androidInstrumented / android |
| --flavor <name> | (none) | Android productFlavors weave for the unit (test${Cap}${Variant}UnitTest), instrumented (connected${Cap}${Variant}AndroidTest), and coverage report tasks. Flavors applied by a build-logic convention plugin are recovered from the gradle task-graph probe (not just per-module productFlavors {}). Without --flavor on a flavored project, the unit / instrumented leg falls back to the flavor-agnostic umbrella (test / connectedAndroidTest, runs every flavor) and warns flavor_defaulted_umbrella. Applies to parallel (androidUnit / androidInstrumented / coverage) / android |
| --gradle-args <string> | (none) | Escape hatch — append tokens to every gradlew invocation. Repeatable; whitespace-split. Tokens go LAST so they OVERRIDE CLI defaults via gradle's last-wins (--gradle-args "--no-parallel" wins over --parallel). Applies to parallel / changed / android / benchmark |
| --strict-timeouts | (off) | (benchmark only) Restore pre-graded exit-code behavior: any gradle timeout exits 3 even when other modules passed. Default (off) grades partial timeouts as exit 0 + warnings[].code: "partial_timeout" when at least one module passed. Use this in CI matrix cells that require hard fail on any timeout |
| --isolated | (off) | Run gradle with --project-cache-dir <tmp> so concurrent kmp-test invocations don't share configuration cache. Tier-3 isolation. Applies to parallel / changed / android / benchmark. See docs/concurrency.md |
| --isolated-cache-dir <path> | (per-run tmpdir) | Override the temp project-cache-dir location. Implies --isolated |
| --isolated-no-lock | (off) | Skip the OS-level cache-dir lockfile. Implies --isolated. Use only when lockfile contention itself is the bottleneck (rare) |
| --color <mode> | auto | always \| never \| auto. Controls defensive --console=plain injection into the gradle subprocess. auto injects when stdout isn't a TTY or NO_COLOR is set (POSIX). Skipped when the user already passes any --console=* via --gradle-args |
Instrumented-only flags. --device, --device-task, --auto-retry, --clear-data, --capture-on-fail, and --capture-dir apply only to the instrumented leg (kmp-test android or parallel --test-type androidInstrumented); they are ignored on the unit / androidUnit legs.
Env vars (skip-list):
| Variable | Applies when | Effect |
|----------|--------------|--------|
| SKIP_DESKTOP_MODULES | --test-type common / desktop | Comma-separated short module names skipped from the desktop test pass |
| SKIP_ANDROID_MODULES | --test-type androidUnit (default) | Same shape, for Android-side dispatch |
| SKIP_IOS_MODULES | --test-type ios | Same shape, for iOS dispatch |
| SKIP_MACOS_MODULES | --test-type macos | Same shape, for macOS dispatch |
| PARENT_ONLY_MODULES | always | Comma-separated module names that are aggregator-only (skipped at discovery time) |
| NO_COLOR | always (POSIX) | Any non-empty value disables gradle ANSI output (equivalent to --color=never) |
| KMP_TEST_SKIP_ADB | info / android | Equivalent to --no-adb. On android it implies --list-only (instrumented tests require adb) |
| KMP_GRADLE_MAXBUFFER_MB | always | Max stdout/stderr captured per gradle/adb subprocess, in megabytes (default 64). Raise on machines running very verbose builds; exceeding the cap surfaces as errors[].code: "spawn_error" instead of killing the run silently |
| KMP_JUNIT_XML_MAX_MB | parallel / changed | Size cap (megabytes) for a single TEST-*.xml report before it's skipped during the test-count walk (default 32). A skipped report surfaces as warnings[].code: "junit_xml_oversized"; tests.individual_total then undercounts and that task's test_failures[] may be incomplete |
| KMP_TEST_NO_SWEEP | test subcommands | Set to 1 to disable the startup artifact-lifecycle sweep of .kmp-test-runner/ (see the cleanup config key) |
Project config — .kmp-test-runner.json
Drop a .kmp-test-runner.json at your project root to pin stable defaults instead of repeating CLI flags or relying on env vars. Resolution precedence: CLI flag > env var > project-local > user-global > built-in default. Schema:
{
  "sharedProject": { "name": "shared-libs", "path": "../shared-libs" },
  "defaults": { "testType": "common", "coverageTool": "kover", "excludeModules": "*:test-fakes" },
  "skip": { "android": ["legacy-app"], "ios": ["bench-android"] },
  "cleanup": { "auto": true, "logsTtlDays": 7 }
}
All fields are optional. Unknown fields are preserved silently for forward compat. Type-mismatched fields are dropped with a [WARN] line on stderr.
cleanup controls the artifact lifecycle sweep: every test run (after acquiring the project lock) removes stale entries under .kmp-test-runner/ — orphaned cache-isolated/ gradle caches, init-scripts and *.tmp.* leftovers older than 24 h, and per-run logs/ directories older than logsTtlDays (default 7). The model/tasks cache, reports/, the lockfile, and this config file are never auto-swept. Disable with "auto": false or the KMP_TEST_NO_SWEEP=1 env var. For an explicit purge use kmp-test clean (--all adds the model cache + reports, --dry-run lists targets with sizes first).
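The resolution precedence stated for project config (CLI flag > env var > project-local > user-global > built-in default) can be pictured as a first-defined-wins walk over the layers. This is a sketch under that reading, not the runner's actual resolver; SettingLayers and resolveSetting are hypothetical names.

```typescript
// Layers listed from highest to lowest precedence. Only builtIn is mandatory.
interface SettingLayers<T> {
  cliFlag?: T;
  envVar?: T;
  projectLocal?: T;
  userGlobal?: T;
  builtIn: T;
}

// Returns the value from the highest-precedence layer that defines one.
function resolveSetting<T>(layers: SettingLayers<T>): T {
  const ordered: Array<T | undefined> = [
    layers.cliFlag,
    layers.envVar,
    layers.projectLocal,
    layers.userGlobal,
  ];
  for (const candidate of ordered) {
    if (candidate !== undefined) return candidate;
  }
  return layers.builtIn;
}
```

For example, a project-local coverageTool of "kover" would win over a user-global "jacoco", while an explicit --coverage-tool on the command line would win over both.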
User-global config — ~/.kmp-test/config.json
Per-machine, per-project presets for things the project-local file can't carry — machine-specific JDK paths, personal overrides, configs against repos you don't own. File path: ~/.kmp-test/config.json on Linux/macOS, %USERPROFILE%\.kmp-test\config.json on Windows. Keyed by lookup key: git remote get-url origin → rootProject.name in settings.gradle(.kts) → basename(projectRoot), first hit wins.
{
  "projects": {
    "https://github.com/me/my-kmp-app.git": {
      "defaults": { "testType": "common", "coverageTool": "kover" },
      "skip": { "android": ["legacy-app"] },
      "java_home": "C:/Program Files/Zulu/zulu-21"
    },
    "another-project-name": {
      "defaults": { "testType": "desktop" }
    }
  }
}
Per-project preset accepts all fields the project-local file does (sharedProject, defaults, skip) plus java_home (only valid here; see below). Project-local values override user-global values when both layers carry the same key.
java_home security note. The java_home field is permitted ONLY in the user-global file. A java_home entry in a checked-in .kmp-test-runner.json is dropped and warned, since a malicious PR could otherwise redirect a teammate's spawn env without their consent.
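The lookup order described for the user-global file (git remote URL, then rootProject.name, then the project root's basename, first hit wins) might look like the sketch below. It assumes "first hit" means the first candidate key that actually exists under projects. ProjectIdentity, baseName, and findUserGlobalPreset are hypothetical names, and the identity values are passed in rather than read from git or Gradle.

```typescript
interface ProjectIdentity {
  gitRemoteUrl: string | null;    // result of `git remote get-url origin`, if any
  rootProjectName: string | null; // rootProject.name from settings.gradle(.kts), if any
  projectRoot: string;            // path to the project root
}

interface PresetMatch<P> {
  key: string;
  preset: P;
}

// Last path segment, accepting both "/" and "\" separators.
function baseName(path: string): string {
  const parts = path.split(/[\\/]/).filter((p) => p.length > 0);
  const last = parts.pop();
  return last === undefined ? path : last;
}

// Tries each candidate key in order and returns the first one present in `projects`.
function findUserGlobalPreset<P>(
  identity: ProjectIdentity,
  projects: Record<string, P>
): PresetMatch<P> | null {
  const candidates: Array<string | null> = [
    identity.gitRemoteUrl,
    identity.rootProjectName,
    baseName(identity.projectRoot),
  ];
  for (const key of candidates) {
    if (key === null) continue;
    if (!Object.prototype.hasOwnProperty.call(projects, key)) continue;
    const preset = projects[key];
    if (preset !== undefined) return { key: key, preset: preset };
  }
  return null;
}
```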
Quick start: gitignore CLI artifacts
The CLI writes its outputs (cache, coverage reports, Android log dumps) under a single .kmp-test-runner/ subdir at your project root. Add this one line to your project .gitignore:
# kmp-test-runner local artifacts (CLI output — never commit)
.kmp-test-runner/
Continuous integration
kmp-test-runner is built for non-interactive use — the --json envelope and the exit-code contract target a CI step (or an agent) just as much as the console. Two ways to wire it into a GitHub Actions job:
Via the npm CLI (least friction)
The package is public on npm, so no auth is needed; and because it's Node, the same step runs on ubuntu-latest / windows-latest / macos-latest:
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4 # the project still needs a JDK + ./gradlew
        with: { distribution: temurin, java-version: 17 }
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npx kmp-test-runner@latest parallel --json
Swap parallel for coverage / changed / android as the job needs. The step's exit code is the gate: 0 pass, 1 test failure (or coverage-gate breach), 2 config error, 3 environment error. The job therefore fails or passes with no extra scripting (full table in Exit codes). --json keeps the output structured and low-token (pipe to jq in a script, or hand it to an agent); add --isolated for parallel-safe matrix / fan-out runs.
Fail the build when coverage regresses:
- run: npx kmp-test-runner@latest coverage --min-missed-lines 100 --json
Pin a version for reproducible CI. kmp-test-runner is pre-v1, so flags can evolve between minor releases. Pin a published version (npx kmp-test-runner@<x.y.z> …, current version on npm) instead of @latest once your pipeline is set up. The envelope contract itself is stable from schema_version: 2.
Via the Gradle plugin
Apply the Gradle plugin and run its task (parallelTests / coverageTask / androidTests) in the job. The plugin is published to GitHub Packages, so the consuming project's settings.gradle.kts needs that repository plus GITHUB_TOKEN auth — for an external project the npm CLI above is usually less friction.
Runner requirements
- JDK + gradlew must be present: any Gradle / KMP / Android project (see actions/setup-java above).
- Instrumented tests (--test-type androidInstrumented, or kmp-test android) need a device/emulator on the runner, e.g. reactivecircus/android-emulator-runner. The common / desktop / androidUnit / coverage legs need no device.
- iOS / macOS legs require a macos-* runner (billed at ~10× Linux minutes).
Use as an Agent Skill
kmp-test-runner ships an Agent Skill at .skills/kmp-test-runner/ conforming to the open agentskills.io standard. The skill makes the CLI auto-discoverable by Claude Code, Gemini CLI, Cursor, GitHub Copilot, OpenAI Codex, and 30+ other agentskills.io-compatible tools — agents activate the skill automatically when the user asks to run tests in a KMP or Android Gradle project.
Installation paths (pick one):
- Project-local: clone the .skills/kmp-test-runner/ directory into your project's .skills/, .agent/skills/, or .github/skills/. The agent finds it on next session.
- User-global (Claude Code): cp -r .skills/kmp-test-runner ~/.claude/skills/
- User-global (Gemini CLI / Antigravity): android skills add kmp-test-runner (when listed in github.com/android/skills, currently pending).
The skill's SKILL.md documents the JSON envelope contract, exit codes, and per-workflow steps so the agent dispatches the right kmp-test subcommand (parallel, coverage, android, benchmark) and parses results correctly without trial-and-error.
Install as a Claude Code Plugin
The same skill is also packaged as a Claude Code Plugin (.claude-plugin/plugin.json at repo root). Filesystem install:
git clone https://github.com/oscardlfr/kmp-test-runner.git
claude --plugin-dir ./kmp-test-runner # session-only
The plugin re-uses the same .skills/kmp-test-runner/ skill content, with no duplication. Marketplace listing is planned for a follow-up.
Agentic usage — token-cost rationale
kmp-test is built to be cheap to call from AI coding agents. The --json flag is the lever: it replaces verbose, multi-step Gradle orchestration with a single command and a single structured response.
Three ways an agent can run a KMP test suite
| Approach | What the agent does | What it consumes |
|----------|---------------------|------------------|
| A. Raw Gradle + report parsing | (1) Discover modules from settings.gradle.kts. (2) Build per-module :module:test task list. (3) Invoke ./gradlew :a:test :b:test ... --parallel --continue. (4) Re-invoke ./gradlew koverXmlReport (or jacoco). (5) Read each generated XML / HTML report from build/reports/. (6) Parse missed lines, failure stack frames, etc. | Tens of thousands of tokens of Gradle progress logs + multi-KB report files in context. The agent must also understand Gradle DSL, Kover/JaCoCo task names, and report XML schemas. |
| B. kmp-test default mode | (1) Run one command: kmp-test parallel. (2) Read the human-readable summary from stdout. | A few thousand tokens — the script does the orchestration and writes a compact markdown report, but progress output and the coverage report are still in the agent's context. |
| C. kmp-test --json (agentic mode) | (1) Run one command: kmp-test parallel --json. (2) JSON.parse(stdout). | A few hundred tokens — a single JSON object with tests, modules, coverage, errors. No ANSI, no markdown, no Gradle log noise. |
Side-by-side example
Default (human) output — the same summary block users see in CI logs (~1.5 KB shown, scaled down from a typical ~10–20 KB run):
Configuration:
Project: my-app
Test Type: all
Modules found: 12
[>] Running tests for 12 modules in parallel...
> Task :core-foo:test ... 8 tests completed, 0 failed, 0 skipped
> Task :core-bar:test ... 5 tests completed, 0 failed, 0 skipped
... (one block per module) ...
[OK] Full coverage report generated!
[>>] Report saved to: .kmp-test-runner/reports/coverage/latest.md
Tests: 42 total | 42 passed | 0 failed | 0 skipped
======================================================================
MODULE COVERAGE SUMMARY
======================================================================
core-foo 85.0% 12
core-bar 92.5% 4
... (one row per module) ...
TOTAL 88.0% 16
SUMMARY: 88.0% total | 16 lines missed | 3 modules at 100% | 1m 23s
BUILD SUCCESSFUL
Agentic (--json) output, the entire response on one line:
{"tool":"kmp-test","subcommand":"parallel","version":"0.10.1","project_root":"/abs/path","exit_code":0,"duration_ms":83000,"tests":{"total":42,"passed":42,"failed":0,"skipped":0},"modules":["core-foo","core-bar"],"coverage":{"tool":"kover","missed_lines":16,"modules_with_kover_plugin":["core-foo","core-bar"],"modules_with_jacoco_plugin":[]},"isolated":{"enabled":false,"cache_dir":null,"kept":false,"locked":true},"skipped":[],"errors":[],"warnings":[]}
That's ~450 bytes, roughly 80–200 tokens vs. tens of thousands for the 🔻 baseline (A). For an agent running tests on every iteration of a coding loop, the difference compounds quickly. The full per-tokenizer table is at the top of this README; methodology and the captured run output are in docs/token-cost-measurement.md.
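As a rough illustration of consuming that single line, the sketch below types the fields visible in the sample envelope and reduces them to one status string. The interface covers only fields shown above; KmpTestEnvelope and summarizeEnvelope are hypothetical names, and the cast after JSON.parse performs no runtime validation.

```typescript
// Fields taken from the sample envelope above; anything else is ignored here.
interface KmpTestEnvelope {
  tool: string;
  subcommand: string;
  version: string;
  project_root: string;
  exit_code: number;
  duration_ms: number;
  tests: { total: number; passed: number; failed: number; skipped: number };
  modules: string[];
  coverage: { tool: string; missed_lines: number };
  errors: Array<{ code?: string }>;
  warnings: Array<{ code?: string }>;
}

// Hypothetical helper: one JSON.parse, then a compact human-readable status.
function summarizeEnvelope(stdout: string): string {
  const env = JSON.parse(stdout.trim()) as KmpTestEnvelope;
  const seconds = Math.round(env.duration_ms / 1000);
  return (
    `${env.subcommand}: ${env.tests.passed}/${env.tests.total} passed, ` +
    `${env.coverage.missed_lines} lines missed (${env.coverage.tool}), ` +
    `${seconds}s, exit ${env.exit_code}`
  );
}
```

Fed the sample line above, this would yield a status along the lines of "parallel: 42/42 passed, 16 lines missed (kover), 83s, exit 0".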
Why this gap matters
Google's android CLI for agents is the canonical agentic toolbelt for Android development — it has create, describe, run, emulator, screen, layout, info, sdk, and a pluggable skills system. It does not have a test command. An agent reaching for "the official tool" to run tests has to fall back to raw ./gradlew invocations and parse multi-KB report files — exactly approach A above. kmp-test --json is the agent-friendly testing complement: same shape as android describe (single-line JSON, parseable, stable schema), focused on the test slice the official CLI doesn't cover.
What the JSON guarantees
- Always valid JSON, even if parsing the script output partially fails. Parse gaps are surfaced in the errors[] array rather than crashing the CLI.
- Stable schema: tool, subcommand, version, project_root, exit_code, duration_ms, tests {total/passed/failed/skipped}, modules[], coverage {tool, missed_lines}, errors[], warnings[].
- errors vs warnings: errors[] carries fatal signals an agent must act on (code: "lock_held", "jdk_mismatch", BUILD FAILED, parse gaps). warnings[] carries non-fatal signals an agent can branch on differently, e.g. code: "gradle_deprecation" (gradle exit 1 caused solely by Gradle 9+ deprecation warnings while every task passed) or code: "instrumented_only_skipped" (the unit leg skipped a module whose only tests are instrumented; run it with --test-type androidInstrumented). The corresponding BUILD FAILED line is not duplicated to errors[] when paired with a deprecation notice. The full warning-code catalogue lives in docs/envelope-contract.md.
- Single line on stdout: no surrounding noise, suitable for JSON.parse() directly.
- Exit code matches exit_code field, so an agent can branch on either.
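The errors[] / warnings[] split described above suggests a simple decision function on the agent side. The sketch below is one possible policy, not a prescribed one: the codes come from the list above, while AgentDecision and decideNextStep are hypothetical names.

```typescript
interface EnvelopeSignal {
  code?: string;
}

interface SignalEnvelope {
  errors: EnvelopeSignal[];
  warnings: EnvelopeSignal[];
}

type AgentDecision =
  | { action: "stop"; reason: string }
  | { action: "rerun_instrumented" }
  | { action: "proceed" };

// Assumed policy: any error is fatal; an instrumented_only_skipped warning
// asks for a second run; other warnings (e.g. gradle_deprecation) are tolerated.
function decideNextStep(env: SignalEnvelope): AgentDecision {
  const firstError = env.errors[0];
  if (firstError !== undefined) {
    return { action: "stop", reason: firstError.code ?? "unknown_error" };
  }
  const skipped = env.warnings.some((w) => w.code === "instrumented_only_skipped");
  if (skipped) return { action: "rerun_instrumented" };
  return { action: "proceed" };
}
```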
Agentic flags
--json is the headline flag, but four agentic levers ship together so you can introspect, scope, and validate without paying full test-execution cost.
--dry-run — what would run, no spawn
kmp-test parallel --dry-run --project-root /abs/path
# kmp-test parallel — DRY RUN (no script invoked)
# Project root: /abs/path
# Subcommand: parallel
# Script: /abs/path/to/run-parallel-coverage-suite.sh
# Final argv: --project-root /abs/path
# Spawn: bash /abs/path/to/run-parallel-coverage-suite.sh --project-root /abs/path
Pair with --json for a structured plan:
{"tool":"kmp-test","subcommand":"parallel","version":"0.10.1","dry_run":true,"exit_code":0,"plan":{"spawn_cmd":"bash","spawn_args":["…/run-parallel-coverage-suite.sh","--project-root","/abs"],"script_path":"…/run-parallel-coverage-suite.sh","final_args":["--project-root","/abs"],"test_filter":null},…}
--dry-run still validates gradlew (so a missing wrapper still exits 3). It just stops before spawning the script.
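An agent that wants to show or log the planned command could rebuild it from the plan object. The sketch below uses only the plan fields visible in the sample (spawn_cmd, spawn_args, script_path, final_args, test_filter); plannedCommandLine is a hypothetical helper, and the quoting is for display only, not a shell-safe escape.

```typescript
interface DryRunPlan {
  spawn_cmd: string;
  spawn_args: string[];
  script_path: string;
  final_args: string[];
  test_filter: string | null;
}

interface DryRunEnvelope {
  dry_run: boolean;
  exit_code: number;
  plan: DryRunPlan;
}

// Rebuilds a display string for the command the CLI says it would spawn.
function plannedCommandLine(stdout: string): string {
  const env = JSON.parse(stdout) as DryRunEnvelope;
  if (env.dry_run !== true) {
    throw new Error("not a dry-run envelope");
  }
  // Wrap arguments containing whitespace in JSON-style quotes for readability.
  const quote = (arg: string): string => (/\s/.test(arg) ? JSON.stringify(arg) : arg);
  return [env.plan.spawn_cmd].concat(env.plan.spawn_args).map(quote).join(" ");
}
```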
--test-filter <pattern> — single-class or single-method scope
Cuts a multi-module suite down to one test class — or one method — without forcing the agent to bypass the CLI:
# JVM gradle tasks — gradle's --tests handles globs natively
kmp-test parallel --test-filter "*FooServiceTest"
kmp-test parallel --test-filter "com.example.FooServiceTest.shouldFooBar"
# Android instrumented — CLI resolves *Pattern* to FQN by source scan
# (the Android runner doesn't accept wildcards, so this resolution is required)
kmp-test android --test-filter "*WidgetTest*"
# Android method-level (v0.5.2): both forms accepted
kmp-test android --test-filter "com.example.WidgetTest#shouldRenderEmpty"
kmp-test android --test-filter "*WidgetTest*#shouldRenderEmpty" # wildcard + method
kmp-test android --test-filter "com.example.WidgetTest.shouldRenderEmpty" # `.method` heuristic
# Benchmark — same translation, per-platform
kmp-test benchmark --platform android --test-filter "*ScaleBenchmark*"
kmp-test benchmark --platform android --test-filter "*ScaleBenchmark*#fastPath"
When the pattern contains *, the CLI walks the project sources (skipping build/, .gradle/, node_modules/, .git/) for a class <stripped> declaration and substitutes the FQN. If no match is found, the original pattern is forwarded, so gradle/Android surfaces a clear error rather than the CLI guessing.
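The wildcard resolution described above can be approximated as follows. This is a sketch over an in-memory list of source files rather than a real filesystem walk, and the matching rules (whole-word class name, package taken from the first package declaration) are assumptions. SourceFile and resolveTestFilter are hypothetical names.

```typescript
interface SourceFile {
  path: string; // project-relative path with "/" separators
  text: string; // file contents
}

const SKIPPED_DIRS = ["build", ".gradle", "node_modules", ".git"];

function escapeRegExp(raw: string): string {
  return raw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strips "*" from the pattern, looks for `class <stripped>` in the sources,
// and returns the fully qualified name. Falls back to the original pattern.
function resolveTestFilter(pattern: string, sources: SourceFile[]): string {
  if (pattern.indexOf("*") === -1) return pattern;
  const simpleName = pattern.replace(/\*/g, "");
  if (simpleName.length === 0) return pattern;
  const classDecl = new RegExp("\\bclass\\s+" + escapeRegExp(simpleName) + "\\b");
  for (const file of sources) {
    const inSkippedDir = file.path
      .split("/")
      .some((segment) => SKIPPED_DIRS.indexOf(segment) !== -1);
    if (inSkippedDir || !classDecl.test(file.text)) continue;
    const pkgMatch = /^\s*package\s+([\w.]+)/m.exec(file.text);
    const pkgName = pkgMatch !== null ? pkgMatch[1] : undefined;
    return pkgName !== undefined ? pkgName + "." + simpleName : simpleName;
  }
  return pattern; // no match: forward unchanged and let gradle report it
}
```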
Method-level filtering on Android. When the pattern carries a method portion (a #method separator, or the .method heuristic: a lowercase last segment implies a method, since classes are conventionally UpperCamelCase), the CLI splits class and method, resolves the class, and emits the canonical AGP single-arg form -Pandroid.testInstrumentationRunnerArguments.class=<FQN>#<method>. This shape is what AndroidJUnitRunner and Microbenchmark both honor; the earlier form, with separate class= and method= arguments, left Microbenchmark running every method on the class. Both input forms parse to the same wire form, so kmp-test android --test-filter "com.example.WidgetTest#shouldRender" and ... --test-filter "com.example.WidgetTest.shouldRender" are equivalent. Use # if your class names happen to start with a lowercase letter, which sidesteps the heuristic.
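The class/method split and the resulting runner argument can be sketched as below. It follows the two input forms described above (# separator, or a lowercase last dot-segment). splitTestFilter and toRunnerArgument are hypothetical names, and class resolution to an FQN is assumed to have happened already.

```typescript
interface SplitFilter {
  classPart: string;
  method: string | null;
}

// "#" always wins; otherwise a lowercase-initial last segment is read as a method.
function splitTestFilter(pattern: string): SplitFilter {
  const hash = pattern.indexOf("#");
  if (hash !== -1) {
    return { classPart: pattern.slice(0, hash), method: pattern.slice(hash + 1) };
  }
  const lastDot = pattern.lastIndexOf(".");
  if (lastDot !== -1) {
    const lastSegment = pattern.slice(lastDot + 1);
    if (/^[a-z]/.test(lastSegment)) {
      return { classPart: pattern.slice(0, lastDot), method: lastSegment };
    }
  }
  return { classPart: pattern, method: null };
}

// Builds the single-argument form: class=<FQN>#<method>, or class=<FQN> alone.
function toRunnerArgument(fqn: string, method: string | null): string {
  const target = method === null ? fqn : fqn + "#" + method;
  return "-Pandroid.testInstrumentationRunnerArguments.class=" + target;
}
```

Both com.example.WidgetTest#shouldRender and com.example.WidgetTest.shouldRender split into the same class and method here, which matches the equivalence stated above.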
Capture on failure (Android)
--capture-on-fail grabs forensic artifacts off the device when an instrumented test module fails — useful for Compose UI / Espresso failures an agent (or human) then has to triage:
kmp-test android --capture-on-fail --json
On each failed module it runs, best-effort, adb exec-out screencap (a PNG) and adb exec-out uiautomator dump (the view/semantics hierarchy as XML), writing them beside the existing log / logcat / errors artifacts under .kmp-test-runner/logs/android/<runId>/ (already covered by the .kmp-test-runner/ gitignore). The paths surface on the failed-module error entry:
{ "code": "module_failed", "module": "feature-home",
  "log_file": "…/feature-home.log", "logcat_file": "…/feature-home_logcat.log",
  "errors_file": "…/feature-home_errors.json",
  "screenshot_file": "…/feature-home_screenshot.png",
  "ui_hierarchy_file": "…/feature-home_ui-hierarchy.xml" }
--capture-dir <path> redirects the artifacts elsewhere (and implies --capture-on-fail).
The same flags work on kmp-test parallel --test-type androidInstrumented (and the instrumented leg of --test-type all): each failed instrumented module captures into the same per-run <runId>/ tree, namespaced by module (<module>_screenshot.png / <module>_ui-hierarchy.xml), with the paths on its errors[] entry. Capture fires once per module on the final failure — after any --auto-retry — against the resolved --device (or the first connected device / emulator).
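For triage, an agent might gather the capture paths from every failed-module entry. The sketch below relies on the screenshot_file, ui_hierarchy_file, and capture_error fields named in this README. ModuleFailure, CaptureSummary, and collectCaptures are hypothetical names, and entries with any code other than module_failed are skipped by assumption.

```typescript
interface ModuleFailure {
  code: string;
  module?: string;
  screenshot_file?: string;
  ui_hierarchy_file?: string;
  capture_error?: string;
}

interface CaptureSummary {
  module: string;
  files: string[];
  captureError: string | null;
}

// Lists the capture artifacts per failed module, keeping capture_error
// so the caller can tell "nothing captured" from "capture not requested".
function collectCaptures(errors: ModuleFailure[]): CaptureSummary[] {
  const summaries: CaptureSummary[] = [];
  for (const entry of errors) {
    if (entry.code !== "module_failed") continue;
    const files: string[] = [];
    if (entry.screenshot_file !== undefined) files.push(entry.screenshot_file);
    if (entry.ui_hierarchy_file !== undefined) files.push(entry.ui_hierarchy_file);
    summaries.push({
      module: entry.module ?? "(unknown)",
      files: files,
      captureError: entry.capture_error ?? null,
    });
  }
  return summaries;
}
```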
The capture is post-hoc — adb runs after the gradle task ends, so the screenshot shows the device state at task-end, not the exact frame the assertion failed on (the same way the logcat buffer dump beside it is post-hoc). That makes it most valuable for crashes, ANRs, and hangs (the error dialog is still on screen); for a clean Compose assertion failure the screen may already be torn down, but the UI-hie
