lacuna-cli

v0.3.4

Published

4 days ago

Agentic CLI that finds coverage gaps, generates tests, and verifies them in a loop


simon-ugorji

testing coverage ai agent cli

lacuna

Find untested code, write tests for it, and verify they pass, in one command.

Lacuna is a command-line tool that reads your code, finds the parts your tests don't cover, and writes tests to fill the gaps. It runs every test it writes and retries the ones that fail, so what lands in your repo actually passes.

It works with any OpenAI-compatible model (including local ones via Ollama or LM Studio), so you can run it without sending code to a hosted provider if you'd rather not.

$ lacuna generate

Getting started

1. Install

$ npm install -g lacuna-cli

Requires Node 20 or newer.

2. Set an API key

Lacuna defaults to DeepSeek. Create a key at platform.deepseek.com and export it:

$ export DEEPSEEK_API_KEY=sk-...

Prefer a different model? See Models; every option, including free local ones, is listed there. You can pick one during lacuna init.

3. Configure your project

From your project root:

$ lacuna init

This is an interactive wizard. It detects your test runner, asks which model to use, and writes a .lacuna.json. For React, React Native, and Next.js projects it also installs the testing libraries and creates a working test config and setup file.

4. See what's untested

$ lacuna analyze

Read-only. It runs your suite, collects coverage, and lists the files and functions below your threshold. Nothing is written.

5. Generate the tests

$ lacuna generate

Lacuna writes tests for the gaps, runs them, and retries failures. When it finishes, the new tests are already passing.

To target a single file and skip the full coverage run:

$ lacuna generate --file src/utils/math.ts

That's the whole loop. The rest of this README is reference.

How it works

lacuna generate                              lacuna fix
  │                                            │
  ├─ 1. Collect coverage                       ├─ 1. Find failing files
  │    ├─ report < 10 min old → reuse it       │    ├─ --file → that file only
  │    └─ otherwise → run the suite            │    ├─ cache < 30 min old → reuse it
  ├─ 2. Find files below threshold             │    └─ otherwise → run the suite
  │                                            │
  └─ For each gap:                             └─ For each failing file:
       ├─ Read source + existing tests              ├─ Run it alone, capture the error
       ├─ Extract used symbol definitions           ├─ Read the test + source + types
       │  (return shapes, method signatures)        ├─ Read tsconfig paths, deps, setup
       ├─ Read tsconfig paths, deps, setup          ├─ Model writes a surgical fix
       ├─ Send full context to the model            ├─ Pass → next file
       ├─ Run the generated tests                   └─ Fail → record it, detect loops,
       ├─ Pass → next file                                   retry, restore on giving up
       └─ Fail → retry with the error,
                 keep the best attempt

Two rules hold throughout: lacuna never leaves a half-written file behind, and it never removes passing tests. If it can't fully fix a file, it keeps the attempt with the most passing tests — and if nothing beat the starting point, it puts the original back.
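The "keep the best attempt, otherwise restore the original" rule can be sketched as below. This is a minimal illustration, not lacuna's actual code: the `Attempt` shape and the `chooseFinalContent` name are hypothetical, and comparing passing-test counts is a simplification of "never removes passing tests".

```typescript
// Hypothetical shape: one version of a test file and how its tests fared.
interface Attempt {
  content: string;
  passing: number;
  failing: number;
}

// Decide what stays on disk once retries are exhausted.
// An attempt is kept only if it passes strictly more tests than the best
// seen so far; if nothing beats the original, the original is returned.
function chooseFinalContent(original: Attempt, attempts: Attempt[]): string {
  let best = original;
  for (const attempt of attempts) {
    if (attempt.passing > best.passing) {
      best = attempt;
    }
  }
  return best.content;
}
```

Because the comparison is strict, a tie with the starting point restores the original rather than keeping a rewrite that adds nothing.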

Coverage is a guide, not a mandate. lacuna won't manufacture junk tests to turn a red line green — no type-impossible inputs (null as any on a non-nullable prop), no assertions that contradict the test's own title, no tests that lock in an incidental quirk. When you target an already-tested file, the only lines left uncovered are usually defensive/edge branches that aren't meaningfully testable; lacuna leaves those uncovered and tells you so, rather than padding the suite. Each accepted file is then run through your project's own eslint --fix + prettier so it matches your style.

Commands

`lacuna init`

Sets up lacuna in your project. Detects the test runner, picks a model, and writes .lacuna.json. Run it from anywhere in the project; it finds the root on its own.

For React, it installs @testing-library/react, jest-dom, user-event, and jsdom, then writes a vitest.config.ts and setup file with mock cleanup hooks.

For Next.js it does the same but skips the jsdom environment (Next manages its own), adds your @/ alias, and pre-mocks next/navigation, next/headers, next/cache, next/image, and next/font.

`lacuna analyze`

Runs the suite, collects coverage, and reports what's below threshold. Writes nothing.

lacuna analyze
lacuna analyze @diff:origin/main    # patch coverage of the lines your branch changed
lacuna analyze @diff packages/api   # ...scoped to one directory (monorepo package)
lacuna analyze --threshold 90
lacuna analyze --format json --output report.json
lacuna analyze --format markdown

`lacuna generate`

The main command: find gaps, write tests, run them, retry failures.

lacuna generate
lacuna generate --file src/utils/math.ts   # one file, skips the coverage run
lacuna generate @diff:origin/main           # patch coverage: only the lines your branch changed
lacuna generate --dry-run                   # preview, write nothing
lacuna generate --verbose                   # live panel as the model writes
lacuna generate --workers 4                  # process 4 files in parallel
lacuna generate --fresh                      # ignore the cached coverage report
lacuna generate --format json --output report.json

If you ran analyze in the last 10 minutes, generate reuses that report instead of running the suite again (--fresh forces a new run). When retries are exhausted, lacuna keeps the best attempt only if it adds passing tests and points you to lacuna fix for the rest; otherwise it restores the original. If the model produces the same output twice, the loop stops early instead of wasting iterations.
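The early stop on repeated output can be pictured with a small loop. This is a sketch under stated assumptions: `retryUntilPass`, `generateOnce`, and `runTests` are hypothetical names, the callbacks are synchronous for brevity, and `maxIterations` is treated as the total number of attempts.

```typescript
interface LoopOutcome {
  iterations: number;
  passed: boolean;
  stoppedEarly: boolean;
}

// generateOnce stands in for a model call; runTests stands in for the test runner.
function retryUntilPass(
  maxIterations: number,
  generateOnce: (previousError: string | null) => string,
  runTests: (testSource: string) => { passed: boolean; error: string },
): LoopOutcome {
  let previousOutput: string | null = null;
  let previousError: string | null = null;
  for (let i = 1; i <= maxIterations; i++) {
    const output = generateOnce(previousError);
    // Identical output means another run cannot change the result: stop here.
    if (output === previousOutput) {
      return { iterations: i, passed: false, stoppedEarly: true };
    }
    const result = runTests(output);
    if (result.passed) {
      return { iterations: i, passed: true, stoppedEarly: false };
    }
    previousOutput = output;
    previousError = result.error; // fed back into the next attempt
  }
  return { iterations: maxIterations, passed: false, stoppedEarly: false };
}
```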

Patch coverage (`@diff`) — close a Codecov gap on a PR

Codecov (and similar gates) judge patch coverage: the coverage of only the lines your PR changed, not the whole repo. A file can sit at 94% overall and still fail the gate because the four lines you just added aren't tested. lacuna generate @diff targets exactly that scope — the same lines Codecov flags — so a green lacuna run predicts a green patch check.

lacuna generate @diff                       # diff vs the repo's default branch (origin/HEAD → main/master)
lacuna generate @diff:origin/main           # explicit base ref
lacuna generate @diff packages/api          # narrow to the changed lines inside ONE directory (monorepo package)
lacuna generate @diff -f src/lib/Service.ts # narrow to ONE changed file's uncovered lines
lacuna analyze  @diff:origin/main           # read-only: report patch coverage + the gap, write nothing

The workflow (fast + accurate):

# 1. Produce a FULL coverage report once (or reuse the lcov your CI already uploaded to Codecov).
npm run test:cov                       # writes coverage/lcov.info

# 2. Generate tests for just the changed-and-uncovered lines. lacuna reuses the report from step 1
#    instantly — no suite re-run — and writes tests scoped to the exact gap.
lacuna generate @diff:origin/main

# 3. Commit.
git add -A && git commit -m "test: cover patch"

Why step 1 matters: patch coverage is only meaningful against the same measurement Codecov used, which is your whole suite. A line can be covered by a test in a different file (an integration or DI test), so lacuna must read a full-suite report to know what's genuinely uncovered. It therefore reuses an existing coverage/lcov.info regardless of age rather than running a narrower, misleading subset. If none exists, it runs the full suite (accurate but slow) and warns you; --fresh forces a full re-run. The after-number is measured cheaply: lacuna collects only the new test's incremental coverage and unions it onto the existing report, with no second full run.

In CI — gate the PR on patch coverage without waiting on Codecov's round-trip:

- run: npm run test:cov                        # your normal coverage step; leaves coverage/lcov.info
- run: npx lacuna generate @diff:origin/main   # reads that lcov, covers the gap; exit 1 if still below threshold
- run: |
    git add -A                                 # stage first so new, untracked test files are detected
    git diff --staged --quiet || (git commit -m "test: cover patch" && git push)

How it decides what to target: it diffs from the git merge-base with the base ref (exactly Codecov's patch semantics — only what your branch added since it forked), intersects those changed lines with the uncovered lines in the coverage report, and generates tests for just that intersection. The report gains a Patch coverage before/after line and the exit code gates on it (below threshold → 1).
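The intersection step can be sketched as follows. This is an illustration only: `patchGap` and its input shapes are hypothetical, the changed lines are assumed to come from the merge-base diff, and the hit counts from the coverage report. Lines with no coverage record are treated as non-executable and left out of the percentage, while a changed file with no record at all counts as fully uncovered.

```typescript
// line number -> execution count for one file (hypothetical shape)
type Hits = Record<number, number>;

interface PatchGapResult {
  gap: Record<string, number[]>; // changed AND uncovered lines, per file
  covered: number;
  total: number;
  percent: number;
}

function patchGap(
  changed: Record<string, number[]>,
  coverage: Record<string, Hits>,
): PatchGapResult {
  const gap: Record<string, number[]> = {};
  let covered = 0;
  let total = 0;
  for (const file of Object.keys(changed)) {
    const hits: Hits | undefined = coverage[file];
    const missed: number[] = [];
    for (const line of changed[file]) {
      if (hits === undefined) {
        // No record for this file: every changed line counts as uncovered.
        total++;
        missed.push(line);
        continue;
      }
      const count: number | undefined = hits[line];
      if (count === undefined) continue; // not instrumented (blank, comment, type-only)
      total++;
      if (count > 0) covered++;
      else missed.push(line);
    }
    if (missed.length > 0) gap[file] = missed;
  }
  const percent = total === 0 ? 100 : (covered / total) * 100;
  return { gap, covered, total, percent };
}
```

Only the lines in `gap` would be handed to the model; `percent` corresponds to the "before" number on the Patch coverage line.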

Edge cases: a docs-only diff exits 0 ("nothing to cover"); an unresolvable base (e.g. a shallow CI clone) exits 2 with a git fetch --unshallow hint; a changed file whose tests never ran counts as fully uncovered. Note: lacuna currently parses line coverage (DA records) but not branch coverage (BRDA records), so a partially covered conditional, which Codecov shows as a yellow n/m branch, isn't targeted yet. Fully missed lines are.
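To make the DA/BRDA distinction concrete, here is a minimal lcov reader that keeps line records and deliberately ignores everything else. It is a sketch, not lacuna's parser: `parseLcovLines` is a hypothetical name, and only the standard `SF:`, `DA:<line>,<hits>`, and `end_of_record` records are handled.

```typescript
// Returns, per source file, a map of line number -> execution count.
function parseLcovLines(lcov: string): Record<string, Record<number, number>> {
  const result: Record<string, Record<number, number>> = {};
  let current: Record<number, number> | null = null;
  for (const raw of lcov.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.indexOf("SF:") === 0) {
      // SF starts a new per-file section.
      const file = line.slice(3);
      current = result[file] || (result[file] = {});
    } else if (line.indexOf("DA:") === 0 && current !== null) {
      // DA:<line number>,<hit count>[,<checksum>]
      const parts = line.slice(3).split(",");
      const lineNo = parseInt(parts[0], 10);
      const hits = parseInt(parts[1], 10);
      if (!isNaN(lineNo) && !isNaN(hits)) {
        current[lineNo] = (current[lineNo] || 0) + hits;
      }
    } else if (line === "end_of_record") {
      current = null;
    }
    // BRDA:, FN:, LF: and other records are skipped, so a line that ran
    // but took only one side of a branch still looks covered here.
  }
  return result;
}
```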

`lacuna fix`

Finds failing tests and repairs them. Each failing file goes to the model with its error output and source; the model patches what's broken and lacuna reruns until it passes. A fix that makes the tests pass is kept even if minor type warnings remain. fix never reverts a working change — and when it can't reach all-green, it keeps the attempt with the most passing tests rather than discarding a partial improvement.

lacuna fix
lacuna fix --file src/utils/math.test.ts    # one file, skips the full suite
lacuna fix --workers 4                       # 4 files in parallel
lacuna fix --types                           # repair files that pass but fail type-checking
lacuna fix --dry-run
lacuna fix --verbose
lacuna fix --fresh
lacuna fix --no-regenerate-on-failure        # don't fall back to regenerating
lacuna fix --fix-polluters                   # handle tests that pass alone but fail in the suite

A few behaviors worth knowing:

Regeneration fallback (on by default). If repair is exhausted on a genuinely broken file (one with no passing tests to lose), lacuna deletes it and regenerates from source, since a clean start beats more patching. A file that already has passing tests is never deleted, and a regeneration that would lower the passing count is discarded. Turn it off with --no-regenerate-on-failure.
Type errors (--types). Selects files by TypeScript errors instead of test failures, finding every test file that fails type-checking even if its tests pass. Type-checking runs against each file's governing tsconfig (the nearest one walking up), not the repo root, so in a monorepo a package's @/ path aliases, jsx, and moduleResolution resolve correctly and a clean file isn't flagged with false "Cannot find module" or "Cannot use JSX" errors. It also respects that config's rules: if the nearest one disables noImplicitAny (common in monorepo packages), implicit-any isn't treated as an error. Files are grouped by config and checked with one scoped tsc run per package.
Polluters (--fix-polluters). For tests that pass alone but fail in the full suite, lacuna bisects the suite to find the file leaking state and fixes it; if none can be isolated, it regenerates the affected test.
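The bisection idea behind --fix-polluters can be sketched like this. It is an illustration under assumptions, not lacuna's implementation: `findPolluter` is a hypothetical name, `victimFailsAfter` stands in for "run these files, then the affected test, and report whether it fails", and a single polluting file is assumed.

```typescript
// Narrow a list of earlier-running test files down to the one that leaks state.
// Returns null when no single file can be isolated.
function findPolluter(
  candidates: string[],
  victimFailsAfter: (filesRunFirst: string[]) => boolean,
): string | null {
  if (candidates.length === 0 || !victimFailsAfter(candidates)) {
    return null; // the failure does not reproduce with these files
  }
  let suspects = candidates;
  while (suspects.length > 1) {
    const mid = Math.floor(suspects.length / 2);
    const firstHalf = suspects.slice(0, mid);
    const secondHalf = suspects.slice(mid);
    if (victimFailsAfter(firstHalf)) {
      suspects = firstHalf;
    } else if (victimFailsAfter(secondHalf)) {
      suspects = secondHalf;
    } else {
      return null; // needs files from both halves; no single culprit
    }
  }
  return suspects[0];
}
```

Each round halves the suspect list, so a suite of n files needs on the order of log2(n) partial runs rather than n.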

Without --file, the failing-files list is cached for 30 minutes and trimmed to whatever's still failing after each run, so re-running picks up where you left off.
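As a rough picture of that cache, consider the sketch below. The `FailingFilesCache` shape and both function names are hypothetical, and whether trimming also refreshes the timestamp is not stated above; this sketch leaves it unchanged.

```typescript
// Hypothetical on-disk shape of the failing-files cache.
interface FailingFilesCache {
  savedAt: number; // epoch milliseconds when the list was recorded
  files: string[];
}

const FAILING_CACHE_TTL_MS = 30 * 60 * 1000; // 30 minutes

// Return the cached list if it is still fresh, otherwise null (run the suite).
function usableFailingFiles(cache: FailingFilesCache | null, now: number): string[] | null {
  if (cache === null) return null;
  if (now - cache.savedAt >= FAILING_CACHE_TTL_MS) return null;
  return cache.files;
}

// After a run, keep only the files that are still failing.
function trimFailingCache(cache: FailingFilesCache, stillFailing: string[]): FailingFilesCache {
  return {
    savedAt: cache.savedAt,
    files: cache.files.filter((file) => stillFailing.indexOf(file) !== -1),
  };
}
```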

`lacuna run`

Runs your suite and reports coverage. No model involved.

lacuna run

Configuration

lacuna init writes .lacuna.json. Every field is optional and has a sensible default.

The file includes a $schema line, so editors like VS Code give you key completion and inline docs as you type. To add it to an existing config, put this first:

{
  "$schema": "https://raw.githubusercontent.com/Octagon-simon/lacuna/main/lacuna.schema.json"
}

A typical config:

{
  "$schema": "https://raw.githubusercontent.com/Octagon-simon/lacuna/main/lacuna.schema.json",
  "provider": "openai-compatible",
  "model": "deepseek-chat",
  "baseURL": "https://api.deepseek.com/v1",
  "apiKeyEnv": "DEEPSEEK_API_KEY",
  "testRunner": "jest",
  "sourceDir": "src",
  "threshold": 80,
  "mocksFile": "src/test/mocks.ts",
  "setupFile": "src/test/setup.ts",
  "ignore": ["src/graphql/", "src/theme/"]
}

| Field | Default | Description |
|---|---|---|
| provider | openai-compatible | anthropic or openai-compatible |
| model | deepseek-chat | Model name |
| apiKeyEnv | DEEPSEEK_API_KEY | Env var holding your API key |
| baseURL | https://api.deepseek.com/v1 | API base URL (required for openai-compatible) |
| testRunner | auto | jest, vitest, pytest, mocha, go-test, and more |
| coverageFormat | lcov | lcov, json-summary, or cobertura |
| coverageDir | coverage | Where your runner writes coverage |
| sourceDir | src | Directory to scan. A string, or an array like ["src", "lib"] |
| threshold | 80 | Minimum line coverage % to pass |
| maxIterations | 3 | Retries per failing test before giving up |
| coverageTimeout | 300 | Seconds before the suite is killed (guards against hung handles) |
| mocksFile | (none) | Shared mock file every generated test imports from (see Shared mocks) |
| setupFile | (none) | Your test setup file; its contents are shown to the model so it knows what's already available |
| ignore | [] | Path substrings to skip, e.g. "src/graphql/" |
| maxTokens | 16000 | Max output tokens per call. Lower for strict providers (Groq free tier ~8000); raise if large files are cut off |
| format | true | Run your project's local eslint --fix + prettier on each generated/fixed test so it matches your repo style and clears lint. Best-effort; set false to disable |
| nodeEnvRouting | true | When a generated test is DOM-free (services, utils, validators), add a @vitest-environment node / @jest-environment node docblock so it skips jsdom startup and runs much faster. Verified per file and reverted if it breaks the test; set false to disable |
| debug | false | Log every prompt and response (see Debugging) |
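If you script around .lacuna.json, the table above maps to a type roughly like the one below. This is a sketch derived from the documented fields and defaults only: `LacunaConfig`, `CONFIG_DEFAULTS`, and `resolveConfig` are hypothetical names, and the real loader (a zod schema, per the project structure) may validate more strictly.

```typescript
interface LacunaConfig {
  provider: "anthropic" | "openai-compatible";
  model: string;
  apiKeyEnv: string;
  baseURL: string;
  testRunner: string;
  coverageFormat: "lcov" | "json-summary" | "cobertura";
  coverageDir: string;
  sourceDir: string | string[];
  threshold: number;
  maxIterations: number;
  coverageTimeout: number; // seconds
  mocksFile?: string;
  setupFile?: string;
  ignore: string[];
  maxTokens: number;
  format: boolean;
  nodeEnvRouting: boolean;
  debug: boolean;
}

// Defaults copied from the table; mocksFile and setupFile have none.
const CONFIG_DEFAULTS: LacunaConfig = {
  provider: "openai-compatible",
  model: "deepseek-chat",
  apiKeyEnv: "DEEPSEEK_API_KEY",
  baseURL: "https://api.deepseek.com/v1",
  testRunner: "auto",
  coverageFormat: "lcov",
  coverageDir: "coverage",
  sourceDir: "src",
  threshold: 80,
  maxIterations: 3,
  coverageTimeout: 300,
  ignore: [],
  maxTokens: 16000,
  format: true,
  nodeEnvRouting: true,
  debug: false,
};

// Every field is optional in the file; anything missing falls back to its default.
function resolveConfig(user: Partial<LacunaConfig>): LacunaConfig {
  return { ...CONFIG_DEFAULTS, ...user };
}
```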

Models

Lacuna works with any model behind an OpenAI-compatible API, plus Anthropic directly. Switch any time by re-running lacuna init or editing .lacuna.json.

| Preset | Model | API key | Notes |
|---|---|---|---|
| DeepSeek (default) | deepseek-chat | DEEPSEEK_API_KEY | Fast and cheap; a good default |
| DeepSeek R1 | deepseek-reasoner | DEEPSEEK_API_KEY | Reasoning model |
| Claude Sonnet | claude-sonnet-4-6 | ANTHROPIC_API_KEY | High quality |
| Claude Opus | claude-opus-4-7 | ANTHROPIC_API_KEY | Most capable |
| GPT-4o | gpt-4o | OPENAI_API_KEY | |
| Groq | llama-3.3-70b-versatile | GROQ_API_KEY | Fast, free tier |
| Gemini 2.5 Pro | gemini-2.5-pro | GEMINI_API_KEY | |
| Gemini 2.5 Flash | gemini-2.5-flash | GEMINI_API_KEY | Faster, cheaper |
| OpenRouter | any | OPENROUTER_API_KEY | One key, many models |
| Ollama | any local | none | Runs fully on your machine |
| LM Studio | any local | none | Runs fully on your machine |
| Custom | any | configurable | Any OpenAI-compatible endpoint |

Supported stacks

Lacuna can run the suite and collect coverage for a wide range of languages. The quality of the generated tests depends on how much prompt tuning a stack has had.

Tuned and tested:

| Stack | Runner | Focus |
|---|---|---|
| TypeScript / JavaScript | Vitest, Jest | Hook return shapes, service method signatures, type-safe mocks, vi.mocked()/jest.mocked(), factory hoisting |
| React | Vitest, Jest | RTL queries, act() async rules, loading states, mock lifecycle, findBy over waitFor |
| React Native / Expo | Jest (jest-expo) | RNTL v14 async contract, infra mocks (Reanimated, AsyncStorage, vector icons), mock-shape accuracy, query isolation |
| Next.js | Vitest | Server/client boundaries, next/navigation, next/headers, next/cache, server actions, directive detection |

Runner support, lighter tuning: Vue (Vitest), Python (pytest), PHP (PHPUnit, Pest). These run and collect coverage, but framework-specific prompt tuning is still in progress.

Runner only: Go, Ruby (RSpec), Rust (cargo), C# (dotnet), Java (Gradle/Maven), Swift. Suites run and coverage is collected, but test generation isn't tuned for them yet.

Shared mocks

In a large codebase, redefining the same mocks in every test file gets painful fast. Point lacuna at a single mock file and every generated test imports from it.

Create the file:

// src/test/mocks.ts
import { beforeEach, vi } from 'vitest'

export const mockNavigate = vi.fn()
vi.mock('react-router-dom', () => ({
  useNavigate: () => mockNavigate,
  useParams: vi.fn(() => ({})),
}))

export const mockUser = { id: 'user-1', email: 'user@example.com', role: 'admin' }
export const mockUseAuth = vi.fn(() => ({ user: mockUser, isLoading: false }))

beforeEach(() => vi.clearAllMocks())

Reference it in .lacuna.json:

{ "mocksFile": "src/test/mocks.ts" }

Now generated tests import from that file instead of inventing their own mocks. If a test needs a mock that doesn't exist yet, lacuna adds it to the shared file and imports it.

Under the hood, lacuna parses the mock file before each run and builds an inventory of every vi.mock() call and its exports, so the model knows what's already mocked and edits it surgically instead of duplicating it. When a mock needs changing, the model patches the existing block rather than rewriting the file.
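A very rough approximation of such an inventory is shown below. It is a sketch, not lacuna's parser: `inventoryMocks` is a hypothetical name, and it relies on regular expressions, so it would be fooled by commented-out code or unusual formatting where a real parser would not.

```typescript
interface MockInventory {
  mockedModules: string[]; // first argument of each vi.mock() call
  exportedNames: string[]; // names the shared file exports
}

function inventoryMocks(source: string): MockInventory {
  const mockedModules: string[] = [];
  const exportedNames: string[] = [];
  const mockCall = /vi\.mock\(\s*['"]([^'"]+)['"]/g;
  const exportedDecl = /export\s+(?:const|function)\s+([A-Za-z_$][\w$]*)/g;
  let match: RegExpExecArray | null;
  while ((match = mockCall.exec(source)) !== null) {
    mockedModules.push(match[1]);
  }
  while ((match = exportedDecl.exec(source)) !== null) {
    exportedNames.push(match[1]);
  }
  return { mockedModules, exportedNames };
}
```

Run against the example file above, this would list react-router-dom as mocked and mockNavigate, mockUser, and mockUseAuth as available exports.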

CI / GitHub Actions

Run lacuna on pull requests to generate missing tests and block merges below threshold.

.github/workflows/lacuna.yml:

name: lacuna coverage

on:
  pull_request:
    branches: [main]

jobs:
  coverage:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}
          token: ${{ secrets.GITHUB_TOKEN }}

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: npm
      - run: npm ci

      - name: Run lacuna
        id: lacuna
        uses: Octagon-simon/lacuna@v1
        continue-on-error: true        # let the commit step run even if coverage is low
        with:
          threshold: 80
          workers: 2
          model: deepseek
          deepseek-api-key: ${{ secrets.DEEPSEEK_API_KEY }}

      - name: Commit generated tests
        if: steps.lacuna.outcome != 'cancelled'
        run: |
          git config user.name "lacuna[bot]"
          git config user.email "lacuna[bot]@users.noreply.github.com"
          git add -A
          git diff --staged --quiet || git commit -m "chore: add lacuna-generated tests"
          git push

On each PR, lacuna generates the missing tests, posts a coverage report as a comment (updated in place, not re-posted), and fails the check if coverage stays below threshold.

To use a different model, pass its preset and key:

with:
  model: gpt-4o
  openai-api-key: ${{ secrets.OPENAI_API_KEY }}

Gating on Codecov patch coverage

The workflow above covers the whole repo to a threshold. If your gate is a Codecov patch check (coverage of only the lines the PR changed), use @diff instead — it targets exactly those lines and is far cheaper because it reuses the coverage report your test step already produced. See Patch coverage (@diff) for the full workflow. Minimal step, after your coverage step has written coverage/lcov.info:

      - run: npm run test:cov                        # your coverage step → coverage/lcov.info
      - run: npx lacuna generate @diff:origin/main   # cover the changed-and-uncovered lines
      - run: |
          git add -A
          git diff --staged --quiet || (git commit -m "test: cover patch" && git push)

Fetch enough history for the merge-base first (actions/checkout with fetch-depth: 0, or git fetch --unshallow), otherwise @diff can't resolve the base ref and exits 2.

Debugging

When a run behaves oddly (bad mock shapes, patches that won't apply, failures you can't reproduce), turn on debug logging to see exactly what the model received and returned.

Per run:

LACUNA_DEBUG=1 lacuna generate --file src/payments/processor.ts

Or persist it in .lacuna.json:

{ "debug": true }

Lacuna writes one log per target file, named after its path: src/queue/processor.ts becomes lacuna-debug.src_queue_processor.txt (a file's generate and fix share the log). The full path is used, not just the file name, so identically-named files like send-email/route.ts and login/route.ts get separate logs instead of overwriting each other. Each log is cleared when that file's run starts and appended through its retries, so parallel runs never clobber each other. The env var wins over the config value, so you can override per run without editing anything.
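If you need to locate a log from a script, the naming rule can be approximated as follows. `debugLogName` is a hypothetical helper inferred from the single example above (src/queue/processor.ts); how lacuna treats multi-part extensions or absolute paths is an assumption here.

```typescript
// Map a target file path to its debug log name:
// drop the final extension, then join the path segments with underscores.
function debugLogName(targetPath: string): string {
  const normalized = targetPath.replace(/\\/g, "/").replace(/^\.?\//, "");
  const withoutExtension = normalized.replace(/\.[^./]+$/, "");
  return "lacuna-debug." + withoutExtension.replace(/\//g, "_") + ".txt";
}
```

Because the whole path takes part in the name, send-email/route.ts and login/route.ts map to different files.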

Filing a bug? Attach the debug file; it has the exact prompt and raw response, which is what makes an issue reproducible.

Reference

Output formats

Every command takes --format and --output:

lacuna analyze                                   # terminal (default)
lacuna analyze --format json                     # for scripts and CI
lacuna analyze --format markdown                 # for PR comments
lacuna generate --format json --output report.json

Exit codes

| Code | Meaning |
|---|---|
| 0 | Coverage meets threshold |
| 1 | Coverage below threshold, or some files couldn't be tested |
| 2 | Error: runner failed, bad config, or no tests generated |

Test placement

Lacuna follows your existing layout. If tests sit next to source files, new tests go there too. If they live in a separate tree (test/, tests/, test/unit/, …) that actually contains tests, it mirrors that. Otherwise it uses a __tests__/ folder beside the source, creating it if needed.

What gets skipped

Files with no testable logic are skipped automatically:

By directory: types/, constants/, assets/, images/, icons/, fonts/, styles/, generated/, __generated__/, mocks/, fixtures/, migrations/, i18n/, locales/, translations/
By filename: *.d.ts, *.test.*, *.spec.*, *.stories.*, *.config.*, *.mock.*, *.types.ts, *.constants.ts, *.enum.*, index.*
By content: any file that exports only types, interfaces, enums, or constants

Add your own with ignore in .lacuna.json. Entries match as path substrings.
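The path-based rules above can be sketched as a single check. This is an approximation for illustration: `skipReason` is a hypothetical name, the glob patterns are translated to regular expressions by hand, and the content-based rule (files exporting only types, interfaces, enums, or constants) is left out because it needs the file's source.

```typescript
const SKIPPED_DIRS = [
  "types", "constants", "assets", "images", "icons", "fonts", "styles",
  "generated", "__generated__", "mocks", "fixtures", "migrations",
  "i18n", "locales", "translations",
];

// Hand-written equivalents of the filename globs listed above.
const SKIPPED_FILENAMES: RegExp[] = [
  /\.d\.ts$/, /\.test\./, /\.spec\./, /\.stories\./, /\.config\./,
  /\.mock\./, /\.types\.ts$/, /\.constants\.ts$/, /\.enum\./, /^index\./,
];

// Returns why a file would be skipped, or null if it is a candidate.
function skipReason(filePath: string, ignore: string[]): string | null {
  const normalized = filePath.replace(/\\/g, "/");
  for (const entry of ignore) {
    // ignore entries match as plain path substrings
    if (normalized.indexOf(entry) !== -1) return "ignore: " + entry;
  }
  const segments = normalized.split("/");
  const fileName = segments[segments.length - 1];
  for (const dir of segments.slice(0, -1)) {
    if (SKIPPED_DIRS.indexOf(dir) !== -1) return "directory: " + dir + "/";
  }
  for (const pattern of SKIPPED_FILENAMES) {
    if (pattern.test(fileName)) return "filename: " + pattern.source;
  }
  return null;
}
```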

Project structure

lacuna/
├── src/
│   ├── commands/          # CLI commands: analyze, generate, fix, run, init
│   ├── agent/
│   │   ├── loop.ts        # generate → run → retry loop
│   │   ├── fix-loop.ts    # fix → run → retry loop
│   │   ├── context.ts     # builds model context (source, tests, mocks, types)
│   │   ├── generator.ts   # calls the model, manages conversation history
│   │   └── prompts/       # prompt builders, split by framework and runner
│   ├── lib/
│   │   ├── config.ts      # config loader + zod schema
│   │   ├── detector.ts    # detects test runner and language
│   │   ├── runner.ts      # spawns test commands, captures output
│   │   ├── reporter.ts    # terminal / JSON / markdown output
│   │   ├── validate.ts    # patch application, regression + broken-import detection
│   │   ├── typecheck.ts   # tsc pass and type-error scoping
│   │   ├── providers/     # model provider abstraction (anthropic, openai-compatible)
│   │   └── coverage/      # lcov / json parsers, gap extraction
│   └── ci/                # PR comment + GitHub Actions outputs
├── action.yml             # GitHub Action definition
└── .github/workflows/     # example workflow + release pipeline

Contributing

Issues and PRs are welcome. The codebase is TypeScript throughout.

git clone https://github.com/Octagon-simon/lacuna
cd lacuna
npm install
npm run build
npm link        # makes `lacuna` point at your local build

When reporting a bug, use the bug-report template. It asks for your test runner, model, lacuna version, and terminal output: the things needed to reproduce the issue.

License

MIT