@auto-dev/core

v0.1.2

Published

2 months ago

Autonomous code improvement engine — modify → build → test → measure → gate


pparikh10

autonomous code-improvement mcp testing metrics overnight

@auto-dev/core

Autonomous code improvement engine that runs directly inside Claude Code — no MCP server, no API key, no subprocess. Just open your project in Claude Code and tell it to improve your code.

The engine runs unattended modify → build → test → measure → gate loops, committing improvements and reverting failures automatically. It can also scaffold entire projects from a PRD.

How It Works

AutoDev ships as a set of Claude Code skills, slash commands, and hooks in the .claude/ directory. When you invoke a skill, it spawns a background subagent that runs the full improvement loop using Claude Code's native tools (Edit, Bash, Agent) — the same tools you already use. No external server. No API key. No extra process.

You: "improve test coverage"
  └─ Claude Code activates the autodev-improve skill
       └─ Spawns a background subagent (bypassPermissions)
            └─ Reads config/default.autodev.yaml
            └─ Queries history for smart directive selection
            └─ Captures baseline (build, test, lint)
            └─ Loop:
                 ├── Read target files
                 ├── Make focused improvement (Edit tool)
                 ├── Build (gate) → Test (gate) → Lint
                 ├── Pass? → git commit + log to DB
                 └── Fail? → git checkout (revert) + log to DB
            └─ Reports summary when done

Features

Runs natively in Claude Code — No MCP server, no API key, no subprocess. Uses the Claude Code session directly.
Background subagents — The loop runs as a background agent so you can keep working.
Hard gates are non-negotiable — Build must succeed, test count/pass rate must not decrease, coverage must not decrease. Gate failure = immediate revert.
5-metric evaluation — Test pass rate (0.30), coverage (0.25), lint (0.25), complexity (0.10), security (0.10).
PRD-to-project scaffolding — Parse a PRD, generate a task DAG, build iteratively with quality gates.
Pre-improvement intelligence — Scans for complexity hotspots, coverage gaps, lint issues, and security patterns before the loop begins.
History-informed directive selection — Learns from past sessions to suggest high-performing directives and avoid fragile files.
Pattern learning — Detects fragile files, winning patterns, stale metrics, and directive performance across sessions.
Live dashboard — Real-time web UI with session monitoring, metric trends, event stream, and team analytics.
Full CLI bridge — 12 CLI commands for session management, iteration logging, metrics, reports, and intelligence queries.
42 skills, 50+ slash commands, lifecycle hooks — Deep integration with Claude Code's skill system, commands, and hook lifecycle.
Budget control — Cap iterations, wall-clock hours, or USD spend.
Event-sourced sessions — Every iteration decision is recorded to SQLite for full auditability.

Quick Start

Add to any project (recommended)

npx @auto-dev/core init

This copies skills, commands, and hooks into your project's .claude/ directory and writes the default config to config/default.autodev.yaml. One command, done. It also creates the .autodev/autodev.db SQLite database for session tracking.

Options

npx @auto-dev/core init --force          # overwrite existing files
npx @auto-dev/core init --skip-hooks     # skip hooks and helpers
npx @auto-dev/core init --skip-config    # skip config/default.autodev.yaml
npx @auto-dev/core init --reset-db       # wipe and recreate the database
npx @auto-dev/core init --target ./other # install into a different directory

If you already have a .claude/settings.json, init will merge permissions and hooks rather than overwrite.

Use in Claude Code

Open the project in Claude Code. The .claude/ directory is detected automatically — skills, commands, and hooks are live immediately.

Improve your code:

improve test coverage and reduce lint warnings

Analyze opportunities first:

analyze this codebase for improvement opportunities

Scaffold a new project from a PRD:

create a project from this PRD: <paste PRD or file path>

Check status, pause, resume, or stop:

show autodev status
pause autodev
resume autodev
stop autodev

That's it. After init there is nothing else to set up: no extra configuration and no server to start.

CLI

The CLI ships as dist/cli.js and is registered as both autodev and auto-dev bin entries.

Project Setup

autodev init [options]                           # Install skills + create DB
autodev --help                                   # Show help
autodev --version                                # Show version

Session Management

autodev session create [--directive <text>]       # Create a new session
autodev session update <id> --status <status>     # Update session status
autodev status [<session-id>]                     # Show session status (JSON)
autodev history [<session-id>] [--limit <n>]      # Show event history
autodev report [<session-id>] [--format md|json]  # Generate session report

Iteration & Metrics Logging

autodev iteration log <sid> --number <n> --verdict accepted|rejected \
  [--directive <text>] [--files <f1,f2>] \
  [--tests-before <n>] [--tests-after <n>] \
  [--lint-before <n>] [--lint-after <n>] \
  [--coverage-before <n>] [--coverage-after <n>] \
  [--commit-sha <sha>] [--error <msg>]

autodev metrics log <sid> --name <n> --value <v> --baseline <b> \
  [--delta <d>] [--unit <u>]

autodev metrics [<session-id>] [--name <metric>]  # Query metric trends
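
If you log iterations from your own tooling, it can help to build the argument list programmatically instead of concatenating strings. The sketch below is only an illustration: the `IterationLog` interface and `iterationLogArgs` function are hypothetical names, and only a subset of the flags listed above is covered.

```typescript
// Hypothetical input shape; the flag names come from the CLI usage above.
interface IterationLog {
  sessionId: string;
  number: number;
  verdict: "accepted" | "rejected";
  directive?: string;
  files?: string[];
  testsBefore?: number;
  testsAfter?: number;
  commitSha?: string;
  error?: string;
}

// Builds the argv for `autodev iteration log`, skipping flags that were not provided.
function iterationLogArgs(log: IterationLog): string[] {
  const args: string[] = [
    "iteration", "log", log.sessionId,
    "--number", String(log.number),
    "--verdict", log.verdict,
  ];
  const optional: Array<[string, string | number | undefined]> = [
    ["--directive", log.directive],
    ["--files", log.files?.join(",")],
    ["--tests-before", log.testsBefore],
    ["--tests-after", log.testsAfter],
    ["--commit-sha", log.commitSha],
    ["--error", log.error],
  ];
  for (const [flag, value] of optional) {
    if (value !== undefined) args.push(flag, String(value));
  }
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile("autodev", args, callback)`, which avoids shell quoting problems in directive text.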

Intelligence & Patterns

autodev suggest [--limit <n>]                     # Suggest directives from history
autodev patterns learn                            # Detect patterns across sessions
autodev patterns list [--type <type>]             # List detected patterns

Pattern types: fragile-file, winning-pattern, stale-metric, directive-perf

Live Dashboard

autodev ui [--port <n>] [--open]                  # Launch dashboard (default: port 4170)

Live Dashboard

Launch with autodev ui to get a real-time web dashboard at http://localhost:4170.

Features

Session overview — All sessions with status, directive, acceptance rate, and timestamps
Iteration timeline — Live feed of accepted/rejected iterations via Server-Sent Events (SSE)
Metric trends — Track test pass rate, coverage, lint, complexity, and security over time
Team analytics — Cross-session aggregation: total iterations, overall acceptance rate, metric improvement trends, and top-performing directives
Zero dependencies — Single HTML file with inline CSS/JS, served by a zero-dep node:http server
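
The iteration timeline is delivered over Server-Sent Events, a plain-text format in which each event is a group of `field: value` lines terminated by a blank line. A minimal parser for that wire format might look like the sketch below. The `SseEvent` and `parseSse` names are hypothetical, and the payload shape the dashboard actually emits is not documented here, so `data` is left as a raw string.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no `event:` field is present
  data: string;  // multiple `data:` lines are joined with "\n"
}

// Parses a complete SSE text buffer into events. Comment lines (starting with ":") are ignored.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue;
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In a browser, the built-in `EventSource` handles this parsing for you; a hand-written parser is mainly useful in scripts that read the stream directly.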

API Endpoints

| Endpoint | Description |
|----------|-------------|
| GET /api/sessions | List all sessions |
| GET /api/status?session=<id> | Session status detail |
| GET /api/metrics?session=<id> | Metric history for a session |
| GET /api/report?session=<id> | Full session report |
| GET /api/analytics | Cross-session aggregate analytics |
| GET /api/events | SSE stream of live events |
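
As a rough sketch of how a script could address the per-session endpoints, the helper below builds the request URL for a given session. `dashboardUrl` is a hypothetical helper, not part of the package, and it assumes the dashboard is running on localhost at the default port unless another port is given.

```typescript
// The three endpoints that take a ?session=<id> query parameter.
type SessionEndpoint = "status" | "metrics" | "report";

// Builds a dashboard API URL; the session id is URL-encoded by URLSearchParams.
function dashboardUrl(endpoint: SessionEndpoint, sessionId: string, port = 4170): string {
  const url = new URL(`http://localhost:${port}/api/${endpoint}`);
  url.searchParams.set("session", sessionId);
  return url.toString();
}
```

The returned string can be handed to `fetch` on Node.js 18 or later. The response format is not specified in this README, so treat the body as untyped until you have inspected it.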

Agent Intelligence

AutoDev learns from past sessions to make smarter decisions in future runs.

History-Informed Directive Selection

When the improve skill starts, it queries autodev suggest to find:

High-performing directives — Directives with >60% acceptance rate across past sessions
Struggled files — Files that appeared in rejections but never in accepted iterations
Stagnant metrics — Metrics that haven't improved in 3+ sessions
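
The first two selection rules above can be illustrated with a small sketch over logged iterations. This is not the package's implementation: `IterationRecord`, `highPerformingDirectives`, and `struggledFiles` are hypothetical names, and the sketch assumes each logged iteration carries its directive, touched files, and verdict.

```typescript
interface IterationRecord {
  directive: string;
  files: string[];
  verdict: "accepted" | "rejected";
}

// Directives whose acceptance rate across all records exceeds the threshold (60% by default).
function highPerformingDirectives(records: IterationRecord[], threshold = 0.6): string[] {
  const stats = new Map<string, { accepted: number; total: number }>();
  for (const r of records) {
    const s = stats.get(r.directive) ?? { accepted: 0, total: 0 };
    s.total += 1;
    if (r.verdict === "accepted") s.accepted += 1;
    stats.set(r.directive, s);
  }
  const result: string[] = [];
  stats.forEach((s, directive) => {
    if (s.accepted / s.total > threshold) result.push(directive);
  });
  return result;
}

// Files that appear in rejected iterations but never in an accepted one.
function struggledFiles(records: IterationRecord[]): string[] {
  const accepted = new Set<string>();
  const rejected = new Set<string>();
  for (const r of records) {
    const target = r.verdict === "accepted" ? accepted : rejected;
    for (const f of r.files) target.add(f);
  }
  const result: string[] = [];
  rejected.forEach((f) => {
    if (!accepted.has(f)) result.push(f);
  });
  return result;
}
```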

Pattern Learning

Run autodev patterns learn to detect:

| Pattern Type | What it Detects |
|---|---|
| fragile-file | Files with 3+ rejections; avoid targeting early |
| winning-pattern | Directives with 3+ acceptances; re-use these |
| stale-metric | Metrics stagnant across 2+ sessions; focus here |
| directive-perf | Acceptance rate stats per directive; rank by success |

Patterns are stored in the patterns table and used by the improve skill to avoid fragile files and prioritize proven directives.
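
To make the thresholds concrete, here is a sketch of how the fragile-file and winning-pattern rules could be computed from logged iterations. It is an assumption-laden illustration: `LoggedIteration`, `DetectedPattern`, and `detectPatterns` are hypothetical names, and the real pattern learner may weigh sessions, recency, or other signals that this sketch ignores.

```typescript
type PatternType = "fragile-file" | "winning-pattern";

interface LoggedIteration {
  directive: string;
  files: string[];
  verdict: "accepted" | "rejected";
}

interface DetectedPattern {
  type: PatternType;
  subject: string; // a file path or a directive
  count: number;
}

// Flags files with minCount+ rejections and directives with minCount+ acceptances.
function detectPatterns(iterations: LoggedIteration[], minCount = 3): DetectedPattern[] {
  const rejectionsByFile = new Map<string, number>();
  const acceptancesByDirective = new Map<string, number>();
  for (const it of iterations) {
    if (it.verdict === "rejected") {
      for (const file of it.files) {
        rejectionsByFile.set(file, (rejectionsByFile.get(file) ?? 0) + 1);
      }
    } else {
      acceptancesByDirective.set(it.directive, (acceptancesByDirective.get(it.directive) ?? 0) + 1);
    }
  }
  const patterns: DetectedPattern[] = [];
  rejectionsByFile.forEach((count, subject) => {
    if (count >= minCount) patterns.push({ type: "fragile-file", subject, count });
  });
  acceptancesByDirective.forEach((count, subject) => {
    if (count >= minCount) patterns.push({ type: "winning-pattern", subject, count });
  });
  return patterns;
}
```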

Skills (42)

Skills are the primary interface. They live in .claude/skills/ and are activated automatically when Claude Code detects a matching intent.

AutoDev Core (12)

| Skill | What it does |
|-------|-------------|
| autodev-improve | Run the autonomous improvement loop (modify → build → test → measure → gate cycles) in the background. Queries history for smart directive selection. |
| autodev-create | Scaffold a new project from a PRD: parse → scaffold → task DAG → iterative build. |
| autodev-analyze | Scan repo for improvement opportunities: complexity hotspots, coverage gaps, lint issues, security patterns. |
| autodev-status | Current session status: iteration count, acceptance rate, budget remaining. |
| autodev-pause | Pause a running session (current iteration completes first). |
| autodev-resume | Resume a paused session. |
| autodev-stop | Permanently stop a session. |
| autodev-metrics | Metric history: test pass rate, coverage, lint, complexity, security trends. |
| autodev-history | Full event audit trail: every iteration decision and state change. |
| autodev-report | Summary report with metric deltas and accepted changes. |
| autodev-directive | Manage improvement directives: list, add, or remove targeted instructions with priority (1–10). |
| autodev-config | View or set configuration values. |

Additional Categories (30)

| Category | Count | Skills |
|----------|-------|--------|
| AgentDB | 5 | agentdb-vector-search, agentdb-memory-patterns, agentdb-learning, agentdb-optimization, agentdb-advanced |
| GitHub | 5 | github-code-review, github-multi-repo, github-project-management, github-release-management, github-workflow-automation |
| Swarm | 3 | swarm-orchestration, swarm-advanced, sparc-methodology |
| Intelligence | 2 | reasoningbank-intelligence, reasoningbank-agentdb |
| V3 Platform | 9 | v3-ddd-architecture, v3-core-implementation, v3-security-overhaul, v3-memory-unification, v3-performance-optimization, v3-mcp-optimization, v3-cli-modernization, v3-integration-deep, v3-swarm-coordination |
| Dev Workflow | 6 | hooks-automation, pair-programming, skill-builder, stream-chain, browser, verification-quality |

See .claude/skills/README.md for the full reference with parameter details.

Slash Commands

Slash commands live in .claude/commands/ and are available in Claude Code via /command-name.

| Category | Commands |
|----------|----------|
| Analysis | bottleneck-detect, performance-bottlenecks, performance-report, token-efficiency, token-usage |
| Automation | auto-agent, self-healing, session-memory, smart-agents, smart-spawn, workflow-select |
| Monitoring | agent-metrics, agents, real-time-view, status, swarm-monitor |
| Optimization | auto-topology, cache-manage, parallel-execute, topology-optimize |
| GitHub | GitHub integration commands |
| Hooks | Hook management commands |
| SPARC | SPARC methodology commands |

Hooks

The .claude/settings.json configures lifecycle hooks that fire automatically during Claude Code sessions:

| Hook | When it fires | What it does |
|------|---------------|-------------|
| PreToolUse (Bash) | Before any Bash command | Risk assessment via hook-handler.cjs |
| PreToolUse (Write/Edit) | Before file edits | Context and agent suggestions |
| PostToolUse (Write/Edit) | After file edits | Learning and pattern recording |
| PostToolUse (Bash) | After Bash commands | Outcome recording |
| UserPromptSubmit | On every user message | Task routing and intent detection |
| SessionStart | Session begins | Restore previous session state, import memory |
| SessionEnd | Session ends | Persist state |
| SubagentStart | Subagent spawns | Status tracking |
| SubagentStop | Subagent completes | Post-task learning |
| PreCompact | Before context compaction | Session state preservation |
| Stop | Agent stops | Memory sync |

Architecture

The Improvement Loop

1. Read config/default.autodev.yaml
2. Query history: autodev suggest → ranked directives
3. Run pattern learning: autodev patterns learn → fragile files, stale metrics
4. Capture baseline (build, test, lint)
5. For each iteration:
   a. Select directive (history-suggested, user-provided, or auto-detected)
   b. Read target files (avoid fragile files early)
   c. Make focused improvement (Claude Code Edit tool)
   d. Build → GATE: must succeed
   e. Test  → GATE: count and pass rate must not decrease
   f. Lint  → record new counts
   g. Pass all gates? → git add <files> && git commit + log to DB
      Fail any gate?  → git checkout -- <files> (revert) + log to DB
6. Stop when: budget exhausted, max iterations, or 5 consecutive rejects
7. Output session summary + update DB
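
The control flow of step 5 and the stop conditions in step 6 can be sketched as a small function with injected dependencies. This is a simplified illustration, not the engine's code: `LoopDeps`, `LoopResult`, and `runLoop` are hypothetical names, and the wall-clock and cost budgets are left out for brevity.

```typescript
type Verdict = "accepted" | "rejected";

// Hypothetical callbacks standing in for the real edit, gate, and git steps.
interface LoopDeps {
  applyChange: (iteration: number) => void; // make one focused edit
  gatesPass: () => boolean;                 // build and test gates
  commit: () => void;                       // git add + git commit
  revert: () => void;                       // git checkout -- <files>
}

interface LoopResult {
  verdicts: Verdict[];
  stopReason: "max-iterations" | "consecutive-rejects";
}

// Runs until maxIterations is reached or maxConsecutiveRejects rejections occur in a row.
function runLoop(deps: LoopDeps, maxIterations: number, maxConsecutiveRejects = 5): LoopResult {
  const verdicts: Verdict[] = [];
  let consecutiveRejects = 0;
  for (let i = 1; i <= maxIterations; i++) {
    deps.applyChange(i);
    if (deps.gatesPass()) {
      deps.commit();
      verdicts.push("accepted");
      consecutiveRejects = 0; // an accepted change resets the streak
    } else {
      deps.revert();
      verdicts.push("rejected");
      consecutiveRejects += 1;
      if (consecutiveRejects >= maxConsecutiveRejects) {
        return { verdicts, stopReason: "consecutive-rejects" };
      }
    }
  }
  return { verdicts, stopReason: "max-iterations" };
}
```

Passing the steps in as callbacks keeps the loop testable without touching git or running a real build.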

Hard Gates (non-negotiable)

| Gate | Rule |
|------|------|
| Build | Must succeed |
| Test count | Must not decrease |
| Test pass rate | Must not decrease |
| Coverage | Must not decrease |

A single gate failure triggers an immediate revert — no exceptions.
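
One way to picture the four gates is as a comparison between a baseline snapshot and the snapshot taken after a change. The sketch below is illustrative only: `GateSnapshot` and `failedGates` are hypothetical names, and treating an empty test suite as a pass rate of 0 is an assumption of this sketch.

```typescript
interface GateSnapshot {
  buildSucceeded: boolean;
  testCount: number;
  testsPassed: number;
  coveragePct: number;
}

// Pass rate as a fraction; defined as 0 when there are no tests (assumption).
function passRate(s: GateSnapshot): number {
  return s.testCount === 0 ? 0 : s.testsPassed / s.testCount;
}

// Returns the names of all gates the candidate fails; an empty array means the change may be committed.
function failedGates(baseline: GateSnapshot, candidate: GateSnapshot): string[] {
  const failures: string[] = [];
  if (!candidate.buildSucceeded) failures.push("build");
  if (candidate.testCount < baseline.testCount) failures.push("test_count");
  if (passRate(candidate) < passRate(baseline)) failures.push("test_pass_rate");
  if (candidate.coveragePct < baseline.coveragePct) failures.push("coverage");
  return failures;
}
```

Note that the rules are "must not decrease", so a change that leaves every value exactly where it was still passes the gates.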

Composite Scoring

| Metric | Weight |
|--------|--------|
| Test Pass Rate | 0.30 |
| Coverage | 0.25 |
| Lint Score | 0.25 |
| Complexity | 0.10 |
| Security | 0.10 |
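
Because the weights sum to 1.0, the composite can be read as a weighted average. The sketch below assumes each metric has already been normalized to the range 0 to 1 with higher meaning better; how the engine normalizes lint, complexity, and security values is not described here, and `compositeScore` is a hypothetical name.

```typescript
type MetricName = "test_pass_rate" | "coverage" | "lint" | "complexity" | "security";

// Weights from the table above.
const WEIGHTS: Record<MetricName, number> = {
  test_pass_rate: 0.30,
  coverage: 0.25,
  lint: 0.25,
  complexity: 0.10,
  security: 0.10,
};

// Weighted sum of normalized metric values; rejects inputs outside [0, 1].
function compositeScore(normalized: Record<MetricName, number>): number {
  let total = 0;
  for (const name of Object.keys(WEIGHTS) as MetricName[]) {
    const value = normalized[name];
    if (value < 0 || value > 1) throw new RangeError(`${name} must be in [0, 1]`);
    total += WEIGHTS[name] * value;
  }
  return total;
}
```

For example, normalized values of 1.0 (test pass rate), 0.8 (coverage), 0.9 (lint), 0.5 (complexity), and 1.0 (security) give 0.30 + 0.20 + 0.225 + 0.05 + 0.10, or about 0.875.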

Module Map (src/)

The src/ directory contains the TypeScript engine that powers the skills:

| Module | Role |
|--------|------|
| src/engine/ | ImproveCoordinator, IterationRunner, BudgetController, DirectiveParser |
| src/eval/ | EvaluationPipeline: runs collectors, hard gates, weighted scorer |
| src/eval/collectors/ | 5 metric collectors: test pass rate, coverage, lint, complexity, security |
| src/llm/ | Multi-provider LLM clients, context assembler, changeset parser |
| src/git/ | Git sandbox: in-place (GitSandboxImpl) and worktree (WorktreeSandboxImpl) |
| src/persistence/ | SQLite (WAL mode): event store, metric store, session manager, pattern store |
| src/create/ | PRD parser, task decomposer, project scaffolder |
| src/intelligence/ | History analyzer, pattern learner, complexity/coverage/lint analysis, opportunity ranking |
| src/cli/ | CLI command implementations and DB helpers |
| src/ui/ | Live dashboard HTTP server + single-file HTML frontend |
| src/integration/ | Claude Flow memory bridge (optional pattern storage) |
| src/types/ | Shared TypeScript interfaces |

Data Storage

SQLite database at .autodev/autodev.db (WAL mode, foreign keys enabled):

| Table | Purpose |
|-------|---------|
| sessions | Session config, status, aggregate stats |
| events | Event-sourced iteration decisions (accepted/rejected/error) |
| iterations | Per-iteration metadata |
| metrics | Metric snapshots (name, value, baseline, delta, unit) |
| patterns | Learned patterns (fragile files, winning directives, stale metrics) |

Configuration

Default config at config/default.autodev.yaml:

llm:
  provider: claude-code          # runs inside Claude Code — no API key needed
  model: claude-opus-4-6
  max_tokens_per_iteration: 16000
  temperature: 0.3

improve:
  max_iterations: 200
  max_hours: 8
  max_cost_usd: 50.00
  strategy: balanced             # balanced | aggressive | conservative

build:
  command: npm run build
  timeout_seconds: 120

test:
  command: npm test
  timeout_seconds: 180

lint:
  command: npx eslint . --format json
  timeout_seconds: 60

metrics:
  hard_gates:
    build_succeeds: true
    tests_pass_not_decreased: true
    test_count_not_decreased: true
    coverage_not_decreased: true
  weights:
    test_pass_rate: 0.30
    coverage: 0.25
    lint: 0.25
    complexity: 0.10
    security: 0.10
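
The three limits under `improve` act as independent caps on a session. As a hedged sketch of how such a check could work, the function below reports the first limit that has been reached. `BudgetUsage` and `exhaustedLimit` are hypothetical names, and the engine's BudgetController may track usage differently.

```typescript
// Field names mirror the `improve` keys in config/default.autodev.yaml.
interface ImproveBudget {
  max_iterations: number;
  max_hours: number;
  max_cost_usd: number;
}

// Hypothetical running totals for a session.
interface BudgetUsage {
  iterations: number;
  elapsedHours: number;
  costUsd: number;
}

// Defaults taken from the config shown above.
const DEFAULT_BUDGET: ImproveBudget = { max_iterations: 200, max_hours: 8, max_cost_usd: 50.0 };

// Returns the first limit that has been reached, or null if the session may continue.
function exhaustedLimit(
  usage: BudgetUsage,
  budget: ImproveBudget = DEFAULT_BUDGET,
): keyof ImproveBudget | null {
  if (usage.iterations >= budget.max_iterations) return "max_iterations";
  if (usage.elapsedHours >= budget.max_hours) return "max_hours";
  if (usage.costUsd >= budget.max_cost_usd) return "max_cost_usd";
  return null;
}
```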

Development

Prerequisites

Node.js ≥ 18
pnpm

Build & Test

pnpm install
pnpm run build                                       # tsup → dist/ (ESM + CJS)
pnpm test                                            # vitest
pnpm test -- --run tests/unit/hard-gates.test.ts     # single file
pnpm run test:watch                                  # watch mode
pnpm run typecheck                                   # tsc --noEmit
pnpm run lint                                        # eslint
pnpm run lint:fix                                    # eslint with auto-fix
pnpm run format                                      # prettier

Project Structure

.claude/
├── skills/                    # 42 Claude Code skills (primary interface)
├── commands/                  # 50+ slash commands
├── agents/                    # Agent configurations
├── helpers/                   # Hook handlers, statusline, utilities
└── settings.json              # Hooks, permissions, environment config

config/
└── default.autodev.yaml       # Build/test/lint commands, budget limits, metric weights

src/
├── cli.ts                     # CLI entry point (init, ui, session, suggest, patterns, …)
├── cli/
│   ├── commands.ts            # CLI command implementations (12 subcommands)
│   └── db.ts                  # Shared DB helpers
├── ui/
│   ├── server.ts              # Live dashboard HTTP server + SSE + API routes
│   └── dashboard.html         # Single-file dark-themed dashboard
├── engine/                    # Core improvement loop
├── eval/                      # Evaluation pipeline + 5 collectors
├── create/                    # PRD → project scaffolding
├── git/                       # Git sandbox (in-place & worktree)
├── llm/                       # LLM client implementations
├── intelligence/              # History analyzer, pattern learner, pre-improvement analysis
├── persistence/               # SQLite event sourcing + pattern store
├── integration/               # Claude Flow memory bridge
├── types/                     # Shared TypeScript interfaces
└── index.ts                   # Library entry (types only)

tests/
├── unit/                      # 14 unit test suites
└── integration/               # End-to-end tests

Design Principles

Native Claude Code execution — No external server, no API key, no subprocess. Skills spawn subagents that use Claude Code's own tools.
Hard gates are non-negotiable — Build must succeed, tests must not regress, coverage must not decrease.
Learn from history — Past session data informs directive selection, avoids fragile files, and prioritizes proven patterns.
Small, focused changes — One concern per iteration. Each change is validated independently.
Immediate revert on failure — Any gate failure triggers git checkout on changed files.
Event sourcing — All decisions are recorded to SQLite for auditability.
Dependency injection — All coordinators accept typed interfaces, making the engine testable and extensible.

License

MIT
