@auto-dev/core
v0.1.2
Autonomous code improvement engine — modify → build → test → measure → gate
Autonomous code improvement engine that runs directly inside Claude Code — no MCP server, no API key, no subprocess. Just open your project in Claude Code and tell it to improve your code.
The engine runs unattended modify → build → test → measure → gate loops, committing improvements and reverting failures automatically. It can also scaffold entire projects from a PRD.
How It Works
AutoDev ships as a set of Claude Code skills, slash commands, and hooks in the .claude/ directory. When you invoke a skill, it spawns a background subagent that runs the full improvement loop using Claude Code's native tools (Edit, Bash, Agent) — the same tools you already use. No external server. No API key. No extra process.
You: "improve test coverage"
  └─ Claude Code activates the autodev-improve skill
       └─ Spawns a background subagent (bypassPermissions)
            └─ Reads config/default.autodev.yaml
            └─ Queries history for smart directive selection
            └─ Captures baseline (build, test, lint)
            └─ Loop:
                 ├── Read target files
                 ├── Make focused improvement (Edit tool)
                 ├── Build (gate) → Test (gate) → Lint
                 ├── Pass? → git commit + log to DB
                 └── Fail? → git checkout (revert) + log to DB
            └─ Reports summary when done
Features
- Runs natively in Claude Code — No MCP server, no API key, no subprocess. Uses the Claude Code session directly.
- Background subagents — The loop runs as a background agent so you can keep working.
- Hard gates are non-negotiable — Build must succeed, test count/pass rate must not decrease, coverage must not decrease. Gate failure = immediate revert.
- 5-metric evaluation — Test pass rate (0.30), coverage (0.25), lint (0.25), complexity (0.10), security (0.10).
- PRD-to-project scaffolding — Parse a PRD, generate a task DAG, build iteratively with quality gates.
- Pre-improvement intelligence — Scans for complexity hotspots, coverage gaps, lint issues, and security patterns before the loop begins.
- History-informed directive selection — Learns from past sessions to suggest high-performing directives and avoid fragile files.
- Pattern learning — Detects fragile files, winning patterns, stale metrics, and directive performance across sessions.
- Live dashboard — Real-time web UI with session monitoring, metric trends, event stream, and team analytics.
- Full CLI bridge — 12 CLI commands for session management, iteration logging, metrics, reports, and intelligence queries.
- 42 skills, 50+ slash commands, lifecycle hooks — Deep integration with Claude Code's skill system, commands, and hook lifecycle.
- Budget control — Cap iterations, wall-clock hours, or USD spend.
- Event-sourced sessions — Every iteration decision is recorded to SQLite for full auditability.
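The budget caps listed above, together with the loop's documented stop conditions (budget exhausted, max iterations, or 5 consecutive rejects), can be sketched as a single check. This is a minimal illustration, not the engine's code: `Budget`, `LoopState`, and `stopReason` are hypothetical names, and the default values are taken from config/default.autodev.yaml.

```typescript
// Hypothetical types for illustration; the engine's BudgetController may differ.
interface Budget {
  maxIterations: number;
  maxHours: number;
  maxCostUsd: number;
}

interface LoopState {
  iterations: number;
  elapsedHours: number;
  costUsd: number;
  consecutiveRejects: number;
}

// Values mirror the defaults in config/default.autodev.yaml.
const DEFAULT_BUDGET: Budget = { maxIterations: 200, maxHours: 8, maxCostUsd: 50 };

// The README states the loop stops after 5 consecutive rejects.
const MAX_CONSECUTIVE_REJECTS = 5;

// Returns the name of the first limit that has been reached, or null to keep going.
function stopReason(state: LoopState, budget: Budget = DEFAULT_BUDGET): string | null {
  if (state.iterations >= budget.maxIterations) return "max_iterations";
  if (state.elapsedHours >= budget.maxHours) return "max_hours";
  if (state.costUsd >= budget.maxCostUsd) return "max_cost_usd";
  if (state.consecutiveRejects >= MAX_CONSECUTIVE_REJECTS) return "consecutive_rejects";
  return null;
}
```

Checking every cap before each iteration means the loop never starts an iteration it has no budget for.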
Quick Start
Add to any project (recommended)
npx @auto-dev/core init
This copies skills, commands, hooks, and default config into your project's .claude/ directory. One command, done. It also creates the .autodev/autodev.db SQLite database for session tracking.
Options
npx @auto-dev/core init --force # overwrite existing files
npx @auto-dev/core init --skip-hooks # skip hooks and helpers
npx @auto-dev/core init --skip-config # skip config/default.autodev.yaml
npx @auto-dev/core init --reset-db # wipe and recreate the database
npx @auto-dev/core init --target ./other # install into a different directory
If you already have a .claude/settings.json, init will merge permissions and hooks rather than overwrite.
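To illustrate what a non-destructive merge of settings.json could look like, the sketch below unions the permission lists and appends only hook entries that are not already present. The merge policy and the `mergeSettings` helper are assumptions made for illustration; the README does not describe how init actually merges. The `Settings` shape follows the general layout of Claude Code's settings.json.

```typescript
// Hypothetical sketch of a merge policy (union + append); not AutoDev's confirmed behavior.
interface HookEntry {
  matcher?: string;
  hooks: { type: string; command: string }[];
}

interface Settings {
  permissions?: { allow?: string[]; deny?: string[] };
  hooks?: Record<string, HookEntry[]>;
  [key: string]: unknown;
}

// Order-preserving union of two string lists.
function union(a: string[] = [], b: string[] = []): string[] {
  return Array.from(new Set(a.concat(b)));
}

function mergeSettings(existing: Settings, incoming: Settings): Settings {
  const hooks: Record<string, HookEntry[]> = { ...(existing.hooks ?? {}) };
  const incomingHooks = incoming.hooks ?? {};
  for (const event of Object.keys(incomingHooks)) {
    const current = hooks[event] ?? [];
    // Compare entries structurally so re-running the merge adds no duplicates.
    const seen = new Set(current.map((entry) => JSON.stringify(entry)));
    const fresh = (incomingHooks[event] ?? []).filter(
      (entry) => !seen.has(JSON.stringify(entry)),
    );
    hooks[event] = current.concat(fresh);
  }
  return {
    ...existing, // unrelated user keys are kept as-is
    permissions: {
      allow: union(existing.permissions?.allow, incoming.permissions?.allow),
      deny: union(existing.permissions?.deny, incoming.permissions?.deny),
    },
    hooks,
  };
}
```

Deduplicating by structural equality keeps the merge idempotent, so running it twice against the same file yields the same result.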
Use in Claude Code
Open the project in Claude Code. The .claude/ directory is detected automatically — skills, commands, and hooks are live immediately.
Improve your code:
improve test coverage and reduce lint warnings
Analyze opportunities first:
analyze this codebase for improvement opportunities
Scaffold a new project from a PRD:
create a project from this PRD: <paste PRD or file path>
Check status, pause, resume, or stop:
show autodev status
pause autodev
resume autodev
stop autodev
That's it. No setup, no config, no server to start.
CLI
The CLI ships as dist/cli.js and is registered as both autodev and auto-dev bin entries.
Project Setup
autodev init [options] # Install skills + create DB
autodev --help # Show help
autodev --version # Show version
Session Management
autodev session create [--directive <text>] # Create a new session
autodev session update <id> --status <status> # Update session status
autodev status [<session-id>] # Show session status (JSON)
autodev history [<session-id>] [--limit <n>] # Show event history
autodev report [<session-id>] [--format md|json] # Generate session report
Iteration & Metrics Logging
autodev iteration log <sid> --number <n> --verdict accepted|rejected \
  [--directive <text>] [--files <f1,f2>] \
  [--tests-before <n>] [--tests-after <n>] \
  [--lint-before <n>] [--lint-after <n>] \
  [--coverage-before <n>] [--coverage-after <n>] \
  [--commit-sha <sha>] [--error <msg>]
autodev metrics log <sid> --name <n> --value <v> --baseline <b> \
  [--delta <d>] [--unit <u>]
autodev metrics [<session-id>] [--name <metric>] # Query metric trends
Intelligence & Patterns
autodev suggest [--limit <n>] # Suggest directives from history
autodev patterns learn # Detect patterns across sessions
autodev patterns list [--type <type>] # List detected patterns
Pattern types: fragile-file, winning-pattern, stale-metric, directive-perf
Live Dashboard
autodev ui [--port <n>] [--open] # Launch dashboard (default: port 4170)
Live Dashboard
Launch with autodev ui to get a real-time web dashboard at http://localhost:4170.
Features
- Session overview — All sessions with status, directive, acceptance rate, and timestamps
- Iteration timeline — Live feed of accepted/rejected iterations via Server-Sent Events (SSE)
- Metric trends — Track test pass rate, coverage, lint, complexity, and security over time
- Team analytics — Cross-session aggregation: total iterations, overall acceptance rate, metric improvement trends, and top-performing directives
- Zero dependencies — Single HTML file with inline CSS/JS, served by a zero-dep node:http server
API Endpoints
| Endpoint | Description |
|----------|-------------|
| GET /api/sessions | List all sessions |
| GET /api/status?session=<id> | Session status detail |
| GET /api/metrics?session=<id> | Metric history for a session |
| GET /api/report?session=<id> | Full session report |
| GET /api/analytics | Cross-session aggregate analytics |
| GET /api/events | SSE stream of live events |
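A small client for the session-scoped endpoints might look like the sketch below. It assumes the default port 4170 and that the endpoints return JSON; the response shapes are not documented here, so they are left as `unknown`. `sessionUrl` and `getJson` are hypothetical helper names.

```typescript
// Assumes the default dashboard port; change this if you launch with --port.
const BASE_URL = "http://localhost:4170";

type SessionEndpoint = "status" | "metrics" | "report";

// Builds the URL for one of the endpoints that take a ?session=<id> parameter.
function sessionUrl(
  endpoint: SessionEndpoint,
  sessionId: string,
  base: string = BASE_URL,
): string {
  const url = new URL(`/api/${endpoint}`, base);
  url.searchParams.set("session", sessionId); // handles escaping of the id
  return url.toString();
}

// Fetches an endpoint and parses the body as JSON (an assumption about the API).
// The payload is returned as unknown because its shape is not specified.
async function getJson(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}: ${url}`);
  }
  return res.json();
}
```

With the dashboard running, `getJson(sessionUrl("status", "<session-id>"))` would retrieve the status detail for one session. The global `fetch` used here is available in Node.js 18 and later.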
Agent Intelligence
AutoDev learns from past sessions to make smarter decisions in future runs.
History-Informed Directive Selection
When the improve skill starts, it queries autodev suggest to find:
- High-performing directives — Directives with >60% acceptance rate across past sessions
- Struggled files — Files that appeared in rejections but never in accepted iterations
- Stagnant metrics — Metrics that haven't improved in 3+ sessions
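The first two rules above can be expressed as a pair of small functions. This is a sketch under assumptions: `IterationRecord` is a hypothetical shape loosely based on the fields the `iteration log` command accepts, and the real analysis behind `autodev suggest` may differ.

```typescript
// Hypothetical record shape; AutoDev's actual schema may differ.
interface IterationRecord {
  directive: string;
  files: string[];
  verdict: "accepted" | "rejected";
}

// Directives whose acceptance rate exceeds the threshold (0.6 mirrors the >60% rule).
function highPerformingDirectives(
  records: IterationRecord[],
  threshold = 0.6,
): string[] {
  const stats = new Map<string, { accepted: number; total: number }>();
  for (const record of records) {
    const entry = stats.get(record.directive) ?? { accepted: 0, total: 0 };
    entry.total += 1;
    if (record.verdict === "accepted") entry.accepted += 1;
    stats.set(record.directive, entry);
  }
  const result: string[] = [];
  stats.forEach((entry, directive) => {
    if (entry.accepted / entry.total > threshold) result.push(directive);
  });
  return result;
}

// Files that appear in rejected iterations but never in an accepted one.
function struggledFiles(records: IterationRecord[]): string[] {
  const accepted = new Set<string>();
  const rejected = new Set<string>();
  for (const record of records) {
    const target = record.verdict === "accepted" ? accepted : rejected;
    record.files.forEach((file) => target.add(file));
  }
  return Array.from(rejected).filter((file) => !accepted.has(file));
}
```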
Pattern Learning
Run autodev patterns learn to detect:
| Pattern Type | What it Detects |
|---|---|
| fragile-file | Files with 3+ rejections — avoid targeting early |
| winning-pattern | Directives with 3+ acceptances — re-use these |
| stale-metric | Metrics stagnant across 2+ sessions — focus here |
| directive-perf | Acceptance rate stats per directive — rank by success |
Patterns are stored in the patterns table and used by the improve skill to avoid fragile files and prioritize proven directives.
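As an illustration of the counting rules in the table, the sketch below derives fragile-file and winning-pattern entries from a list of logged iterations. It covers only those two pattern types, and the `LoggedIteration` and `Pattern` shapes are hypothetical; the real patterns table may store different fields.

```typescript
// Hypothetical shapes for illustration only.
interface LoggedIteration {
  directive: string;
  files: string[];
  verdict: "accepted" | "rejected";
}

interface Pattern {
  type: "fragile-file" | "winning-pattern";
  subject: string; // file path or directive text
  count: number;
}

function increment(counts: Map<string, number>, key: string): void {
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

// minCount = 3 mirrors the "3+ rejections" and "3+ acceptances" thresholds.
function learnPatterns(iterations: LoggedIteration[], minCount = 3): Pattern[] {
  const rejectionsPerFile = new Map<string, number>();
  const acceptancesPerDirective = new Map<string, number>();

  for (const iteration of iterations) {
    if (iteration.verdict === "rejected") {
      iteration.files.forEach((file) => increment(rejectionsPerFile, file));
    } else {
      increment(acceptancesPerDirective, iteration.directive);
    }
  }

  const patterns: Pattern[] = [];
  rejectionsPerFile.forEach((count, file) => {
    if (count >= minCount) patterns.push({ type: "fragile-file", subject: file, count });
  });
  acceptancesPerDirective.forEach((count, directive) => {
    if (count >= minCount) patterns.push({ type: "winning-pattern", subject: directive, count });
  });
  return patterns;
}
```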
Skills (42)
Skills are the primary interface. They live in .claude/skills/ and are activated automatically when Claude Code detects a matching intent.
AutoDev Core (12)
| Skill | What it does |
|-------|-------------|
| autodev-improve | Run the autonomous improvement loop — modify → build → test → measure → gate cycles in background. Queries history for smart directive selection. |
| autodev-create | Scaffold a new project from a PRD — parse → scaffold → task DAG → iterative build. |
| autodev-analyze | Scan repo for improvement opportunities — complexity hotspots, coverage gaps, lint issues, security patterns. |
| autodev-status | Current session status — iteration count, acceptance rate, budget remaining. |
| autodev-pause | Pause a running session (current iteration completes first). |
| autodev-resume | Resume a paused session. |
| autodev-stop | Permanently stop a session. |
| autodev-metrics | Metric history — test pass rate, coverage, lint, complexity, security trends. |
| autodev-history | Full event audit trail — every iteration decision and state change. |
| autodev-report | Summary report with metric deltas and accepted changes. |
| autodev-directive | Manage improvement directives — list, add, or remove targeted instructions with priority (1–10). |
| autodev-config | View or set configuration values. |
Additional Categories (30)
| Category | Count | Skills |
|----------|-------|--------|
| AgentDB | 5 | agentdb-vector-search, agentdb-memory-patterns, agentdb-learning, agentdb-optimization, agentdb-advanced |
| GitHub | 5 | github-code-review, github-multi-repo, github-project-management, github-release-management, github-workflow-automation |
| Swarm | 3 | swarm-orchestration, swarm-advanced, sparc-methodology |
| Intelligence | 2 | reasoningbank-intelligence, reasoningbank-agentdb |
| V3 Platform | 9 | v3-ddd-architecture, v3-core-implementation, v3-security-overhaul, v3-memory-unification, v3-performance-optimization, v3-mcp-optimization, v3-cli-modernization, v3-integration-deep, v3-swarm-coordination |
| Dev Workflow | 6 | hooks-automation, pair-programming, skill-builder, stream-chain, browser, verification-quality |
See .claude/skills/README.md for the full reference with parameter details.
Slash Commands
Slash commands live in .claude/commands/ and are available in Claude Code via /command-name.
| Category | Commands |
|----------|----------|
| Analysis | bottleneck-detect, performance-bottlenecks, performance-report, token-efficiency, token-usage |
| Automation | auto-agent, self-healing, session-memory, smart-agents, smart-spawn, workflow-select |
| Monitoring | agent-metrics, agents, real-time-view, status, swarm-monitor |
| Optimization | auto-topology, cache-manage, parallel-execute, topology-optimize |
| GitHub | GitHub integration commands |
| Hooks | Hook management commands |
| SPARC | SPARC methodology commands |
Hooks
The .claude/settings.json configures lifecycle hooks that fire automatically during Claude Code sessions:
| Hook | When it fires | What it does |
|------|---------------|-------------|
| PreToolUse (Bash) | Before any Bash command | Risk assessment via hook-handler.cjs |
| PreToolUse (Write/Edit) | Before file edits | Context and agent suggestions |
| PostToolUse (Write/Edit) | After file edits | Learning and pattern recording |
| PostToolUse (Bash) | After Bash commands | Outcome recording |
| UserPromptSubmit | On every user message | Task routing and intent detection |
| SessionStart | Session begins | Restore previous session state, import memory |
| SessionEnd | Session ends | Persist state |
| SubagentStart | Subagent spawns | Status tracking |
| SubagentStop | Subagent completes | Post-task learning |
| PreCompact | Before context compaction | Session state preservation |
| Stop | Agent stops | Memory sync |
Architecture
The Improvement Loop
1. Read config/default.autodev.yaml
2. Query history: autodev suggest → ranked directives
3. Run pattern learning: autodev patterns learn → fragile files, stale metrics
4. Capture baseline (build, test, lint)
5. For each iteration:
   a. Select directive (history-suggested, user-provided, or auto-detected)
   b. Read target files (avoid fragile files early)
   c. Make focused improvement (Claude Code Edit tool)
   d. Build → GATE: must succeed
   e. Test → GATE: count and pass rate must not decrease
   f. Lint → record new counts
   g. Pass all gates? → git add <files> && git commit + log to DB
      Fail any gate? → git checkout -- <files> (revert) + log to DB
6. Stop when: budget exhausted, max iterations, or 5 consecutive rejects
7. Output session summary + update DB
Hard Gates (non-negotiable)
| Gate | Rule |
|------|------|
| Build | Must succeed |
| Test count | Must not decrease |
| Test pass rate | Must not decrease |
| Coverage | Must not decrease |
A single gate failure triggers an immediate revert — no exceptions.
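The four gates amount to a comparison between a baseline snapshot and the snapshot taken after a change. The sketch below shows one way to express that; `Snapshot`, `GateResult`, and `checkHardGates` are hypothetical names, not the engine's actual API.

```typescript
// Hypothetical snapshot of the measurements the gates compare.
interface Snapshot {
  buildSucceeded: boolean;
  testCount: number;
  testPassRate: number; // fraction between 0 and 1
  coverage: number; // any unit, as long as both snapshots use the same one
}

interface GateResult {
  passed: boolean;
  failures: string[]; // names of the gates that failed
}

// Compares a candidate snapshot against the baseline, one rule per gate.
function checkHardGates(baseline: Snapshot, candidate: Snapshot): GateResult {
  const failures: string[] = [];
  if (!candidate.buildSucceeded) failures.push("build");
  if (candidate.testCount < baseline.testCount) failures.push("test_count");
  if (candidate.testPassRate < baseline.testPassRate) failures.push("test_pass_rate");
  if (candidate.coverage < baseline.coverage) failures.push("coverage");
  return { passed: failures.length === 0, failures };
}
```

A caller would commit when `passed` is true and revert the changed files otherwise. Collecting all failures instead of returning at the first one gives the session log a complete reason for each rejection.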
Composite Scoring
| Metric | Weight |
|--------|--------|
| Test Pass Rate | 0.30 |
| Coverage | 0.25 |
| Lint Score | 0.25 |
| Complexity | 0.10 |
| Security | 0.10 |
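With these weights, a composite score is a plain weighted sum. The sketch below assumes every metric has already been normalized to a 0 to 1 score where higher is better; the README does not say how each collector normalizes its output, so treat that as an assumption. `compositeScore` is a hypothetical name.

```typescript
// Weights from the table above; they sum to 1.0.
const WEIGHTS = {
  testPassRate: 0.3,
  coverage: 0.25,
  lint: 0.25,
  complexity: 0.1,
  security: 0.1,
} as const;

type MetricName = keyof typeof WEIGHTS;

// Each score is assumed to be normalized to [0, 1], higher meaning better.
type Scores = Record<MetricName, number>;

function compositeScore(scores: Scores): number {
  let total = 0;
  for (const name of Object.keys(WEIGHTS) as MetricName[]) {
    total += WEIGHTS[name] * scores[name];
  }
  return total;
}
```

Because the weights sum to 1, the composite stays in the same 0 to 1 range as its inputs, which makes scores comparable across iterations.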
Module Map (src/)
The src/ directory contains the TypeScript engine that powers the skills:
| Module | Role |
|--------|------|
| src/engine/ | ImproveCoordinator, IterationRunner, BudgetController, DirectiveParser |
| src/eval/ | EvaluationPipeline — runs collectors, hard gates, weighted scorer |
| src/eval/collectors/ | 5 metric collectors: test pass rate, coverage, lint, complexity, security |
| src/llm/ | Multi-provider LLM clients, context assembler, changeset parser |
| src/git/ | Git sandbox — in-place (GitSandboxImpl) and worktree (WorktreeSandboxImpl) |
| src/persistence/ | SQLite (WAL mode) — event store, metric store, session manager, pattern store |
| src/create/ | PRD parser, task decomposer, project scaffolder |
| src/intelligence/ | History analyzer, pattern learner, complexity/coverage/lint analysis, opportunity ranking |
| src/cli/ | CLI command implementations and DB helpers |
| src/ui/ | Live dashboard HTTP server + single-file HTML frontend |
| src/integration/ | Claude Flow memory bridge — optional pattern storage |
| src/types/ | Shared TypeScript interfaces |
Data Storage
SQLite database at .autodev/autodev.db (WAL mode, foreign keys enabled):
| Table | Purpose |
|-------|---------|
| sessions | Session config, status, aggregate stats |
| events | Event-sourced iteration decisions (accepted/rejected/error) |
| iterations | Per-iteration metadata |
| metrics | Metric snapshots (name, value, baseline, delta, unit) |
| patterns | Learned patterns (fragile files, winning directives, stale metrics) |
Configuration
Default config at config/default.autodev.yaml:
llm:
  provider: claude-code # runs inside Claude Code — no API key needed
  model: claude-opus-4-6
  max_tokens_per_iteration: 16000
  temperature: 0.3
improve:
  max_iterations: 200
  max_hours: 8
  max_cost_usd: 50.00
  strategy: balanced # balanced | aggressive | conservative
build:
  command: npm run build
  timeout_seconds: 120
test:
  command: npm test
  timeout_seconds: 180
lint:
  command: npx eslint . --format json
  timeout_seconds: 60
metrics:
  hard_gates:
    build_succeeds: true
    tests_pass_not_decreased: true
    test_count_not_decreased: true
    coverage_not_decreased: true
  weights:
    test_pass_rate: 0.30
    coverage: 0.25
    lint: 0.25
    complexity: 0.10
    security: 0.10
Development
Prerequisites
- Node.js ≥ 18
- pnpm
Build & Test
pnpm install
pnpm run build # tsup → dist/ (ESM + CJS)
pnpm test # vitest
pnpm test -- --run tests/unit/hard-gates.test.ts # single file
pnpm run test:watch # watch mode
pnpm run typecheck # tsc --noEmit
pnpm run lint # eslint
pnpm run lint:fix # eslint with auto-fix
pnpm run format # prettier
Project Structure
.claude/
├── skills/ # 42 Claude Code skills (primary interface)
├── commands/ # 50+ slash commands
├── agents/ # Agent configurations
├── helpers/ # Hook handlers, statusline, utilities
└── settings.json # Hooks, permissions, environment config
config/
└── default.autodev.yaml # Build/test/lint commands, budget limits, metric weights
src/
├── cli.ts # CLI entry point (init, ui, session, suggest, patterns, …)
├── cli/
│ ├── commands.ts # CLI command implementations (12 subcommands)
│ └── db.ts # Shared DB helpers
├── ui/
│ ├── server.ts # Live dashboard HTTP server + SSE + API routes
│ └── dashboard.html # Single-file dark-themed dashboard
├── engine/ # Core improvement loop
├── eval/ # Evaluation pipeline + 5 collectors
├── create/ # PRD → project scaffolding
├── git/ # Git sandbox (in-place & worktree)
├── llm/ # LLM client implementations
├── intelligence/ # History analyzer, pattern learner, pre-improvement analysis
├── persistence/ # SQLite event sourcing + pattern store
├── integration/ # Claude Flow memory bridge
├── types/ # Shared TypeScript interfaces
└── index.ts # Library entry (types only)
tests/
├── unit/ # 14 unit test suites
└── integration/ # End-to-end tests
Design Principles
- Native Claude Code execution — No external server, no API key, no subprocess. Skills spawn subagents that use Claude Code's own tools.
- Hard gates are non-negotiable — Build must succeed, tests must not regress, coverage must not decrease.
- Learn from history — Past session data informs directive selection, avoids fragile files, and prioritizes proven patterns.
- Small, focused changes — One concern per iteration. Each change is validated independently.
- Immediate revert on failure — Any gate failure triggers git checkout on changed files.
- Event sourcing — All decisions are recorded to SQLite for auditability.
- Dependency injection — All coordinators accept typed interfaces, making the engine testable and extensible.
