optimizespec
v0.1.1
CLI and skills for building repo-local agent self-improvement systems.
OptimizeSpec
OptimizeSpec helps you make an agent better in a measured way, even if you have never built an eval suite or optimization loop before.
You start with a plain-language goal, such as "make support-triage answers more complete." The OptimizeSpec skills guide your coding agent through turning that goal into eval cases, scoring criteria, a runner that calls your real agent, and an optimization loop.
You do not need to know the full system shape before you start. The skills draft the proposal, identify unknowns, ask for confirmation where needed, and then implement the eval runner, scorer, optimizer, adapter, and evidence ledger once the plan is clear.
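To make the pieces concrete, here is a minimal sketch of what an eval case, a scorer, and a runner might look like for the support-triage goal. Every name in it (`EvalCase`, `completeness_score`, `run_evals`, `stub_agent`) is hypothetical and chosen for illustration; the code the skills generate for your repo will differ, and a real runner would call your actual agent rather than a stub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One eval case: an input plus the points a complete answer must cover."""
    case_id: str
    ticket: str
    required_points: tuple[str, ...]


def completeness_score(answer: str, case: EvalCase) -> float:
    """Return the fraction of required points that appear in the answer."""
    if not case.required_points:
        return 1.0
    text = answer.lower()
    hits = sum(1 for point in case.required_points if point.lower() in text)
    return hits / len(case.required_points)


def run_evals(agent, cases):
    """Call the agent on each case and collect a per-case score."""
    return {case.case_id: completeness_score(agent(case.ticket), case) for case in cases}


def stub_agent(ticket: str) -> str:
    # Placeholder (assumption): a real runner would invoke the production agent here.
    return "Please reset your password from the login page."


CASES = [
    EvalCase("pw-reset", "I can't log in.", ("reset your password", "login page")),
    EvalCase("refund", "I want my money back.", ("refund", "5 business days")),
]

if __name__ == "__main__":
    print(run_evals(stub_agent, CASES))
```

Substring matching is a deliberately crude scoring criterion. It is enough to show how a plain-language goal ("more complete") becomes a number per case that an optimization loop can compare across candidates.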
What You Get
- A structured workflow for turning an improvement idea into evals, scoring, and optimization code.
- Production-equivalent evals against your real agent runtime, tools, skills, MCP servers, environment, and permissions.
- Traceable optimization results with candidate IDs, per-case rollouts, scores, feedback, and a selected best candidate.
Quick Start
- Install the CLI:

    bun install -g optimizespec

- Then install the skills:

    npx skills add terminaluse/optimizespec --skill '*'

- Initialize the project metadata once:

    optimizespec init

Now create or update your optimization system with the skills:

    /optimizespec-new
    Create an optimization system to improve the agent in this folder

Continue until all the spec artifacts are generated:

    /optimizespec-continue

Implement the spec:

    /optimizespec-apply improve-agent-output

The apply skill runs verification as part of implementation. If you correct the implementation afterward, run the verify skill again:

    /optimizespec-verify improve-agent-output

How OptimizeSpec Works
OptimizeSpec skills include contracts for building optimization systems for agents. Your coding agent uses those contracts to implement the runner, scorer, optimizer, adapter, evidence ledger, candidate registry, and verification flow for your agent.
The core contracts are runtime-neutral. The skills include a reference system for Python Claude Managed Agents, and contributions for other hosted agent runtimes and languages are welcome.
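One way to picture a runtime-neutral contract is as a small interface that any agent runtime can satisfy. The sketch below uses Python's `typing.Protocol` for that purpose. The interface names and method signatures (`Adapter.invoke`, `Scorer.score`, `run_case`) are assumptions made for illustration and are not taken from the OptimizeSpec contract references.

```python
from typing import Protocol


class Adapter(Protocol):
    """Narrow boundary around the real agent runtime (hypothetical shape)."""

    def invoke(self, candidate: str, case_input: str) -> str: ...


class Scorer(Protocol):
    """Turns one agent output into a numeric score (hypothetical shape)."""

    def score(self, case_input: str, output: str) -> float: ...


class EchoAdapter:
    """Stand-in adapter; a real one would call the hosted agent."""

    def invoke(self, candidate: str, case_input: str) -> str:
        return f"{candidate}: {case_input}"


class LengthScorer:
    """Toy scorer: rewards longer outputs, capped at 1.0."""

    def score(self, case_input: str, output: str) -> float:
        return min(len(output) / 40, 1.0)


def run_case(adapter: Adapter, scorer: Scorer, candidate: str, case_input: str) -> dict:
    """Run one case through the adapter and score the result."""
    output = adapter.invoke(candidate, case_input)
    return {"output": output, "score": scorer.score(case_input, output)}
```

Because `run_case` depends only on the two protocols, swapping `EchoAdapter` for an adapter that talks to a different runtime would not require changes to the runner or scorer.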
How the Self-Improvement Works
The generated self-improvement system uses GEPA's Optimize Anything API as the optimization engine. OptimizeSpec defines the eval runner, scorer, candidate surface, ASI (actionable side information) feedback, and evidence ledger; GEPA uses those pieces to evaluate candidates, reflect on live failures, propose mutations, and select better candidates.
GEPA is a reflective evolutionary optimizer: it improves text-representable candidates by combining scores, traces, feedback, and Pareto-efficient search. Read How GEPA Works for the underlying optimization loop.
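To illustrate what "Pareto-efficient" means when candidates are scored per case, the toy sketch below keeps every candidate that no other candidate beats across the board. It is not GEPA's implementation or API; the function names, candidate IDs, and scores are invented for the example.

```python
def dominates(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if `a` is at least as good as `b` on every case and better on one."""
    return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)


def pareto_front(scores: dict[str, dict[str, float]]) -> list[str]:
    """Return candidate IDs whose per-case scores no other candidate dominates."""
    return [
        cid
        for cid, s in scores.items()
        if not any(dominates(other, s) for oid, other in scores.items() if oid != cid)
    ]


# Hypothetical per-case scores for three candidates.
SCORES = {
    "cand-0": {"case-a": 0.5, "case-b": 0.5},
    "cand-1": {"case-a": 0.9, "case-b": 0.4},
    "cand-2": {"case-a": 0.4, "case-b": 0.4},
}

if __name__ == "__main__":
    print(pareto_front(SCORES))
```

Here `cand-2` drops out because `cand-0` is at least as good on both cases, while `cand-0` and `cand-1` both survive since each wins on a different case. Keeping such complementary candidates, rather than only the best average, is the intuition behind Pareto-based search.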
What Spec Artifacts Get Created
OptimizeSpec keeps planning artifacts in one root folder:
    optimizespec/changes/<change-name>/
        proposal.md
        design.md
        specs/
        tasks.md

The proposal records where the durable optimization-system code will live:

    ## Optimization System Location
    - Decision: create new folder|use existing folder
    - Path: <repo-relative path>
    - Import/runtime access plan: <how generated code imports or invokes the real agent modules>

$optimizespec-apply <change-name> writes runner, scorer, optimizer, adapter, and evidence code to that recorded path.
Note: Choose the path based on your repo's structure. The optimization system should live where it can import or invoke the real agent, tools, skills, MCP servers, environment configuration, and permissions through a narrow adapter, so optimization runs evaluate the same integrations your production agent uses.
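If you want to inspect a change folder from a script, something like the following sketch could check that the planning artifacts exist and read the location section out of proposal.md. The helper names (`missing_artifacts`, `parse_location`) and the example path `agents/support/optimization` are hypothetical. The parser assumes the section keeps the heading and `- Key: value` bullet format shown above.

```python
import re
from pathlib import Path

# Artifact names taken from the folder layout described above.
EXPECTED = ("proposal.md", "design.md", "specs", "tasks.md")


def missing_artifacts(change_dir: Path) -> list[str]:
    """List expected planning artifacts that do not exist under change_dir."""
    return [name for name in EXPECTED if not (change_dir / name).exists()]


def parse_location(proposal_text: str) -> dict[str, str]:
    """Extract `- Key: value` bullets under '## Optimization System Location'."""
    fields: dict[str, str] = {}
    in_section = False
    for line in proposal_text.splitlines():
        if line.startswith("## "):
            # Entering a new level-2 section; track whether it is the one we want.
            in_section = line.strip() == "## Optimization System Location"
            continue
        match = re.match(r"-\s*([^:]+):\s*(.*)", line.strip())
        if in_section and match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


# Hypothetical proposal content used only to exercise the parser.
EXAMPLE = """\
# Proposal

## Optimization System Location
- Decision: create new folder
- Path: agents/support/optimization
- Import/runtime access plan: import the agent entry point through an adapter

## Other
- Ignored: yes
"""

if __name__ == "__main__":
    print(parse_location(EXAMPLE))
```

A check like this can be handy in CI to confirm that the recorded path still exists before an optimization run starts.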
What a Run Produces
An optimizer run outputs:
- optimizer-summary.json records the selected candidate, score summary, per-case live scores, budgets, and artifact paths.
- candidates.json records every candidate with stable candidate IDs so scores can be traced back to prompts or other candidate surfaces.
- rollout.json, score.json, and side_info.json capture per-case execution evidence, grader output, feedback, errors, and ASI inputs.
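As a starting point for exploring a run, the sketch below walks a run directory, finds the output files named above, and lists each file's top-level JSON keys. It assumes nothing about the schema inside those files. The directory layout, such as per-case files living in subfolders, and the function name `index_run_outputs` are assumptions for illustration.

```python
import json
from pathlib import Path

# File names taken from the list of run outputs above.
RUN_FILES = (
    "optimizer-summary.json",
    "candidates.json",
    "rollout.json",
    "score.json",
    "side_info.json",
)


def index_run_outputs(run_dir: Path) -> dict[str, list[str]]:
    """Map each known output file (relative path) to its top-level JSON keys."""
    index: dict[str, list[str]] = {}
    for path in sorted(run_dir.rglob("*.json")):
        if path.name not in RUN_FILES:
            continue  # Skip JSON files that are not run outputs.
        data = json.loads(path.read_text(encoding="utf-8"))
        # Non-object JSON (for example a list) has no keys to report.
        keys = sorted(data) if isinstance(data, dict) else []
        index[path.relative_to(run_dir).as_posix()] = keys
    return index
```

Calling `index_run_outputs(Path("path/to/run"))` on a finished run gives a quick map of which evidence files exist and what fields they expose, which is a reasonable first step before writing schema-specific analysis.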
Learn More
- Contract references for runner, grader, candidate, optimizer, and runtime contracts.
- TECHNICAL.md for architecture, package boundaries, and release notes.
- How GEPA Works for GEPA's reflective evolutionary optimization loop.
- DEVELOPMENT.md for local development.
Acknowledgements
OptimizeSpec is only possible due to all the great work Lakshya has done on GEPA.
OptimizeSpec's spec-driven development approach is strongly inspired by OpenSpec. We highly recommend it; this repo was built using OpenSpec.
License
MIT
