donice-evaluator
v0.1.0
OpenCode plugin: lock an agent into a milestone-based target with deterministic + LLM evaluators and a steering loop.
# DoNice
An OpenCode plugin that locks an agent into a target — broken into milestones — until independent evaluators say each one is done.
## What it does
You define a final goal split into ordered milestones. The plugin then:
- Injects only the current milestone description into the agent's system prompt (upcoming titles are listed for context, not their bodies).
- After every `session.idle`, runs:
  - the deterministic test scripts assigned to this milestone (any executable, exit 0 = pass);
  - a fresh-context LLM reviewer (a brand-new session with no prior history) that grades the workspace against this milestone's private rubric and returns concrete `how_to_fix` instructions per issue.
- If the milestone passes, auto-advances to the next milestone with a hand-off message. When the final milestone passes, posts a one-time "target satisfied" notice and stops nagging.
- If the milestone fails, pushes a steering message — including the reviewer's "How to fix" hints — back into the session so the agent keeps fixing the current milestone only.
- Blocks every tool call that would let the agent peek at evaluation scripts or rubrics — so it can't reward-hack by reading the answer key.
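As a sketch of the deterministic side, a milestone test is just an executable that exits 0 on pass. The `greet.py` target and the messages below are hypothetical (loosely echoing the bundled `greet-py-target` example), but the contract is the one above: exit status decides, and any failure message should stay abstract.

```shell
#!/usr/bin/env bash
# Hypothetical milestone test: exit 0 iff the CLI greets by the given name.
# On failure, print only a short abstract hint -- never the expected output itself.
run_check() {
  local out
  out="$(python3 greet.py World 2>/dev/null)" || { echo "greet.py exited non-zero" >&2; return 1; }
  case "$out" in
    *World*) return 0 ;;
    *)       echo "CLI output does not mention the given name" >&2; return 1 ;;
  esac
}
run_check
```

The exit status is all the evaluator consumes; the stderr line is the only detail relayed back to the agent.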
## Install
Add the plugin to your project's opencode.json:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "permission": { "question": "allow" },
  "plugin": ["donice-evaluator"]
}
```

OpenCode auto-installs the package via Bun on next start. The plugin's first run writes its slash command stubs into `.opencode/commands/` of your project; restart OpenCode once so the TUI's `/` autocomplete picks them up. After that you're done — no further setup.
If you'd rather pin to GitHub instead of npm:
```json
{ "plugin": ["github:Rorical/DoNice"] }
```

(Requires the package to be available; see the Develop locally section below for the layout.)
## Quick start
```shell
/donice-create   # interactively author a target (uses the question tool)
/clear           # wipe authoring context (the agent saw rubric content)
/donice-launch   # reset milestone state and kick off implementation
```

After `/donice-launch` the agent receives milestone 1's description and starts work. Every time it goes idle, the evaluator runs and either steers it back to fixing the current milestone or advances it to the next.
## Layout (in your project after install)
```
your-project/
├── opencode.json            # references "donice-evaluator" in plugin[]
└── .opencode/
    ├── commands/            # auto-seeded by the plugin on first run
    │   ├── donice-create.md  donice-launch.md  donice-status.md
    │   ├── donice-evaluate.md  donice-milestone.md  donice-advance.md
    │   ├── donice-pause.md  donice-resume.md  donice-reset.md
    │   ├── donice-toggle.md  donice-list-models.md  donice-set-model.md
    │   └── donice-reload.md
    └── donice-target/       # (created by /donice-create or by you)
        ├── target.json      # public — title, description, milestones[]
        └── private/         # blocked from all tool access
            ├── tests/       # deterministic scripts (per-milestone filenames)
            └── milestones/  # one rubric.md per milestone
```

## Develop locally
Repo layout:
```
DoNice/                         # publishable as 'donice-evaluator'
├── index.ts                    # plugin entrypoint (the published source)
├── commands/                   # bundled command stubs (the published source)
├── package.json                # peerDeps on @opencode-ai/plugin
├── README.md
├── examples/                   # reference targets — opt-in, not auto-loaded
│   └── greet-py-target/        # 2-milestone CLI + docs example
└── .opencode/
    └── plugins/
        └── donice-evaluator.ts # one-line shim re-exporting ../../index.ts
```

The repo deliberately does not ship an active `.opencode/donice-target/`, so cloning it doesn't carry a stray answer key. To hack on the plugin end-to-end:
```shell
bun install
cp -R examples/greet-py-target .opencode/donice-target   # opt in to the example
opencode
```

The `.opencode/plugins/donice-evaluator.ts` shim loads `index.ts` directly, so edits show up on the next session restart. The bootstrap writes `.opencode/commands/` on first plugin load — restart once, then iterate.
To publish:

```shell
bun install
npm publish --access public
```

## Slash commands
| Command | Effect |
|---|---|
| /donice-create | Interactively author a fresh target. The agent uses the question tool to gather info, then writes target.json + private files via the donice_init_target tool. |
| /donice-launch | Reset per-session state and post the implementation kickoff for milestone 1. Run after /clear. |
| /donice-status | Show steering state, milestone list, and last evaluation report. |
| /donice-milestone | Print the current milestone's description plus upcoming titles. |
| /donice-evaluate | Run the enabled evaluators on the current milestone now. |
| /donice-advance | Force-advance to the next milestone (manual override). |
| /donice-pause / /donice-resume | Suspend / resume the steering loop. |
| /donice-toggle det / /donice-toggle llm | Flip an evaluator on/off. |
| /donice-list-models | Print every provider/model OpenCode has connected. |
| /donice-set-model <providerID>/<modelID> | Switch the LLM reviewer live. |
| /donice-reset | Reset iteration counter, completion flag, and milestone index for this session. |
| /donice-reload | Re-read target.json from disk after a manual edit. |
## Authoring a target by hand
If you'd rather skip /donice-create, write target.json directly:
```json
{
  "title": "Short label",
  "description": "What 'done' looks like overall.",
  "evaluatorModel": { "providerID": "anthropic", "modelID": "claude-sonnet-4-20250514" },
  "enableDeterministic": true,
  "enableLLM": true,
  "maxSteerIterations": 25,
  "steerCooldownMs": 4000,
  "milestones": [
    {
      "id": "m1-cli",
      "title": "CLI behavior",
      "description": "Visible to the agent — what done looks like for this phase.",
      "tests": ["10-foo.sh", "11-bar.sh"],
      "rubric": "milestones/m1-cli.md"
    }
  ]
}
```

- `evaluatorModel` accepts any `providerID/modelID` OpenCode is connected to — `/donice-list-models` shows the live picker.
- `tests` is a list of filenames inside `private/tests/`. Each script must exit 0 on pass and write a short abstract failure message to stderr/stdout — that is the only thing the agent sees.
- `rubric` is a path relative to `private/`. The reviewer LLM reads it in a fresh session and returns strict JSON `{passed, summary, issues: [{what, how_to_fix}, …]}`.
- Omit `milestones` entirely and provide `private/rubric.md` + a flat `private/tests/` for legacy single-target mode.
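For illustration, a failing review matching that `{passed, summary, issues}` shape might look like the following (field values are hypothetical):

```json
{
  "passed": false,
  "summary": "CLI runs but ignores the provided name argument.",
  "issues": [
    {
      "what": "The greeting is hard-coded instead of using the CLI argument.",
      "how_to_fix": "Read the name from the command-line arguments and interpolate it into the greeting."
    }
  ]
}
```

Each `how_to_fix` string is what gets folded into the steering message pushed back to the agent.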
## Reward-hacking guards
`tool.execute.before` rejects any call where a path argument resolves into `.opencode/donice-target/private/` or `.opencode/plugins/`, plus any bash/grep/glob whose command or pattern mentions those paths. The plugin's own `donice_init_target` tool is the only writer allowed into `private/`.
Note: during /donice-create the authoring session necessarily sees
rubric text the user dictates. That's why the create flow ends by
telling the user to /clear and /donice-launch — implementation
starts in a context that has no prior exposure.
## Stopping the loop
- `/donice-pause` — pauses steering globally until `/donice-resume`.
- `maxSteerIterations` reached — defaults to 25; the plugin posts a "steering halted" notice and stops.
- All milestones pass — the plugin marks the session completed and never steers it again (until `/donice-reset`).
- Delete `target.json` — the plugin idles.
