@atonev/opencode-kasper
v1.2.0
Kasper
Kasper is a plugin for opencode that monitors agent sessions, scores adherence to user instructions via LLM-as-judge, and injects corrective instructions into AGENTS.md and per-agent prompt files.
Unofficial plugin: This is an independent project and is not affiliated with, endorsed by, or maintained by the opencode team.
Features
- LLM-as-Judge Scoring: Evaluates every session on 5 dimensions: instruction following, completeness, proactiveness, code quality, and communication
- Automatic Improvements: Detects recurring weaknesses and injects fixes into `AGENTS.md` or per-agent prompts (auto or manual approval)
- Idle-Aware Evaluation: Sessions are scored only when idle or complete, preventing partial-turn scoring
- Per-Agent Scoring: Separate aggregates and weakness profiles per agent
- Batch & Retroactive Scoring: Score past sessions via `/kasper score session <id>` or in bulk with `last N`
- Subagent Tracking: Tracks subagent calls and evaluates child sessions independently
- Real-time Status: `/kasper status` shows an In Progress banner with the current evaluation pass, weakness merge, and any pending improvements
- Backups & Safety: Timestamped backups before every change; atomic writes with file locks
- Prompt Resolution: Honours opencode's `agent.<name>.prompt` `{file:...}` and `{path:...}` directives; will not overwrite the wrong file
Installation
```bash
npm install @atonev/opencode-kasper
```

Register the plugin in your opencode.json (or in your global ~/.config/opencode/opencode.json):

```json
{
  "plugin": ["@atonev/opencode-kasper"]
}
```

That's the entire plugin registration: there are no plugin-level options to pass on the plugin line. All kasper configuration (auto-update, model, thresholds, etc.) lives in a separate file, described in Configuration below.
Verify: Start a session and run /kasper status.
Commands
| Command | Description |
|---|---|
| /kasper status [agent] | Aggregate scores, top weaknesses, recent sessions, sparkline trend, and live In Progress banner |
| /kasper score session <id> | Evaluate a past session by id, or several at once by passing last N, since YYYY-MM-DD, or range X Y in place of <id> |
| /kasper improve [agent] | Numbered table of improvement suggestions; pass --dry-run to preview without applying |
| /kasper apply [n\|all] | Apply pending improvement (n for an index, all for everything queued) |
| /kasper history [agent] | Session history with score breakdowns |
| /kasper auto on\|off | Toggle auto-apply for improvements in the current session |
| /kasper migrate <name> [--show] | Extract an inline opencode.json agent prompt to a file so kasper can edit it. With --show, just reports the current source |
| /kasper reset | Clear all state (asks you to confirm by running /kasper reset --force) |
| /kasper help | Show all commands |
Tools
| Tool | Description |
|---|---|
| kasper_status | Aggregate scores, per-agent breakdown, weaknesses |
| kasper_improve | Numbered improvement suggestions; supports --dry-run |
| kasper_apply | Apply by [N] index, or all |
| kasper_history | Adherence history and trends |
| kasper_score_session | Evaluate one or more sessions (single id, last N, since YYYY-MM-DD, range X Y) |
| kasper_reset | Clear all state |
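The selector forms listed in the two tables above (a single id, `last N`, `since YYYY-MM-DD`, `range X Y`) can be pictured as a small tagged union. The sketch below is illustrative only: `Selector` and `parseSelector` are hypothetical names, not part of kasper's API, and the real parser may handle edge cases differently.

```typescript
// Hypothetical model of the documented selector forms; not kasper's actual parser.
type Selector =
  | { kind: "id"; id: string }
  | { kind: "last"; count: number }
  | { kind: "since"; date: string }
  | { kind: "range"; from: string; to: string };

function parseSelector(args: string[]): Selector | null {
  const [head, a, b] = args;
  switch (head) {
    case "last":
      // `last N` needs a positive integer.
      return a !== undefined && /^[1-9]\d*$/.test(a)
        ? { kind: "last", count: Number(a) }
        : null;
    case "since":
      // `since YYYY-MM-DD`; only the shape of the date is checked here.
      return a !== undefined && /^\d{4}-\d{2}-\d{2}$/.test(a)
        ? { kind: "since", date: a }
        : null;
    case "range":
      // What X and Y denote is not specified above, so they are kept as strings.
      return a !== undefined && b !== undefined
        ? { kind: "range", from: a, to: b }
        : null;
    default:
      // Any other single token is treated as a session id.
      return head !== undefined && args.length === 1
        ? { kind: "id", id: head }
        : null;
  }
}
```

For example, `parseSelector(["last", "5"])` yields `{ kind: "last", count: 5 }`, while an empty argument list yields `null`.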
Configuration
Loaded from ~/.config/opencode/kasper.jsonc, .opencode/kasper.jsonc, or the kasper key in opencode.json. Project values override global values; missing fields fall back to the defaults below.
Core
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Master switch. Set to false to disable the plugin without uninstalling |
| auto_update | boolean | true | Auto-apply improvements to AGENTS.md / agent prompts. Set to false to require /kasper apply for every change |
| scoring_threshold | number (0.0–1.0) | 0.6 | Sessions scoring strictly below this trigger weakness detection and improvement suggestions |
| model | string | opencode/deepseek-v4-flash-free | Provider/model used as the LLM judge. Pick a fast, cheap model |
| detail_level | minimal \| standard \| thorough | standard | How much context the judge sees. minimal is cheapest, thorough is most accurate |
Evaluation
| Field | Type | Default | Description |
|---|---|---|---|
| min_session_messages | integer (1–50) | 1 | Skip sessions with fewer than this many user messages |
| min_observations_for_update | integer (1–10) | 2 | A weakness must be observed at least this many times before an improvement is generated |
| weakness_decay_days | integer (0–365) | 30 | After this many days without recurrence, weaknesses are forgotten. 0 disables decay |
| evaluation_poll_interval_ms | integer (1000–300000) | 10000 | How often the background loop scans for idle sessions |
| quiet | boolean | false | Suppress non-warning toast notifications |
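To make the interaction between `min_observations_for_update` and `weakness_decay_days` concrete, here is a minimal TypeScript sketch. The `Weakness` shape and both function names are assumptions made for illustration, and so is the exact boundary (a weakness counts as decayed once it has been idle for more than the configured number of days).

```typescript
// Hypothetical shapes: the config field names mirror the table above, the rest is assumed.
interface Weakness {
  observations: number; // how many sessions showed this weakness
  lastSeen: Date; // most recent observation
}

interface EvaluationConfig {
  min_observations_for_update: number;
  weakness_decay_days: number; // 0 disables decay
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isDecayed(w: Weakness, cfg: EvaluationConfig, now: Date): boolean {
  if (cfg.weakness_decay_days === 0) return false;
  const idleDays = (now.getTime() - w.lastSeen.getTime()) / MS_PER_DAY;
  return idleDays > cfg.weakness_decay_days;
}

// An improvement is generated only for a live weakness seen often enough.
function shouldGenerateImprovement(w: Weakness, cfg: EvaluationConfig, now: Date): boolean {
  return !isDecayed(w, cfg, now) && w.observations >= cfg.min_observations_for_update;
}
```

With the defaults (2 observations, 30 days), a weakness seen twice and last observed nine days ago would qualify, while one last observed two months ago would not.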
Scoring robustness
| Field | Type | Default | Description |
|---|---|---|---|
| scoring_retries | integer (0–10) | 2 | Retries when the scoring model returns invalid JSON |
| scoring_timeout_ms | integer (10000–600000) | 120000 | Per-attempt timeout for the scoring model call |
| max_score_input_chars | integer (1000–50000) | 10000 | Cap on input sent to the scoring model per session |
| max_agent_guidance_chars | integer (200–5000) | 1200 | Cap on the size of each generated improvement (AGENTS.md or per-agent) |
| improvement_expiry_days | integer (0–365) | 60 | Inactive improvements are pruned after this many days. 0 = never expire |
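The retry behaviour behind `scoring_retries` can be sketched as a validate-and-retry loop. This is a simplified, synchronous illustration: the judge's JSON shape (one number per dimension, keyed by the dimension names from the Scoring section) is an assumption, `askJudge` is a hypothetical stand-in for the model call, and the per-attempt timeout (`scoring_timeout_ms`) is left out.

```typescript
// Dimension names come from the Scoring table; the JSON shape itself is assumed.
const DIMENSIONS = [
  "instruction_following",
  "completeness",
  "proactiveness",
  "code_quality",
  "communication",
] as const;

type Scores = Record<(typeof DIMENSIONS)[number], number>;

// Returns parsed scores, or null when the reply is not valid JSON in the assumed shape.
function parseScores(raw: string): Scores | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const scores = {} as Scores;
  for (const dim of DIMENSIONS) {
    const value = record[dim];
    if (typeof value !== "number" || value < 0 || value > 1) return null;
    scores[dim] = value;
  }
  return scores;
}

// One initial attempt plus up to `retries` further attempts.
function scoreWithRetries(askJudge: () => string, retries: number): Scores | null {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const scores = parseScores(askJudge());
    if (scores !== null) return scores;
  }
  return null;
}
```

Under this reading, the default of 2 retries means the judge is asked at most three times per session before the evaluation is given up.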
Safety
| Field | Type | Default | Description |
|---|---|---|---|
| strict_sanitize | boolean | true | Reject generated improvements containing URLs, code blocks, or instruction-injection markers |
| agent_prompt_inject_mode | section \| inline | section | How kasper writes improvements into an agent prompt file. section adds a visible `## Kasper Inferred Instructions` block at the end. inline appends the guidance directly with no section header, wrapped only in `<!-- kasper-injected:begin/end -->` HTML comments for dedupe and rollback. AGENTS.md injection is always section style |
| debug | boolean | false | Log SDK events and extra diagnostics. Disable in production |
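As a rough picture of what `strict_sanitize` guards against, the following TypeScript sketch rejects text containing a URL, a fenced code block, or an instruction-injection phrase. The patterns are assumptions chosen for illustration (kasper's actual rules and marker list are not documented here), and `rejectionReason` is a hypothetical helper.

```typescript
// Illustrative checks only; the real rule set is not documented in this README.
const URL_PATTERN = /\b(?:https?|ftp):\/\/\S+/i;
const CODE_FENCE_PATTERN = /`{3}|~{3}/;
// Assumed examples of instruction-injection phrasing.
const INJECTION_PATTERNS = [
  /ignore (?:all )?(?:previous|prior) instructions/i,
  /disregard (?:the )?system prompt/i,
];

// Returns a short reason when the guidance should be rejected, or null when it passes.
function rejectionReason(guidance: string): string | null {
  if (URL_PATTERN.test(guidance)) return "contains URL";
  if (CODE_FENCE_PATTERN.test(guidance)) return "contains code block";
  if (INJECTION_PATTERNS.some((p) => p.test(guidance))) return "injection marker";
  return null;
}
```

A plain sentence such as "Run the test suite before reporting completion." passes, while guidance that embeds a link is rejected with `"contains URL"`.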
Storage
| Field | Type | Default | Description |
|---|---|---|---|
| state_dir | string | .opencode/kasper (relative to cwd) | Where session scores, state, and backups are written. Absolute path or project-relative |
Example
~/.config/opencode/kasper.jsonc (recommended: full JSONC support, including comments):

```jsonc
{
  "enabled": true,
  "auto_update": true,
  "scoring_threshold": 0.65,
  "model": "opencode/minimax-m2.5-free",
  "min_observations_for_update": 3,
  "quiet": true,
  "state_dir": "/var/lib/kasper"
}
```

Or as a nested kasper key in opencode.json:

```json
{
  "plugin": ["@atonev/opencode-kasper"],
  "kasper": {
    "auto_update": true,
    "scoring_threshold": 0.65,
    "model": "opencode/minimax-m2.5-free"
  }
}
```

The kasper.jsonc file always wins over the kasper key in opencode.json if both exist. Project-local config overrides global config; the global config file lives at ~/.config/opencode/kasper.jsonc (or wherever XDG_CONFIG_HOME points).
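The layering described above (defaults, then global values, then project values) amounts to a field-by-field merge. Here is a minimal TypeScript sketch, assuming a shallow merge: `KasperConfig` lists only four of the documented fields, and `mergeConfig` is a hypothetical helper rather than kasper's loader. Whether a `kasper.jsonc` file replaces the `kasper` key wholesale or field by field is not stated above, so the sketch does not model it.

```typescript
// Subset of the documented fields; defaults copied from the tables above.
interface KasperConfig {
  enabled: boolean;
  auto_update: boolean;
  scoring_threshold: number;
  quiet: boolean;
}

const DEFAULTS: KasperConfig = {
  enabled: true,
  auto_update: true,
  scoring_threshold: 0.6,
  quiet: false,
};

// Layers are passed lowest priority first; later layers override earlier ones.
function mergeConfig(...layers: Array<Partial<KasperConfig> | undefined>): KasperConfig {
  let merged: KasperConfig = { ...DEFAULTS };
  for (const layer of layers) {
    if (layer !== undefined) merged = { ...merged, ...layer };
  }
  return merged;
}

const effective = mergeConfig(
  { scoring_threshold: 0.65, quiet: true }, // e.g. global kasper.jsonc
  { auto_update: false }, // e.g. project .opencode/kasper.jsonc
);
```

Here `effective` keeps `enabled` from the defaults, takes `scoring_threshold` and `quiet` from the global layer, and takes `auto_update` from the project layer.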
Agent Prompt Resolution
Kasper follows opencode's own agent resolution rules when deciding where to read and write an agent's prompt. For an agent named <name>:
- Project `opencode.json`: if `agent.<name>.prompt` is defined, that wins. Project config overrides global config.
- Global `opencode.json`: used if no project entry exists.
- Convention: if no `agent.<name>` is defined in either config, kasper falls back to:
  - `<projectRoot>/.opencode/agent/<name>.md`
  - `<projectRoot>/.opencode/agents/<name>.md`
  - `~/.config/opencode/agent/<name>.md`
  - `~/.config/opencode/agents/<name>.md`
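The lookup order above can be sketched as a single function. This is an illustration under stated assumptions, not kasper's resolver: `AgentConfig`, `PromptSource`, and `resolvePromptSource` are hypothetical names, the caller supplies the parsed configs and a file-existence check, the convention paths are assumed to be tried in the order listed, and an agent entry that exists without a `prompt` is assumed to resolve to nothing.

```typescript
// Hypothetical input shape: just the part of opencode.json this lookup needs.
interface AgentConfig {
  agent?: Record<string, { prompt?: string }>;
}

type PromptSource =
  | { from: "project-config" | "global-config"; prompt: string }
  | { from: "convention"; path: string }
  | { from: "none" };

function resolvePromptSource(
  name: string,
  projectCfg: AgentConfig,
  globalCfg: AgentConfig,
  projectRoot: string,
  home: string,
  exists: (path: string) => boolean,
): PromptSource {
  const projectPrompt = projectCfg.agent?.[name]?.prompt;
  if (projectPrompt !== undefined) return { from: "project-config", prompt: projectPrompt };

  const globalPrompt = globalCfg.agent?.[name]?.prompt;
  if (globalPrompt !== undefined) return { from: "global-config", prompt: globalPrompt };

  // Assumption: an agent entry without a prompt does not fall back to convention files.
  const defined =
    projectCfg.agent?.[name] !== undefined || globalCfg.agent?.[name] !== undefined;
  if (defined) return { from: "none" };

  const candidates = [
    `${projectRoot}/.opencode/agent/${name}.md`,
    `${projectRoot}/.opencode/agents/${name}.md`,
    `${home}/.config/opencode/agent/${name}.md`,
    `${home}/.config/opencode/agents/${name}.md`,
  ];
  for (const path of candidates) {
    if (exists(path)) return { from: "convention", path };
  }
  return { from: "none" };
}
```

Passing `exists` in as a parameter keeps the sketch free of filesystem calls, so the ordering can be checked against a plain list of paths.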
The prompt value is interpreted in one of four ways. Three apply to opencode.json:
- Raw string: the prompt is inline. Kasper refuses to edit it; run `/kasper migrate <name>` to extract it to a file.
- `{file:/abs/path/to/prompt.md}`: the prompt is loaded from that file. Kasper reads and writes that exact file. `~` is expanded; relative paths resolve against the config file's directory.
- `{path:/abs/path/to/prompt.md}`: alias for `{file:...}`.
Plugin override files (.opencode/<plugin>.json, e.g. oh-my-openagent.json) accept one additional form:
- `file:///abs/path/to/prompt.md`: a `file://` URI. Kasper reads and writes the file at that path. `~/...` URIs resolve to `$HOME`. This is the form used by oh-my-opencode and similar plugins that ship agent prompts in `node_modules` and let users redirect them via a plugin-specific config. Note: `file://` URIs in `opencode.json` itself (not in a plugin override file) are not classified as a `file_uri` source; they fall through to inline. Put plugin redirects in a plugin override file.
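The forms described above can be summarised as a small classifier. This is a sketch under assumptions, not kasper's code: only the `file_uri` label appears in this README, while `PromptKind`, `expandHome`, `classifyPrompt`, and the option names are hypothetical. Paths are joined naively (POSIX-style, no normalisation).

```typescript
type PromptKind =
  | { kind: "inline" }
  | { kind: "file"; path: string }
  | { kind: "file_uri"; path: string };

// Expands a leading "~" to the given home directory.
function expandHome(path: string, home: string): string {
  return path === "~" || path.indexOf("~/") === 0 ? home + path.slice(1) : path;
}

function classifyPrompt(
  value: string,
  opts: { home: string; configDir: string; pluginOverride: boolean },
): PromptKind {
  // {file:...} and its alias {path:...}
  const directive = /^\{(?:file|path):(.+)\}$/.exec(value.trim());
  if (directive) {
    const raw = expandHome(directive[1].trim(), opts.home);
    // Relative paths resolve against the directory of the config file.
    const path = raw.charAt(0) === "/" ? raw : `${opts.configDir}/${raw}`;
    return { kind: "file", path };
  }
  // file:// URIs count only when they come from a plugin override file.
  if (opts.pluginOverride && value.indexOf("file://") === 0) {
    const path = expandHome(value.slice("file://".length), opts.home);
    return { kind: "file_uri", path };
  }
  // Everything else is an inline prompt, which kasper refuses to edit.
  return { kind: "inline" };
}
```

Note how the same `file:///abs/path/to/prompt.md` string classifies as `inline` when `pluginOverride` is false, mirroring the caveat above about `opencode.json`.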
After migrate, the source opencode.json is rewritten to replace the inline prompt with a {file:...} directive, with comments and formatting preserved. Restart opencode for the new prompt file to take effect.
Why this matters: before this resolution existed, kasper would silently create an empty `<projectRoot>/.opencode/agents/<name>.md` whenever the real prompt was defined via `{file:...}`; the only signal that anything happened was a stray stub file with the injected section. The resolver eliminates that class of bug entirely.
Injection style
How kasper writes an improvement into the resolved prompt file is controlled by agent_prompt_inject_mode:
- `section` (default): appends a clearly labelled block:

  ```md
  ## Kasper Inferred Instructions
  <!-- kasper: 2026-06-08T12:00:00.000Z -->
  <guidance>
  ```

  Self-documenting, easy to spot in a diff, easy to roll back via `/kasper rollback <agent>`.
- `inline`: appends the guidance directly at the end of the file with no section header:

  ```md
  <!-- kasper-injected:begin -->
  <guidance>
  <!-- kasper-injected:end -->
  ```

  The HTML comments exist only for dedupe and rollback bookkeeping; they render as nothing in the prompt. Use this when the visible `## Kasper Inferred Instructions` heading is unwanted (e.g. you have a strict prompt structure that you don't want kasper touching).
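The two injection styles can be expressed as a pair of string builders. The sketch below only mirrors the block layouts shown above; `renderInjection` and `appendInjection` are hypothetical names, and the exact whitespace kasper writes (blank line before the block, trailing newline) is an assumption.

```typescript
type InjectMode = "section" | "inline";

const BEGIN_MARKER = "<!-- kasper-injected:begin -->";
const END_MARKER = "<!-- kasper-injected:end -->";

// Builds the block for one improvement in the chosen style.
function renderInjection(mode: InjectMode, guidance: string, now: Date): string {
  if (mode === "section") {
    return [
      "## Kasper Inferred Instructions",
      `<!-- kasper: ${now.toISOString()} -->`,
      guidance,
    ].join("\n");
  }
  return [BEGIN_MARKER, guidance, END_MARKER].join("\n");
}

// Appends the block to the end of a prompt, separated by one blank line.
function appendInjection(prompt: string, mode: InjectMode, guidance: string, now: Date): string {
  const base = prompt.replace(/\s+$/, "");
  return `${base}\n\n${renderInjection(mode, guidance, now)}\n`;
}
```

Because the inline form is bracketed by fixed begin and end markers, a previously injected block can be located again with a plain string search, which is what makes dedupe and rollback possible without a visible heading.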
Scoring
Each session is scored 0.0–1.0 across five dimensions:
| Dimension | Description |
|---|---|
| instruction_following | Did the agent do exactly what was asked? |
| completeness | Did the agent fully complete the task? |
| proactiveness | Did the agent act appropriately? |
| code_quality | Quality and maintainability of code produced |
| communication | Clarity and helpfulness of explanations |
Scores display as 🟢 ≥80%, 🟡 ≥60%, 🔴 <60%. The /kasper status command shows an ASCII sparkline of the last 7 session scores.
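The display bands and the `scoring_threshold` comparison translate directly into code. In this sketch the band cut-offs and the strict "below" comparison come from this README, while the sparkline characters and scaling are assumptions, since the text only says it is ASCII and covers the last 7 sessions. All function names are hypothetical.

```typescript
// Bands from the README: green at 80% and above, yellow at 60% and above, red otherwise.
function badge(score: number): string {
  if (score >= 0.8) return "🟢";
  if (score >= 0.6) return "🟡";
  return "🔴";
}

// Weakness detection triggers only for scores strictly below the threshold.
function triggersWeaknessDetection(score: number, threshold: number = 0.6): boolean {
  return score < threshold;
}

// Assumed ASCII ramp, lowest to highest.
const RAMP = "_.-=#";

// Renders the last 7 scores (each 0.0 to 1.0) as one character per session.
function sparkline(scores: number[]): string {
  return scores
    .slice(-7)
    .map((s) => {
      const i = Math.floor(s * RAMP.length);
      return RAMP.charAt(Math.max(0, Math.min(RAMP.length - 1, i)));
    })
    .join("");
}
```

Note the boundary case: with the default threshold, a session scoring exactly 0.6 shows as 🟡 and does not trigger weakness detection.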
How It Works
- Observe: Hooks on `chat.message` and `session.created` accumulate session data; configurable polling catches idle sessions
- Evaluate: LLM-as-judge scores each session across 5 dimensions; large sessions are split into per-pair evaluation
- Improve: Recurring weaknesses trigger suggestions (`AGENTS.md` or per-agent prompt); auto-applied or queued for review
- Measure: Score delta tracking shows before/after improvement impact in `/kasper history`
The /kasper status In Progress banner surfaces what's happening right now: paused state, the active evaluation pass (with elapsed time and queued session count), the cross-session weakness merge LLM call, and any pending improvements waiting for the next auto-update tick.
Limitations
- Forward-looking only. Only sessions created after plugin start are auto-scored. Use `/kasper score session last <N>` for retroactive batch scoring.
- Current config only. Scoring uses today's `AGENTS.md` and prompts, not the versions active when the session originally ran.
- Subagents. Subagent sessions are always eligible for auto-scoring. A subagent session needs only 1 user message to be picked up (primary sessions use `min_session_messages`). The result is rolled up into the parent agent's per-agent stats.
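The eligibility rules scattered through this README (score only idle or complete sessions, apply `min_session_messages` to primary sessions, require just one user message for subagents) can be combined into one check. The `SessionInfo` shape and the function name below are assumptions for illustration, not kasper's internal types.

```typescript
// Hypothetical view of a session; kasper's real session record is not documented here.
interface SessionInfo {
  userMessages: number;
  isSubagent: boolean;
  state: "active" | "idle" | "complete";
}

function isEligibleForScoring(session: SessionInfo, minSessionMessages: number): boolean {
  // Sessions that are still running are skipped to avoid partial-turn scoring.
  if (session.state === "active") return false;
  // Subagent sessions need only one user message; primary sessions use the config value.
  const required = session.isSubagent ? 1 : minSessionMessages;
  return session.userMessages >= required;
}
```

With `min_session_messages` set to 3, an idle subagent session with a single user message would still be scored, while an idle primary session with two would be skipped.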
Development
```bash
bun install          # Install dependencies
bun run build        # Compile TypeScript
bun run typecheck    # Type-check only
bun run lint         # Lint with biome
bun test             # ~504 unit tests (503 pass, 1 pre-existing fail in auto-update.test.ts)
bun run test:e2e     # End-to-end tests (requires OPENCODE_E2E=1 and the opencode binary)
```

License
MIT — see LICENSE.
