@erosshi/whetstone

v1.0.0

Published

9 days ago

Superpowers gives your agent powers; Whetstone gives it an edge. A lean, flexible operating layer for Claude.

Downloads

0High
0Medium
0Low

erosshi

Whetstone

Reliability rails for AI coding agents. Whetstone makes Claude Code and Cowork agents verify before they claim "done," stop fabricating, and stay in scope — most of all on cheaper or unattended runs, where mistakes slip by unnoticed.

Built for people who run agents to ship real work: vibe-coders, solo builders, and anyone running smaller or cheaper models to save tokens.

It runs in Claude Code, Cowork, or anywhere (one portable file), is MIT with no upsell, dials from light to strict so it's never in your way, and is hard to break on install — it refuses to stack on other frameworks and verifies itself with doctor.

Install — pick one:

Portable     paste WHETSTONE.md as your system prompt or ~/.claude/CLAUDE.md   (any model, zero install)
Claude Code  /plugin marketplace add Erosshi/whetstone  →  /plugin install whetstone@bhayson-whetstone
CLI          npx @erosshi/whetstone  →  hone init

Superpowers gives your agent powers. Whetstone gives it an edge.

What it does — and what it doesn't

| Whetstone does | Whetstone doesn't | |---|---| | Make a capable model verify, persist, stay in scope, and not fabricate | Make a model smarter or solve problems beyond it | | Stay invisible on easy tasks; add rigor only as stakes rise (the process knob) | Force TDD, specs, or worktrees on a one-line change | | Run lean — 5 lazy-loaded skills + opt-in packs | Load hundreds of skills that bloat context | | Install on one inspectable path; refuse to stack; uninstall clean | Need five install methods or a repair command | | Show you its token cost (doctor) and the silent skill-budget cliff | Quietly eat context you can't see |

It sharpens; it does not forge. When a task genuinely exceeds the model, Whetstone says so instead of producing confident-sounding nonsense.

Why this exists (and why it's different)

The two popular frameworks in this space have opposite, well-documented problems:

obra/superpowers is too rigid — forced TDD with a "delete code written before tests" rule, a mandatory spec→plan split even fans say to collapse, required git worktrees, and a 66%-shell codebase that breaks on Windows.
affaan-m/ECC is too heavy — 232 skills, five competing install methods that conflict so often it ships a repair command, rules that can't even be distributed through the plugin, and a commercial tier that muddies the "open source" promise.

Both are praised for the same core ideas (auto-triggering skills, adversarial sub-agent review, a brainstorming/spec stage) and criticized for the same root cause: too much, too prescriptive, loaded all at once. Past a threshold, more skills in context make the model worse.

Whetstone takes the proven ideas and fixes the rest:

| | superpowers | ECC | Whetstone | |---|---|---|---| | Core bet | Rigid methodology | Everything ecosystem | Lean core + opt-in packs | | Flexibility | Forced TDD / spec / worktree | Heavy default surface | One knob: light/standard/strict | | Default weight | Medium-heavy | Heavy (232 skills) | Tiny (5 skills, lazy-loaded) | | Worktrees | Required | Optional | Never required (branch-only default) | | Install | Per-harness; some curl-pipe-to-LLM | 5 methods, "don't stack" | One inspectable path; refuses to stack | | Cross-platform | Shell (Windows risk) | Node | Node (Windows-safe) | | Cowork support | No | No | Yes | | Verification | Sub-agent review | code-reviewer agent | Fresh-context verifier, confidence-gated | | Verified pipelines | — | — | Yes — default-deny artifact gate + resume (unique) | | Verified multi-agent handoffs | partial (subagent review) | partial (orchestration) | Yes — work-bus tasks gated by acceptance checks | | Recovery | — | doctor/repair | doctor/uninstall on a tiny surface | | Commercial | Sponsor | Pro GitHub App | None. MIT, no upsell, ever |

What you get

Core (always installed, lean — ~5 lazy-loaded skills):

whetstone — the always-on operating mode (the five laws + the process knob). Lean by design.
brainstorm — Socratic spec refinement before any building (the single most-praised idea from superpowers, kept tight).
plan — a short plan sketch; skipped entirely in light mode.
implement — execute with tests as a sharp default, escapable (--no-tdd / light); never deletes working code for ritual.
review — fresh-context adversarial review via the verifier/reviewer agents (confidence-thresholded, findings consolidated, security-first).

Optional packs (OFF by default, opt-in — run hone init and it picks the right ones for you):

pipeline — a verified pipeline runner for repeatable generation/build/render/data jobs. Each step must produce a real artifact (not just exit 0) before the next runs; halts and resumes instead of cascading garbage.
orchestrate — Brain mode + a verified work-bus for running multiple agents (a Cowork Brain plus Claude Code, or several Cowork lanes). Includes the stoplight reply protocol: the Brain marks the human's single next action with a 🔴 ACTION NEEDED line so you never skim past the task you need to hand to the next agent. Toggle with hone bus stoplight on|off.
security — an OWASP-aware code security scan (AgentShield-style, leaner).
memory — session memory + continuous learning (promote lessons into reusable notes; privacy-safe, local-only).
domains — a generic deep-audit skill (audits any app) plus a _TEMPLATE for adding your own stack-specific playbooks. Ships no private/project-specific skills.
content — a content-creation skill + a clean-room Node ComfyUI runner for image/video/3D/audio generation that verifies a real artifact came out (not just exit 0). Composes with pipeline for multi-step jobs. Needs a local ComfyUI; ships no models or model-specific workflows.

Not sure which packs you want? Run hone init — a short setup that turns on only what matches how you work. Quick guide: run several agents at once → orchestrate; run build/render/ETL jobs you want gated → pipeline; generate images/video/3D/audio via ComfyUI → content; want an on-demand security scan → security; want memory across sessions → memory; audit codebases → domains. Everything stays off until you say yes.

Tooling: a single Node installer with doctor, uninstall, conflict detection (it refuses to stack on an existing superpowers/ECC install — their #1 break mode), and per-skill / per-hook disable + a context budget.

Install

One canonical path per surface. Do not mix methods. Whetstone detects conflicting installs and stops rather than create duplicates.

Portable (any model, anywhere — zero install): Paste WHETSTONE.md into your system prompt or save it as ~/.claude/CLAUDE.md (global) or ./CLAUDE.md (per project). That alone gives you the spine.

Claude Code (full kit, auto-updating):

/plugin marketplace add Erosshi/whetstone        # or a local path to this repo
/plugin install whetstone@bhayson-whetstone

Cowork: install the skills from dist/*.skill via the Save skill button — start with whetstone.skill; add others as you want them.

hone CLI (optional, the friendly wrapper — Node 18+):

npx @erosshi/whetstone   # or: npm i -g @erosshi/whetstone  then  hone
hone init                # guided setup — picks the right packs for how you work (recommended)
hone                     # or just install the lean core, no questions
hone add orchestrate     # enable a pack (orchestrate | pipeline | security | memory | domains)
hone doctor              # health check
hone mode strict         # how to set the default process mode
hone uninstall --yes     # clean removal

Prefer the raw scripts? node install/install.mjs --profile core|core+packs|full, node install/doctor.mjs, node install/uninstall.mjs.

See docs/INSTALL.md for details and docs/CONFIGURATION.md for the process knob, per-skill disable, and the context budget.

Philosophy in one paragraph

A top-tier model is already capable; the gap to "senior engineer who never drops the ball" is behavioral. So Whetstone ships the fewest rules that close that gap, makes the amount of process a dial instead of a dogma, loads detail only when a task triggers it (so it never bloats context), and verifies with fresh eyes instead of self-congratulation. It is honest about its ceiling: it sharpens; it does not forge. Full reasoning in docs/PHILOSOPHY.md; the honest head-to-head is in docs/COMPARISON.md.

For AI agents & discoverability

Agents entering this repo should read AGENTS.md; a machine-readable site map is in llms.txt. End users: the entire operating contract is the single file WHETSTONE.md. Installable via npx @erosshi/whetstone (the hone command) or the Claude Code plugin marketplace.

Keywords: Claude, Claude Code, Cowork, Agent Skills, SKILL.md, Claude plugin, agent operating mode, AI workflow framework, sub-agents, code review, verification, TDD, prompt engineering, Opus, Sonnet, superpowers alternative, ECC alternative.

Contributing

Issues and PRs welcome. Keep skills lean (SKILL.md under ~120 lines), references one level deep, descriptions third-person with trigger phrases, and behavior in what the plugin can actually ship. See docs/CONTRIBUTING.md.

License

MIT. No telemetry, no account, no paid tier. Ever. See LICENSE.

Credits

Stands on the shoulders of obra/superpowers and affaan-m/ECC — Whetstone adopts their best ideas and is built expressly to fix the issues their own users report. Grounded in Anthropic's Prompting Claude guidance, Claude Code best practices, and the Agent Skills spec.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme