@erosshi/whetstone
v1.0.0
Published
Superpowers gives your agent powers; Whetstone gives it an edge. A lean, flexible operating layer for Claude.
Downloads
91
Readme
Whetstone
Reliability rails for AI coding agents. Whetstone makes Claude Code and Cowork agents verify before they claim "done," stop fabricating, and stay in scope — most of all on cheaper or unattended runs, where mistakes slip by unnoticed.
Built for people who run agents to ship real work: vibe-coders, solo builders, and anyone running smaller or cheaper models to save tokens.
It runs in Claude Code, Cowork, or anywhere (one portable file), is MIT with no
upsell, dials from light to strict so it's never in your way, and is hard to break on install — it
refuses to stack on other frameworks and verifies itself with doctor.
Install — pick one:
Portable paste WHETSTONE.md as your system prompt or ~/.claude/CLAUDE.md (any model, zero install)
Claude Code /plugin marketplace add Erosshi/whetstone → /plugin install whetstone@bhayson-whetstone
CLI npx @erosshi/whetstone → hone initSuperpowers gives your agent powers. Whetstone gives it an edge.
What it does — and what it doesn't
| Whetstone does | Whetstone doesn't |
|---|---|
| Make a capable model verify, persist, stay in scope, and not fabricate | Make a model smarter or solve problems beyond it |
| Stay invisible on easy tasks; add rigor only as stakes rise (the process knob) | Force TDD, specs, or worktrees on a one-line change |
| Run lean — 5 lazy-loaded skills + opt-in packs | Load hundreds of skills that bloat context |
| Install on one inspectable path; refuse to stack; uninstall clean | Need five install methods or a repair command |
| Show you its token cost (doctor) and the silent skill-budget cliff | Quietly eat context you can't see |
It sharpens; it does not forge. When a task genuinely exceeds the model, Whetstone says so instead of producing confident-sounding nonsense.
Why this exists (and why it's different)
The two popular frameworks in this space have opposite, well-documented problems:
- obra/superpowers is too rigid — forced TDD with a "delete code written before tests" rule, a mandatory spec→plan split even fans say to collapse, required git worktrees, and a 66%-shell codebase that breaks on Windows.
- affaan-m/ECC is too heavy — 232 skills, five competing install methods that conflict so often
it ships a
repaircommand, rules that can't even be distributed through the plugin, and a commercial tier that muddies the "open source" promise.
Both are praised for the same core ideas (auto-triggering skills, adversarial sub-agent review, a brainstorming/spec stage) and criticized for the same root cause: too much, too prescriptive, loaded all at once. Past a threshold, more skills in context make the model worse.
Whetstone takes the proven ideas and fixes the rest:
| | superpowers | ECC | Whetstone |
|---|---|---|---|
| Core bet | Rigid methodology | Everything ecosystem | Lean core + opt-in packs |
| Flexibility | Forced TDD / spec / worktree | Heavy default surface | One knob: light/standard/strict |
| Default weight | Medium-heavy | Heavy (232 skills) | Tiny (5 skills, lazy-loaded) |
| Worktrees | Required | Optional | Never required (branch-only default) |
| Install | Per-harness; some curl-pipe-to-LLM | 5 methods, "don't stack" | One inspectable path; refuses to stack |
| Cross-platform | Shell (Windows risk) | Node | Node (Windows-safe) |
| Cowork support | No | No | Yes |
| Verification | Sub-agent review | code-reviewer agent | Fresh-context verifier, confidence-gated |
| Verified pipelines | — | — | Yes — default-deny artifact gate + resume (unique) |
| Verified multi-agent handoffs | partial (subagent review) | partial (orchestration) | Yes — work-bus tasks gated by acceptance checks |
| Recovery | — | doctor/repair | doctor/uninstall on a tiny surface |
| Commercial | Sponsor | Pro GitHub App | None. MIT, no upsell, ever |
What you get
Core (always installed, lean — ~5 lazy-loaded skills):
whetstone— the always-on operating mode (the five laws + theprocessknob). Lean by design.brainstorm— Socratic spec refinement before any building (the single most-praised idea from superpowers, kept tight).plan— a short plan sketch; skipped entirely inlightmode.implement— execute with tests as a sharp default, escapable (--no-tdd/light); never deletes working code for ritual.review— fresh-context adversarial review via theverifier/revieweragents (confidence-thresholded, findings consolidated, security-first).
Optional packs (OFF by default, opt-in — run hone init and it picks the right ones for you):
pipeline— a verified pipeline runner for repeatable generation/build/render/data jobs. Each step must produce a real artifact (not just exit 0) before the next runs; halts and resumes instead of cascading garbage.orchestrate— Brain mode + a verified work-bus for running multiple agents (a Cowork Brain plus Claude Code, or several Cowork lanes). Includes the stoplight reply protocol: the Brain marks the human's single next action with a🔴 ACTION NEEDEDline so you never skim past the task you need to hand to the next agent. Toggle withhone bus stoplight on|off.security— an OWASP-aware code security scan (AgentShield-style, leaner).memory— session memory + continuous learning (promote lessons into reusable notes; privacy-safe, local-only).domains— a genericdeep-auditskill (audits any app) plus a_TEMPLATEfor adding your own stack-specific playbooks. Ships no private/project-specific skills.content— acontent-creationskill + a clean-room Node ComfyUI runner for image/video/3D/audio generation that verifies a real artifact came out (not just exit 0). Composes withpipelinefor multi-step jobs. Needs a local ComfyUI; ships no models or model-specific workflows.
Not sure which packs you want? Run hone init — a short setup that turns on only what
matches how you work. Quick guide: run several agents at once → orchestrate; run build/render/ETL
jobs you want gated → pipeline; generate images/video/3D/audio via ComfyUI → content; want an
on-demand security scan → security; want memory across sessions → memory; audit codebases →
domains. Everything stays off until you say yes.
Tooling: a single Node installer with doctor, uninstall, conflict detection (it refuses to
stack on an existing superpowers/ECC install — their #1 break mode), and per-skill / per-hook
disable + a context budget.
Install
One canonical path per surface. Do not mix methods. Whetstone detects conflicting installs and stops rather than create duplicates.
Portable (any model, anywhere — zero install):
Paste WHETSTONE.md into your system prompt or save it as ~/.claude/CLAUDE.md
(global) or ./CLAUDE.md (per project). That alone gives you the spine.
Claude Code (full kit, auto-updating):
/plugin marketplace add Erosshi/whetstone # or a local path to this repo
/plugin install whetstone@bhayson-whetstoneCowork: install the skills from dist/*.skill via the Save skill button — start with
whetstone.skill; add others as you want them.
hone CLI (optional, the friendly wrapper — Node 18+):
npx @erosshi/whetstone # or: npm i -g @erosshi/whetstone then hone
hone init # guided setup — picks the right packs for how you work (recommended)
hone # or just install the lean core, no questions
hone add orchestrate # enable a pack (orchestrate | pipeline | security | memory | domains)
hone doctor # health check
hone mode strict # how to set the default process mode
hone uninstall --yes # clean removalPrefer the raw scripts? node install/install.mjs --profile core|core+packs|full, node install/doctor.mjs, node install/uninstall.mjs.
See docs/INSTALL.md for details and docs/CONFIGURATION.md
for the process knob, per-skill disable, and the context budget.
Philosophy in one paragraph
A top-tier model is already capable; the gap to "senior engineer who never drops the ball" is
behavioral. So Whetstone ships the fewest rules that close that gap, makes the amount of process a
dial instead of a dogma, loads detail only when a task triggers it (so it never bloats context), and
verifies with fresh eyes instead of self-congratulation. It is honest about its ceiling: it sharpens;
it does not forge. Full reasoning in docs/PHILOSOPHY.md; the honest
head-to-head is in docs/COMPARISON.md.
For AI agents & discoverability
Agents entering this repo should read AGENTS.md; a machine-readable site map is in
llms.txt. End users: the entire operating contract is the single file
WHETSTONE.md. Installable via npx @erosshi/whetstone (the hone command) or the
Claude Code plugin marketplace.
Keywords: Claude, Claude Code, Cowork, Agent Skills, SKILL.md, Claude plugin, agent operating mode, AI workflow framework, sub-agents, code review, verification, TDD, prompt engineering, Opus, Sonnet, superpowers alternative, ECC alternative.
Contributing
Issues and PRs welcome. Keep skills lean (SKILL.md under ~120 lines), references one level deep,
descriptions third-person with trigger phrases, and behavior in what the plugin can actually ship.
See docs/CONTRIBUTING.md.
License
MIT. No telemetry, no account, no paid tier. Ever. See LICENSE.
Credits
Stands on the shoulders of obra/superpowers and affaan-m/ECC — Whetstone adopts their best ideas and is built expressly to fix the issues their own users report. Grounded in Anthropic's Prompting Claude guidance, Claude Code best practices, and the Agent Skills spec.
