@collabb/knack
v0.2.0
Skill materialization workspace. Turn expertise into agents that work, in a weekend.
Knack
Turn expertise into Claude skills that work. In a weekend.
Knack is the authoring workspace for Claude skills. You give it a few exemplars (good/bad outputs with a one-line reason) and a few fixtures (test inputs). It writes the skill, evaluates it against your fixtures with LLM judges (a single pass/fail judge by default, or a multi-criterion rubric judge panel), and sharpens it as you add more exemplars.
The wedge is the sharpening loop: every output you accept or reject becomes signal that improves the next version. The CLI + local UI make that loop frictionless.
Quickstart
export ANTHROPIC_API_KEY=sk-ant-...
npx @collabb/knack init my-skill
npx @collabb/knack ui
knack ui opens a localhost dashboard at http://127.0.0.1:4242. Run the skill against real inputs, click Save as exemplar on outputs you like or dislike, hit Materialize to refine the skill, and hit Eval to score it.
Install
npm install -g @collabb/knack
Or per-project:
npm install --save-dev @collabb/knack
CLI
# create a skill
knack init my-skill
# add exemplars (good/bad/forbidden examples + one-line reason)
knack add my-skill exemplar
# add fixtures (test inputs + expected behavior)
knack add my-skill fixture
# refine the skill from its exemplars (Sonnet 4.6)
# also auto-regenerates the Hammurabi rubric from exemplar tags
knack materialize my-skill
# auto-generate the rubric on demand (--refine for Sonnet 4.6 polish)
knack rubric my-skill [--refine]
# evaluate against fixtures
knack eval my-skill # Haiku 4.5 judge, single-criterion
knack eval my-skill --hammurabi # multi-criterion + judge panel + audit trail
# pairwise diff vs previous version
knack diff my-skill
# inspect
knack list
knack show my-skill
# local UI
knack ui [--port 4242]
How it works
- Each skill lives in skills/<name>/ as a directory of markdown files.
- Exemplars (exemplars/*.md) are annotated good/bad/forbidden examples with a reason.
- Fixtures (fixtures/*.md) are test inputs with expected behavior.
- materialize runs an LLM pipeline that reads the skill + exemplars and writes a refined skill.md. The previous version is archived to versions/.
- eval runs the current skill against every fixture, judged by an LLM with strict pass/fail + reasoning.
- eval --hammurabi runs through Hammurabi's runner instead: configurable judge panel, multi-criterion weighted scoring, baseline + regression detection, full audit trail per judge.
- rubric auto-generates the Hammurabi rubric from the skill's exemplar tags. Recurring tags (and any tag carrying a forbidden: true flag) become criteria with frequency-proportional weights (0.05 floor). --refine invokes Sonnet 4.6 to polish criterion descriptions and add anchored judgePrompts.
- diff runs both versions against every fixture, then a blind pairwise judge picks the better output for each.
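The rubric weighting rule above (recurring or forbidden tags become criteria, weights proportional to frequency with a 0.05 floor) can be sketched as follows. This is a minimal illustration, not Knack's implementation: the TaggedExemplar shape, the rubricFromTags name, the choice that "recurring" means used on at least two exemplars, and the way the floor is enforced are all assumptions.

```typescript
// Hypothetical input shape: the tags attached to one exemplar, each optionally flagged forbidden.
interface TaggedExemplar {
  tags: { name: string; forbidden?: boolean }[];
}

interface Criterion {
  name: string;
  weight: number;
  forbidden: boolean;
}

const WEIGHT_FLOOR = 0.05;

// Sketch of tag-to-criterion weighting. Assumes "recurring" means used on at least
// minCount exemplars and that there are at most 20 criteria (20 * 0.05 = 1).
function rubricFromTags(exemplars: TaggedExemplar[], minCount = 2): Criterion[] {
  // Count tag usage and remember which tags were ever flagged forbidden.
  const counts = new Map<string, number>();
  const forbidden = new Set<string>();
  for (const ex of exemplars) {
    for (const tag of ex.tags) {
      counts.set(tag.name, (counts.get(tag.name) ?? 0) + 1);
      if (tag.forbidden) forbidden.add(tag.name);
    }
  }
  const count = (name: string): number => counts.get(name) ?? 0;

  // Recurring tags and forbidden tags become criteria; one-off tags are dropped.
  const names = Array.from(counts.keys()).filter((n) => count(n) >= minCount || forbidden.has(n));

  // Frequency-proportional weights with a floor: pin any weight that would fall below
  // the floor, split the remaining mass among the rest by frequency, and repeat.
  const pinned = new Set<string>();
  let weights = new Map<string, number>();
  for (;;) {
    const free = names.filter((n) => !pinned.has(n));
    const freeMass = 1 - pinned.size * WEIGHT_FLOOR;
    const freeTotal = free.reduce((sum, n) => sum + count(n), 0);
    weights = new Map(
      names.map((n): [string, number] => [n, pinned.has(n) ? WEIGHT_FLOOR : (freeMass * count(n)) / freeTotal]),
    );
    const below = free.filter((n) => (weights.get(n) ?? 0) < WEIGHT_FLOOR);
    if (below.length === 0) break;
    below.forEach((n) => pinned.add(n));
  }

  // Heaviest criteria first.
  return names
    .map((n) => ({ name: n, weight: weights.get(n) ?? 0, forbidden: forbidden.has(n) }))
    .sort((a, b) => b.weight - a.weight);
}
```

Pinning and redistributing, rather than clamping and then renormalizing, keeps every weight at or above the floor while the weights still sum to 1.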
The local UI
knack ui boots a Hono server on 127.0.0.1 (localhost-only with layered defenses: Host allowlist + Sec-Fetch-Site allowlist + per-server CSRF). Single-user. No auth. No build step.
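The three layers named above can be sketched as one framework-agnostic check. This is an illustration under assumptions, not Knack's Hono middleware: the GuardRequest shape, the x-knack-csrf header name, the exact allowlist values, and letting requests without a Sec-Fetch-Site header fall through to the next layer are all hypothetical choices.

```typescript
// Hypothetical request shape; a real server would read these from its framework's request object.
interface GuardRequest {
  method: string;
  headers: Record<string, string | undefined>; // header names lower-cased
}

type GuardResult = { ok: true } | { ok: false; reason: string };

// Compare strings without returning early on the first differing character.
function timingSafeEqualStr(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function checkLocalRequest(req: GuardRequest, port: number, csrfToken: string): GuardResult {
  // Layer 1: Host allowlist. Blocks DNS-rebinding requests that carry an attacker's hostname.
  const allowedHosts = new Set([`127.0.0.1:${port}`, `localhost:${port}`]);
  const host = req.headers["host"];
  if (!host || !allowedHosts.has(host)) return { ok: false, reason: "host" };

  // Layer 2: Sec-Fetch-Site allowlist. Browsers send "cross-site" when another origin
  // initiated the request. Assumption: a missing header (curl, older clients) passes on.
  const site = req.headers["sec-fetch-site"];
  if (site !== undefined && site !== "same-origin" && site !== "none") {
    return { ok: false, reason: "sec-fetch-site" };
  }

  // Layer 3: per-server CSRF token, required on every state-changing method.
  const method = req.method.toUpperCase();
  if (method !== "GET" && method !== "HEAD") {
    const sent = req.headers["x-knack-csrf"] ?? "";
    if (!timingSafeEqualStr(sent, csrfToken)) return { ok: false, reason: "csrf" };
  }
  return { ok: true };
}
```

Each layer covers a gap in the others: the Host check stops rebinding even for non-browser-shaped requests, Sec-Fetch-Site cheaply rejects cross-site browser traffic, and the token protects writes if either header check is bypassed.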
Routes you'll use:
- Skill list: every skill in skills/
- Skill detail: signal card, rubric summary, latest eval verdict, exemplars + fixtures, action buttons
- Run: streaming output, then a "Save as exemplar" form with tags pre-populated from this skill's tag vocabulary
- Materialize / Rubric / Eval: one-click triggers that mirror the CLI commands
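Pre-populating the Save as exemplar form needs the skill's tag vocabulary. A minimal sketch, assuming the vocabulary is every tag already used on this skill's exemplars, normalized to lower case and ordered by how many exemplars use it (the tagVocabulary name and the ordering rule are hypothetical):

```typescript
// Hypothetical input: the tag list of each exemplar already saved for one skill.
function tagVocabulary(exemplarTags: string[][]): string[] {
  const counts = new Map<string, number>();
  for (const tags of exemplarTags) {
    // Normalize, then de-duplicate within one exemplar so each file counts a tag once.
    const unique = Array.from(new Set(tags.map((t) => t.trim().toLowerCase())));
    for (const tag of unique) {
      if (tag !== "") counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  // Most frequent first; ties broken alphabetically so the form order is stable.
  return Array.from(counts.entries())
    .sort(([a, ca], [b, cb]) => cb - ca || a.localeCompare(b))
    .map(([tag]) => tag);
}
```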
Conventions
- Skills use markdown with frontmatter.
- Anthropic-only for MVP (Sonnet 4.6 for generation, Haiku 4.5 for judging). Multi-provider dispatch coming.
- Prompt caching is applied to the skill body across every call in a session.
- File-based: no database, no server (besides the local UI). Everything is git-versionable.
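Caching the skill body maps onto prompt caching in the Anthropic Messages API: put the skill in its own system block marked with cache_control of type "ephemeral", and send that identical block on every call so only the user input changes. The sketch below only builds the request body; the buildRequest helper is hypothetical, the model id is a parameter rather than an assumed value, and no network call is made.

```typescript
// Shapes follow the Anthropic Messages API request body (system text blocks with cache_control).
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Hypothetical helper: the skill body is a cached system block, so every call in a
// session shares the same cacheable prefix; only the user message varies.
function buildRequest(model: string, skillBody: string, input: string, maxTokens = 1024): MessagesRequest {
  return {
    model,
    max_tokens: maxTokens,
    system: [{ type: "text", text: skillBody, cache_control: { type: "ephemeral" } }],
    messages: [{ role: "user", content: input }],
  };
}
```

The cache only hits when the prefix up to the cache breakpoint is identical across calls, which is why the skill body is kept in a block of its own and never interleaved with per-run input.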
Optional: Cloudflare AI Gateway
Route Claude calls through Cloudflare AI Gateway for caching/observability:
export AI_GATEWAY_URL=https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/anthropic
Development
git clone https://github.com/collabb-innovations/knack
cd knack
npm install
npm run knack -- --help # dev mode via tsx
npm test # 59 tests, tsx --test
npm run build # compile to dist/
License
MIT
