@collabb/knack

v0.2.0

Published

a day ago

Skill materialization workspace. Turn expertise into agents that work, in a weekend.


shmuel-ma

idkhbu

claude skills agent anthropic llm eval rubric ai

Knack

Turn expertise into Claude skills that work. In a weekend.

Knack is the authoring workspace for Claude skills. You give it a few exemplars (good/bad outputs with a one-line reason) and a few fixtures (test inputs). It writes the skill, evaluates it with a rubric judge panel, and sharpens it as you add more exemplars.

The core of Knack is the sharpening loop: every output you accept or reject becomes a signal that improves the next version. The CLI and local UI keep that loop fast.

Quickstart

export ANTHROPIC_API_KEY=sk-ant-...
npx @collabb/knack init my-skill
npx @collabb/knack ui

knack ui opens a localhost dashboard at http://127.0.0.1:4242. From there you can run the skill against real inputs, click Save as exemplar on outputs you like or dislike, hit Materialize to refine the skill, and hit Eval to score it.

Install

npm install -g @collabb/knack

Or per-project:

npm install --save-dev @collabb/knack

CLI

# create a skill
knack init my-skill

# add exemplars (good/bad/forbidden examples + one-line reason)
knack add my-skill exemplar

# add fixtures (test inputs + expected behavior)
knack add my-skill fixture

# refine the skill from its exemplars (Sonnet 4.6)
# also auto-regenerates the Hammurabi rubric from exemplar tags
knack materialize my-skill

# auto-generate the rubric on demand (--refine for Sonnet 4.6 polish)
knack rubric my-skill [--refine]

# evaluate against fixtures
knack eval my-skill                # Haiku 4.5 judge, single-criterion
knack eval my-skill --hammurabi    # multi-criterion + judge panel + audit trail

# pairwise diff vs previous version
knack diff my-skill

# inspect
knack list
knack show my-skill

# local UI
knack ui [--port 4242]

How it works

Each skill lives in skills/<name>/ as a directory of markdown files.
Exemplars (exemplars/*.md) are annotated good/bad/forbidden examples with a reason.
Fixtures (fixtures/*.md) are test inputs with expected behavior.
materialize runs an LLM pipeline that reads the skill and its exemplars and writes a refined skill.md. The previous version is archived to versions/.
eval runs the current skill against every fixture. An LLM judge returns a strict pass/fail verdict with reasoning for each one.
eval --hammurabi runs through Hammurabi's runner instead, which adds a configurable judge panel, multi-criterion weighted scoring, baseline and regression detection, and a full audit trail per judge.
rubric auto-generates the Hammurabi rubric from the skill's exemplar tags. Recurring tags (and any tag carrying a forbidden: true flag) become criteria whose weights are proportional to tag frequency, with a floor of 0.05. --refine invokes Sonnet 4.6 to polish criterion descriptions and add anchored judgePrompts.
diff runs both versions against every fixture, then a blind pairwise judge picks the better output for each.
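
The rubric step above can be sketched roughly as follows. This is an illustration only, not Knack's actual implementation: the ExemplarTag and Criterion shapes, the criteriaFromTags name, and the threshold for "recurring" (two or more occurrences) are all assumptions. Whether Knack renormalizes weights after applying the floor is not stated in this README, so the sketch does not.

```typescript
// Hypothetical shapes: Knack's real exemplar and rubric types are not shown in this README.
interface ExemplarTag {
  name: string;
  forbidden?: boolean;
}

interface Criterion {
  tag: string;
  weight: number;
}

const WEIGHT_FLOOR = 0.05; // floor stated in the README
const MIN_OCCURRENCES = 2; // assumption: "recurring" means seen at least twice

// Takes one tag list per exemplar and returns one criterion per kept tag.
function criteriaFromTags(exemplars: ExemplarTag[][]): Criterion[] {
  const counts = new Map<string, number>();
  const forbidden = new Set<string>();

  for (const tags of exemplars) {
    for (const tag of tags) {
      counts.set(tag.name, (counts.get(tag.name) ?? 0) + 1);
      if (tag.forbidden) forbidden.add(tag.name);
    }
  }

  // Keep recurring tags, plus any forbidden tag regardless of how often it appears.
  const kept = [...counts.entries()].filter(
    ([name, count]) => count >= MIN_OCCURRENCES || forbidden.has(name),
  );
  const total = kept.reduce((sum, [, count]) => sum + count, 0);

  // Weight is the tag's share of kept occurrences, raised to the floor if smaller.
  return kept.map(([tag, count]) => ({
    tag,
    weight: Math.max(count / total, WEIGHT_FLOOR),
  }));
}
```

In this sketch a forbidden tag that appears only once still becomes a criterion, and the floor keeps it from being drowned out by very frequent tags.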

The local UI

knack ui boots a Hono server on 127.0.0.1 (localhost-only with layered defenses: Host allowlist + Sec-Fetch-Site allowlist + per-server CSRF). Single-user. No auth. No build step.
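
To make the first two layers concrete, here is a minimal sketch of a Host and Sec-Fetch-Site check written as a plain function. It is not Knack's middleware: the isRequestAllowed name, the inclusion of localhost in the allowlist, and the decision to let requests without a Sec-Fetch-Site header through are assumptions. The CSRF layer is omitted.

```typescript
// Hypothetical helper for illustration; Knack's actual checks may differ.
// Assumption: only same-origin and user-initiated ("none") navigations are allowed.
const ALLOWED_FETCH_SITES = new Set(["same-origin", "none"]);

// Expects header names already lowercased, as Node's HTTP server provides them.
function isRequestAllowed(
  headers: Record<string, string | undefined>,
  port: number,
): boolean {
  // Host allowlist: rejects requests that reach the server under another hostname,
  // which is how DNS rebinding attacks arrive.
  const allowedHosts = new Set([`127.0.0.1:${port}`, `localhost:${port}`]);
  const host = headers["host"];
  if (!host || !allowedHosts.has(host.toLowerCase())) return false;

  // Sec-Fetch-Site allowlist: rejects requests a browser marks as cross-site or same-site.
  // Non-browser clients such as curl omit the header; this sketch lets them through.
  const site = headers["sec-fetch-site"];
  if (site !== undefined && !ALLOWED_FETCH_SITES.has(site)) return false;

  return true;
}
```

A check like this would typically run before any route handler, returning a 403 when it fails.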

Routes you'll use:

Skill list — every skill in skills/
Skill detail — signal card, rubric summary, latest eval verdict, exemplars + fixtures, action buttons
Run — streaming output, then a "Save as exemplar" form with tags pre-populated from this skill's tag vocabulary
Materialize / Rubric / Eval — one-click triggers that mirror the CLI commands

Conventions

Skills use markdown with frontmatter.
Anthropic-only for the MVP (Sonnet 4.6 for generation, Haiku 4.5 for judging). Multi-provider dispatch is planned.
Prompt caching is applied to the skill body across every call in a session.
File-based: no database, no server (besides the local UI). Everything is git-versionable.
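
As an illustration of the prompt-caching convention, the sketch below builds an Anthropic Messages API request body in which the skill body is a system block marked with cache_control. How Knack wires this internally is not shown in this README; buildRequest, the interface names, and the max_tokens value are assumptions, and the model id is left as a parameter.

```typescript
// Request shape follows the Anthropic Messages API; the helper itself is hypothetical.
interface CachedSystemBlock {
  type: "text";
  text: string;
  cache_control: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: CachedSystemBlock[];
  messages: { role: "user"; content: string }[];
}

function buildRequest(model: string, skillBody: string, input: string): MessagesRequest {
  return {
    model,
    max_tokens: 1024, // arbitrary value for the sketch
    // The skill body is identical on every call, so it is the part marked cacheable.
    system: [{ type: "text", text: skillBody, cache_control: { type: "ephemeral" } }],
    // The fixture or user input varies per call and stays outside the cached prefix.
    messages: [{ role: "user", content: input }],
  };
}
```

Because the cached prefix has to match exactly between calls, keeping the skill body byte-for-byte stable within a session is what makes the cache useful.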

Optional: Cloudflare AI Gateway

Route Claude calls through Cloudflare AI Gateway for caching/observability:

export AI_GATEWAY_URL=https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/anthropic
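
A client could pick up that variable roughly as sketched below. This is an assumption about how the override might be resolved, not a description of Knack's code; resolveBaseUrl and DEFAULT_BASE_URL are hypothetical names.

```typescript
// Hypothetical resolver: falls back to the public Anthropic endpoint when no gateway is set.
const DEFAULT_BASE_URL = "https://api.anthropic.com";

function resolveBaseUrl(env: Record<string, string | undefined>): string {
  const gateway = env["AI_GATEWAY_URL"]?.trim();
  if (!gateway) return DEFAULT_BASE_URL;
  // Strip trailing slashes so path joining does not produce "//".
  return gateway.replace(/\/+$/, "");
}
```

In a Node process this would be called as resolveBaseUrl(process.env), and the result could then be passed as the baseURL option when constructing the Anthropic SDK client.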

Development

git clone https://github.com/collabb-innovations/knack
cd knack
npm install
npm run knack -- --help    # dev mode via tsx
npm test                   # 59 tests, tsx --test
npm run build              # compile to dist/

License

MIT
