@rckflr/agent-skills-cli

v2.3.0

Published

9 days ago

Reference CLI for the agent-skills specification (spec v1.2): validate, sync, query, resolve, and exec skills end-to-end with audit log + Cloudflare Workers AI / Ollama / OpenAI / transformers.js embeddings. v2 runtime built on just-bash + just-bash-data

Downloads

1,066

0High
0Medium
0Low

rckflr

agent-skills llm-tools mcp-alternative skill-bank llms-txt

`agent-skills-cli`

Reference CLI + library for the agent-skills specification.

What this is

The agent-skills spec defines a format (SKILL.md with YAML frontmatter) and a protocol (sync/query/exec via skill banks). This CLI is the first reference implementation of the local-only operations: validate a SKILL.md against the spec, and resolve a command_template with given arg values.

The full skill-bank pipeline (sync, embed, query, audit) is delegated to runtimes like just-bash-data; this CLI focuses on the author's tools: "is my SKILL.md correct?" and "what command does it produce for these args?"

Status

v2.2.0 + spec v1.2 — filesystem allowlist (SPEC §2.11). Sandboxed banks can now grant skills read-only access to declared host directories alongside $AGENT_SCRATCH. Closes the gap that made read-file and ripgrep-search unusable in v2 sandboxed runtimes (the v1.1 sandbox restricted FS to scratch-only).

# In SKILL.md (schema_version: "0.2"):
filesystem:
  - "/etc"
  - "/var"
  - "/home"
  - "/tmp"

The runtime mounts each declared path read-only via just-bash's MountableFs + OverlayFs. Writes still go exclusively to $AGENT_SCRATCH — filesystem is a read allowlist, not a write one. Non-existent paths are skipped with a stderr warning (a portable pack can declare /etc and run on Windows where it doesn't exist; the bank just doesn't mount it).

Reference pack agent-skills-pack v2.2.0 ships read-file v2.0.0 and ripgrep-search v2.0.0 with filesystem: ["/etc", "/var", "/home", "/tmp", "/usr"]. End-to-end demonstrated against real Cloudflare Workers AI (Gemma 300M for embeddings + Granite 4.0 h-micro for skill picking + arg extraction): an LLM can now drive read-file /tmp/some-file.txt autonomously through the sandbox.

v2.1.0 + spec v1.1 — pack-distributed CustomCommands (SPEC §3.4). Skills can ship a command.js factory alongside SKILL.md that the bank loads at sync time and registers on the just-bash runtime before exec.

// skills/github-issue-create/command.js
export default ({ defineCommand }) =>
  defineCommand("gh", async (args, ctx) => {
    // ... call GitHub REST API directly using ctx.env.GH_TOKEN
    return { stdout: issue_url + "\n", stderr: "", exitCode: 0 };
  });

The factory pattern (vs exporting the Command directly) avoids the bare specifier "just-bash" resolution problem when loading the source via data: URL. The CLI's loadCustomCommandFromSource returns a structured LoadFailureReason on shape errors so pack authors get diagnosable feedback (no more silent "command not found").

Closes the gap that prevented v2 sandboxed banks from running skills that wrap host CLIs (gh, aws, kubectl). Reference pack ships github-issue-create v2.0.0 with a CustomCommand wrapping the GitHub REST API.

v0.12.0 — per-tenant audit scoping. Multi-tenant skill-bank deployments (shared CI bots, team setups, multi-user agents) get isolation: Alice's heavy use of base64-encode no longer bleeds into Bob's intent-conditional rerank.

$ agent-skills exec base64-encode --args '{"value":"x"}' --tenant alice  # records tenant
$ agent-skills query "encode something" --tenant bob                     # only Bob's history boosts

The --tenant flag (validated as ^[a-zA-Z0-9._-]{1,64}$) is supported on exec, query, and bench. Per SPEC §4.5.1: when set, the audit entry records the tenant; intent-conditional rerank filters past audits by tenant before computing the boost. When unset (the default for single-user deployments), behaviour is identical to v0.11.

Backwards-compatible: existing audit logs (no tenant field) work unchanged. The field is optional and additive.

v0.11.0+ + spec v0.2 — the spec is specified, not just documented: a 510-line Python proof-of-concept reproduces this CLI's retrieval behaviour bit-for-bit (34/35 top-1, 35/35 top-3, mean margin +0.175 — identical to 4 decimals on the canonical benchmark). The cross-implementation parity is continuously validated by e2e.yml on every push and weekly cron.

v0.11.0 — update command. Closes the obvious UX gap left by sync: how do you refresh installed packs?

$ agent-skills update                    # check every subscription
Update 2 subscription(s)

  ↑ github.com/me/pack-a@main
      a1b2c3d4e5f6 → fedcba098765
      + new-skill
      ↑ http-get: 1.0.0 → 1.1.0
      - obsolete-skill
      gc: removed 7 orphaned file(s)

  · github.com/me/[email protected]
      9876abcd1234 (no change)

summary: 1 changed, 1 unchanged

update re-resolves each subscribed ref against the host's API, re-syncs only when the SHA actually moved, garbage-collects the orphan files that older sync calls left behind on each pack, and reports a per-skill diff (added / removed / version-bumped). It inherits the subscription's verify_signature setting, so a moving tag that becomes unsigned aborts the update before any ingestion happens.

--dry-run resolves new SHAs and reports what would change without writing anything.

v0.10.0 — signed-tag verification at sync time. Closes the SECURITY.md threat model item that's been documented since v0.1: an attacker who compromises an upstream account or the host's tag-resolution path can't move a v1.0.0 tag to point at malicious code without also forging a GPG-verified signature on the tag.

Two-tier enforcement:

Always observe: every sync calls GitHub's verification API and records the result in provenance.signature_status (one of valid / invalid / unsigned / unverified). Free, zero opt-in.
Optionally enforce: pass --verify-signature (or set verify_signature: true on the subscription) and the sync aborts when status isn't valid.

$ agent-skills sync github.com/MauricioPerera/[email protected] --verify-signature
signature verification failed for github.com/MauricioPerera/[email protected]:
status=unsigned (tagger: MauricioPerera <[email protected]>, reason: unsigned).
Pass without --verify-signature to ingest unverified, or work with the publisher to
sign their tag with 'git tag -s' and re-tag.

The author side already creates signed tags via agent-skills publish --sign (v0.8.0). Now the consumer side can require them.

v0.9.0 — init command. Scaffolds a single skill or an entire skill pack from embedded templates. The output is publish-ready on the first run:

$ agent-skills init my-pack --pack --author "Alice"
init pack: my-pack

Wrote 5 file(s):
  + skills/hello-world/SKILL.md
  + llms.txt
  + README.md
  + .gitignore
  + .github/workflows/validate.yml

$ cd my-pack && agent-skills publish --check-only
Publish 1 skill(s) from .
  · hello-world                   1.0.0
summary: 0 added, 0 updated, 1 unchanged    # ← clean validation, first try

Two modes:

init <name> — adds skills/<name>/SKILL.md to an existing pack (with all spec fields commented for discoverability).
init <name> --pack — scaffolds a brand-new pack with skills/, llms.txt, README.md, .gitignore, and a CI workflow.

The scaffolded SKILL.md always validates against the spec on the first run, so agent-skills publish --check-only is green immediately. THEN the author edits.

v0.8.0 — publish command for skill-pack authors. Closes the author side of the loop:

$ cd my-skill-pack/    # has skills/foo/SKILL.md, skills/bar/SKILL.md
$ agent-skills publish --check-only
Publish 2 skill(s) from .

  · foo                          1.0.0
  · bar                          1.0.0

summary: 0 added, 0 updated, 2 unchanged
✓ skills-index.json already up-to-date

What it does: scans skills/, validates each SKILL.md against the spec, generates or updates skills-index.json (preserving hand-crafted summaries and any curated skill ordering across re-publishes), optionally creates a git tag (--tag v1.0.1, --sign for signed). Idempotent: running twice on an unchanged tree is a byte-identical no-op.

v0.7.0 — bench subcommand for reproducible retrieval evaluation. Anyone can verify the empirical claims in BENCHMARK.md against their own bank, with their own provider, and their own ground truth file:

$ agent-skills bench bench-truth.jsonl
Bench against 35 queries
  truth: bench-truth.jsonl
  model: ollama:embeddinggemma | rerank=intent-conditional | filter=on

  top-1:  34/35 (97.1%)
  top-3:  35/35 (100.0%)
  top-5:  35/35 (100.0%)
  mean top-1 score:  0.551
  mean margin (top-1 → top-2):  +0.175
  elapsed: 42306ms

Failures (1):
  ✗ "read what's at this https url"
      expected: http-get (rank 2, score 0.450)
      got:      read-file (score 0.464)

Use --json for CI integration; non-zero exit when any query fails (so a regression breaks the build).

v2.3.0 — transformers-js provider (in-process, browser-compatible). Closes the offline-first gap: the CLI's retrieval loop can now run end-to-end without a server (no Ollama daemon), without credentials, and inside a browser/Worker bundle.

npm install @huggingface/transformers   # peer dep, optional
EMBEDDING_PROVIDER=transformers-js \
TRANSFORMERS_MODEL=Xenova/all-MiniLM-L6-v2 \
agent-skills sync github.com/...

onnx-community/embeddinggemma-300m-ONNX is supported for vector-space parity with the Cloudflare @cf/google/embeddinggemma-300m provider — same weights, same retrieval results, no network egress.

v0.6.0 — multi-provider embeddings. The CLI works against:

Cloudflare Workers AI (bge-base-en-v1.5 / bge-large / bge-m3 / embeddinggemma).
Ollama (local, zero credentials, zero network egress — nomic-embed-text by default).
OpenAI / OpenAI-compatible (text-embedding-3-small/large + Together / Anyscale / Mistral / vLLM / infinity / TEI any server speaking the same /v1/embeddings shape).
transformers.js (v2.3.0+, in-process, browser-compatible, zero credentials, zero server — Xenova/all-MiniLM-L6-v2 by default).

Auto-detected from your environment (or set EMBEDDING_PROVIDER explicitly). Same loop:

sync   → pulls + embeds + indexes a skill pack from any git source
query  → finds the right skill from intent (--rerank-mode intent-conditional|global|none)
exec   → runs the resolved command via bash + appends an audit entry
audit  → inspects the local audit log

Plus local-only commands (validate, resolve) and bank management (list, reset).

Empirical claim updated (v0.5.0, live @cf/baai/bge-base-en-v1.5, 35 paraphrases × 7 skills, 50 concentrated past uses on base64-encode):

| Strategy | Top-1 | |---|---:| | Cosine baseline | 34/35 (97.1 %) | | Global rerank (the v0.4.0 stress failure) | 12/35 (34.3 %) ⚠️ | | Intent-conditional sim≥0.7 (v0.5.0 default) | 35/35 (100 %) ✓ |

The same audit log that destroys global-mode rerank (50 uses of one skill → it wins almost every query) is what makes intent-conditional rerank exceed the cosine baseline (it correctly resolves the "Basic Auth credential" ambiguity using the relevant past intents while ignoring the rest).

Full methodology, all 5 strategies compared, failure breakdowns, and operator tuning guidance: BENCHMARK.md.

Earlier validation still holds (35 paraphrases × 3 embedding models = 105 query evaluations on the public skill pack, no rerank): top-1 97–100 %, top-3 100 % across all 3 models.

Install

# Global CLI — exposes `agent-skills` on your PATH
npm install -g @rckflr/agent-skills-cli

# Or as a project library
npm install @rckflr/agent-skills-cli

v1.0 stability. STABLE library exports are semver-protected per STABILITY.md. Production use can pin to ^1.0.0 with confidence. The @rckflr/ scope is the canonical home — the unscoped agent-skills-cli name on npm is held by an unrelated project.

Or install from a GitHub release tag

For air-gapped environments, or to track main directly:

git clone --depth 1 --branch v0.18.1 https://github.com/MauricioPerera/agent-skills-cli
cd agent-skills-cli
npm install
npm run build
npm link            # exposes `agent-skills` on your PATH

Or pin a specific commit in your package.json:

"@rckflr/agent-skills-cli": "github:MauricioPerera/agent-skills-cli#v0.18.1"

Publishing your own skills?

Walk-through with concrete example: PUBLISHING.md — scaffold to public release in 20–30 minutes, including the privacy invariant.

Library API stability

The library exports are tiered (stable / experimental / internal) with explicit breaking-change rules per tier. See STABILITY.md before depending on programmatic exports in production.

Upgrading between major versions

See MIGRATION.md for per-major-version migration notes. Minor-version upgrades within a major never require migration.

Contributing / security

CONTRIBUTING.md — what to work on, how to run the dev loop, cross-implementation parity rules.
SECURITY.md — vulnerability reporting policy, in-scope vs out-of-scope. Threat-model context lives in the spec's SECURITY.md.

End-to-end demo

Pick one of the four embedding providers below. Everything else is identical.

Option A — Local with Ollama (zero credentials, zero network)

# 1. Pull an embedding model into Ollama (one-time, ~270 MB for nomic-embed-text)
ollama pull nomic-embed-text

# 2. Tell the CLI to use it (defaults to localhost:11434, model nomic-embed-text)
export EMBEDDING_PROVIDER=ollama

Option B — Cloudflare Workers AI (free tier available)

# 1. Get credentials from https://dash.cloudflare.com/profile/api-tokens
#    (token needs "Workers AI" permission)
export CF_ACCOUNT_ID=<32-hex-account-id>
export CF_API_TOKEN=<your-token>

Option C — OpenAI (or any OpenAI-compatible server)

export OPENAI_API_KEY=sk-...
# Optional: point at a compatible server (Together / Mistral / vLLM / TEI / infinity / …)
# export OPENAI_BASE_URL=https://api.together.xyz/v1

Option D — transformers.js (in-process, zero credentials, zero server, browser-compatible)

# 1. Install the optional peer dep (one-time)
npm install @huggingface/transformers

# 2. Tell the CLI to use it (defaults to Xenova/all-MiniLM-L6-v2, ~25 MB, 384-dim)
export EMBEDDING_PROVIDER=transformers-js

# Optional config
# export TRANSFORMERS_MODEL=onnx-community/embeddinggemma-300m-ONNX  # vector-space parity with @cf/google/embeddinggemma-300m
# export TRANSFORMERS_DTYPE=q4f16          # fp32|fp16|q8|q4
# export TRANSFORMERS_POOLING=mean         # mean|cls
# export TRANSFORMERS_NORMALIZE=true       # L2-normalize
# export TRANSFORMERS_CACHE_DIR=~/.cache/hf

The model downloads from Hugging Face Hub on first use (~25 MB to ~300 MB depending on choice) and is cached on disk. No daemon, no network egress at runtime, no credentials. Works in Node ≥22, Bun, Deno, Cloudflare Workers, and the browser.

Same flow regardless of provider

# 1. Sync a real skill pack (7 production-ready skills)
$ agent-skills sync github.com/MauricioPerera/[email protected]
Synced github.com/MauricioPerera/[email protected]
  ref: v1.0.0 → 4f5a2c7e9b1d3f8a6c0e7d2b5f9a1c4e8d3b6a72
  total: 7 | synced: 7 | invalid: 0 | errored: 0

  ✓ http-get
  ✓ http-post-json
  ✓ github-issue-create
  ✓ ripgrep-search
  ✓ read-file
  ✓ json-query
  ✓ base64-encode

# 3. Query by intent — agent finds the right skill on demand
$ agent-skills query "I need to fetch the contents of a webpage"
Top 5 skills for: "I need to fetch the contents of a webpage"
Embedding model: cloudflare:@cf/baai/bge-base-en-v1.5

1. [0.847] github.com/MauricioPerera/agent-skills-pack@.../http-get
   Title: HTTP GET request
   Use when: the user wants to fetch the contents of a URL...

2. [0.612] github.com/MauricioPerera/agent-skills-pack@.../http-post-json
   ...

# 4. Resolve the chosen skill (substitute args, NOT execute)
$ agent-skills resolve <(curl -fsSL https://cdn.jsdelivr.net/gh/MauricioPerera/[email protected]/skills/http-get/SKILL.md) --args '{"url":"https://example.com","timeout":15}'
curl -fsSL --max-time 15 'https://example.com'

# 5. Execute the resolved command (only if you trust the skill)
$ agent-skills resolve ... --args '...' | bash

The agent's loop is: embed intent → vector search → fetch metadata → resolve → execute. With this CLI, every step is a single command.

Commands

`agent-skills validate <file>`

Validate a SKILL.md against:

JSON Schema for the frontmatter shape (Draft 2020-12, bundled).
Spec constraints beyond the schema:
- Placeholders MUST be in argument position (no {x} inside literal "..." or '...') — SPEC §2.6.
- unquoted: true args MUST declare a strict pattern rejecting shell metacharacters.
- Every {placeholder} in command_template MUST have a corresponding args entry.

$ agent-skills validate skills/charge-customer/SKILL.md
✓ skills/charge-customer/SKILL.md is conformant

$ agent-skills validate broken-skill.md
✗ broken-skill.md has 2 error(s):
  args.x: unquoted args MUST declare a pattern (SPEC §2.6)
  command_template: placeholder '{undefined_arg}' has no corresponding entry in args

JSON output for tooling integration:

$ agent-skills validate broken-skill.md --json
{"file":"broken-skill.md","valid":false,"errors":[...]}

Exit codes:

0 — conformant
5 — non-conformant (validation error)
2 — usage error
3 — file not found

`agent-skills resolve <file> --args '<json>'`

Substitute placeholders against a JSON arg map and print the resolved bash command. Does NOT execute.

$ agent-skills resolve skills/charge-customer/SKILL.md \
    --args '{"amount":1000,"currency":"usd","customer_id":"cus_X","description":"test"}'

curl -fsSL --request POST https://api.stripe.com/v1/charges --user $STRIPE_SECRET_KEY: --data amount=1000 --data currency='usd' --data customer='cus_X' --data description='test'

Note: the $STRIPE_SECRET_KEY reference in the template is NOT substituted by the CLI — it remains a literal shell variable that bash will expand at exec time. This is the credential isolation invariant (SPEC §8 P1): secrets never enter the LLM context.

JSON output (with audit trace):

$ agent-skills resolve skills/charge-customer/SKILL.md \
    --args '{"amount":1000,"currency":"usd","customer_id":"cus_X","description":"test"}' \
    --json
{
  "file": "skills/charge-customer/SKILL.md",
  "command": "curl -fsSL ... --data amount=1000 ...",
  "trace": [
    {"name":"amount","type":"integer","rendered":"1000"},
    {"name":"currency","type":"string","rendered":"'usd'"},
    {"name":"customer_id","type":"string","rendered":"'cus_X'"},
    {"name":"description","type":"string","rendered":"'test'"}
  ]
}

resolve validates the skill before substituting (same rules as validate). Use --skip-validation to bypass for testing non-conformant skills (NOT recommended).

Pipe to bash to execute:

agent-skills resolve skills/charge-customer/SKILL.md --args '{"amount":1000,...}' | bash

(With $STRIPE_SECRET_KEY set in your environment.)

`agent-skills exec <skill> --args '<json>'` (v0.3.0+)

Resolve and execute a skill from your local bank in one step. Spawns bash -c <resolved-command>, captures stdout/stderr/exit-code, appends an audit entry. Inherits the parent process's env (so $STRIPE_SECRET_KEY, $GH_TOKEN, etc. work).

# Identify a skill by short id (must be unique across the bank) or full identity
$ agent-skills exec base64-encode --args '{"value":"hello world"}'
aGVsbG8gd29ybGQ=

# Or:
$ agent-skills exec github.com/MauricioPerera/agent-skills-pack@<sha>/http-get \
    --args '{"url":"https://example.com","timeout":10}'

Flags:

--dry-run — resolve + validate, print the command, but do not execute.
--timeout-sec N — hard timeout. Default 60s. After it fires, SIGTERM, then 2s grace, then SIGKILL, then forced resolution at +4s if the proc still hasn't reaped (Windows msys / Git Bash safety net).
--no-audit — skip the audit log entry.
--intent "<text>" — record the original user intent in the audit entry (useful when called from a query→exec pipeline).
--json — return the full result as JSON ({skill_identity, command, exit_code, stdout, stderr, elapsed_ms, timed_out, dry_run}).

The CLI's exit code matches the executed command's exit code (the agent can dispatch on $? directly).

Sensitive args are redacted in the audit log. A skill author marks an arg with sensitive: true in its args schema; the CLI substitutes "<redacted>" in the audit JSONL even though the value still flows to the bash subprocess at exec time.

`agent-skills update [<source>] [--dry-run]` (v0.11.0+)

Refresh subscribed packs from upstream. Re-resolves each subscribed ref against the host's API; re-syncs only the ones whose SHA has actually moved; garbage-collects the orphan files that older syncs left behind; reports a per-skill diff.

# Refresh all subscriptions
$ agent-skills update
Update 2 subscription(s)

  ↑ github.com/me/pack-a@main
      a1b2c3d4e5f6 → fedcba098765
      + new-skill
      ↑ http-get: 1.0.0 → 1.1.0
      - obsolete-skill
      gc: removed 7 orphaned file(s)

  · github.com/me/[email protected]
      9876abcd1234 (no change)

summary: 1 changed, 1 unchanged

# Refresh a specific subscription
$ agent-skills update github.com/me/pack-a@main

# Preview what would change without writing
$ agent-skills update --dry-run

Behaviour:

Idempotent on a stable ref. Re-running on a @v1.0.0 pinned tag is a no-op — no embedding API calls, no disk writes, no subscription updates.
GC built in. sync <repo>@<new-ref> accumulates orphan files at every SHA it ever resolved. update removes everything from this source whose SHA isn't the current one. (Sync alone doesn't — that's by design; sync is for first-install, update is for refresh.)
Inherits signature enforcement from the subscription. If the original sync used --verify-signature, that flag is persisted (verify_signature: true in the subscription record) and update reuses it — so a moving tag that becomes unsigned aborts the update before any ingestion.
--dry-run resolves new SHAs and reports what would change without writing anything.
JSON output (--json) emits the full UpdateResult struct — every per-subscription delta with added/removed/updated arrays for CI integration.

Exit code: 0 on success (even on no-op), 1 if any subscription's update failed (e.g., resolve error, signature enforcement abort).

`agent-skills audit [--limit N] [--skill <id>]`

Inspect the local audit log (append-only JSONL at <bank>/audit.jsonl).

$ agent-skills audit --limit 5
2026-04-28T06:18:02.048Z  ✓  github.com/MauricioPerera/agent-skills-pack@.../read-file  (124ms)
2026-04-28T06:18:00.444Z  ✓  github.com/MauricioPerera/agent-skills-pack@.../base64-encode  (232ms)
    intent: encode hello as base64

Filter by skill identity to see one tool's usage history:

$ agent-skills audit --skill github.com/.../base64-encode --json | jq '.[] | {timestamp, exit_code, args}'

The audit log is local only (per SPEC §8 P3) — it never leaves your machine.

`agent-skills bench <truth-file>` (v0.7.0+)

Measure top-K retrieval accuracy against a ground-truth file of (intent, expected_skill_id) pairs. Same code path as query, so the numbers match what the agent would actually see.

Truth file format (auto-detected by first non-whitespace char):

# JSONL — one entry per line, blank lines and # comments OK
{"intent": "fetch the contents of a URL", "expected": "http-get"}
{"intent": "encode a string as base64", "expected": "base64-encode"}

[
  { "intent": "fetch the contents of a URL", "expected": "http-get" },
  { "intent": "encode a string as base64", "expected": "base64-encode" }
]

expected is the skill's short id (the frontmatter id field), not the full identity — so truth files stay portable across skill-pack revisions. The bench fails fast if any expected doesn't resolve to exactly one installed skill.

Output:

$ agent-skills bench bench-truth.jsonl
Bench against 35 queries
  truth: bench-truth.jsonl
  model: ollama:embeddinggemma | rerank=intent-conditional | filter=on

  top-1:  34/35 (97.1%)
  top-3:  35/35 (100.0%)
  top-5:  35/35 (100.0%)
  mean top-1 score:  0.551
  mean margin (top-1 → top-2):  +0.175
  elapsed: 42306ms

Failures (1):
  ✗ "read what's at this https url"
      expected: http-get (rank 2, score 0.450)
      got:      read-file (score 0.464)

Flags:

--k N — report top-K (default 5). top-1 and top-3 always shown.
--rerank-mode <mode> — same options as query: intent-conditional (default) | global | none.
--no-rerank — shortcut for --rerank-mode none.
--no-filter — disable applicable_when filtering during the bench.
--embedding-provider <p> — override env auto-detect.
--json — emit a machine-readable result (every per-query row + summary).

CI integration: the CLI exits non-zero when failures.length > 0, so a regression breaks the build.

agent-skills bench bench-truth.jsonl --json > result.json
[ "$(jq '.top1' result.json)" -ge 35 ] || exit 1

The public skill pack ships its own truth file at agent-skills-pack/bench-truth.jsonl — 35 paraphrases × 7 skills.

`agent-skills init <name> [--pack] [--in <dir>]` (v0.9.0+)

Scaffolds a new skill (single mode) or a complete skill pack (--pack mode) from embedded templates. The generated SKILL.md is publish-ready on the first run — agent-skills publish --check-only is green immediately. Then the author edits.

Single-skill mode (use inside an existing pack):

$ cd my-skill-pack/
$ agent-skills init scrape-website
init skill: my-skill-pack

Wrote 1 file(s):
  + skills/scrape-website/SKILL.md

Next:
  edit skills/scrape-website/SKILL.md
  agent-skills publish --check-only   # validate + refresh skills-index.json

Pack mode (start a brand-new skill pack):

$ agent-skills init my-pack --pack --author "Alice"
init pack: my-pack

Wrote 5 file(s):
  + skills/hello-world/SKILL.md
  + llms.txt
  + README.md
  + .gitignore
  + .github/workflows/validate.yml

Next:
  cd my-pack
  agent-skills publish --check-only   # should be a clean validation
  git init && git add . && git commit -m "Initial pack"
  # Then add the GitHub topic 'agent-skills' and publish a tagged release.

The scaffolded hello-world skill is a working echo wrapper with a strict pattern arg. It's intentionally minimal so the author has the smallest possible diff to make it real.

Discoverability surface: the generated SKILL.md includes every optional frontmatter field commented out with a one-line explanation per field — applicable_when, network, examples, tags, category, idempotent, chain. Authors discover the spec by editing the scaffold rather than reading SPEC.md.

Flags:

--pack — scaffold a whole pack at ./<name>/ instead of a single skill at ./skills/<name>/.
--in <dir> — root directory. Default: . (cwd).
--author <name> — inject as author.name (single mode) and the README/llms.txt headers (pack mode).
--force — overwrite existing files. Default: refuse.

Refusing to overwrite is per-file: re-running init on a partially-built pack fills in only the files that don't exist yet, leaving your edits alone.

`agent-skills publish [<dir>]` (v0.8.0+)

Author-side command. Validates every SKILL.md under <dir>/skills/, generates or updates <dir>/skills-index.json, and optionally creates a signed git tag.

$ cd my-skill-pack/
$ agent-skills publish
Publish 7 skill(s) from .

  · http-get                     1.0.0
  · http-post-json               1.0.0
  + new-skill                    1.0.0   (added)
  ↑ ripgrep-search               2.0.0   (updated)
  · …

  Removed (in old index, not on disk):
    - obsolete-skill

summary: 1 added, 1 updated, 5 unchanged, 1 removed
✓ wrote skills-index.json

Status glyphs:

+ added (new on disk, not in old index)
↑ updated (version, url, or summary changed)
· unchanged (matched the existing index byte-for-byte)
✗ invalid (validation failed — index NOT written)
! error (parse error or unreadable file)

Flags:

--check-only — validate but don't write or tag. CI-friendly.
--tag <version> — create git tag at HEAD (e.g., v1.0.1).
--sign — signed tag (git tag -s). Requires GPG configured.
--repo <repo> — first-publish only: set default_source.repo (e.g., github.com/me/pack).
--branch <name> — first-publish only: set default branch.
--ref <ref> — embed this ref in resolved skill URLs (e.g., v1.0.0).
--json — machine-readable result.

Behaviour:

Hand-crafted summaries preserved across re-publishes. New skills get an auto-generated summary from description; existing skills keep whatever summary is in the index.
Curated skill ordering preserved. publish walks the existing index's order first, appending only newly-discovered skills (alphabetically among themselves) at the end.
Idempotent. Running publish twice on an unchanged tree is a byte-identical no-op — no timestamps, no reordering. Safe in pre-commit hooks.
Fail-fast. If any SKILL.md is invalid, the index is NOT written and the command exits with code 5.

CI integration (e.g., GitHub Actions):

- name: Validate skill pack
  run: |
    npx --yes agent-skills-cli@latest publish --check-only --json > pub.json
    [ "$(jq '.invalid + .errored' pub.json)" -eq 0 ] || exit 1

Library usage

The CLI is also a library, importable in your own TypeScript:

import {
  parseSkillSource,
  validateSkill,
  resolveCommand,
  parseIdentity,
  deriveUrls,
} from "agent-skills-cli";

// Parse + validate
const skill = parseSkillSource(fileContent);
const validation = validateSkill(skill.frontmatter);
if (!validation.valid) {
  console.error(validation.errors);
  process.exit(5);
}

// Resolve a command
const result = resolveCommand(skill.frontmatter, { amount: 1000, currency: "usd" });
console.log(result.command);

// Parse a skill identity
const id = parseIdentity("github.com/stripe/[email protected]/charge-customer");
const urls = deriveUrls(id);
// → ["https://cdn.jsdelivr.net/gh/stripe/agent-skills@.../charge-customer/SKILL.md", ...]

Full type definitions are exported. See src/types.ts.

Roadmap

| Version | Status | Scope | |---|---|---| | v0.2.0-alpha | shipped | validate + resolve (local-only) + library API | | v0.2.0 | shipped | + sync + query + list + reset; Cloudflare Workers AI embeddings | | v0.3.0 | shipped | + exec (bash subprocess + 3-stage kill ladder + sensitive-arg redaction) + audit (append-only JSONL log); closes agent loop end-to-end | | v0.4.0 | shipped | + audit-based rerank (α·log(1+usage) + recency); applicable_when host detection; 5-strategy benchmark exposing global-rerank failure mode | | v0.5.0 | shipped | + intent-conditional rerank as default (IntentEmbeddingCache + sim≥0.7 filter); fixes the 50-use stress failure with 100 % top-1 on live Workers AI | | v0.6.0 | shipped | + Ollama (local, zero-credential) + OpenAI / OpenAI-compatible (Together / vLLM / TEI / infinity / …) embedding providers; auto-detect from env or EMBEDDING_PROVIDER flag | | v0.6.1 | shipped | + 4 code-review patches (docstring, pure-Node PATH scan, ENOENT discrimination, bounded-concurrency sync) | | v0.7.0 | shipped | + bench subcommand: reproducible top-K accuracy against a JSONL/JSON-array truth file. CI integration via JSON output + non-zero exit on any failure | | v0.8.0 | shipped | + publish command: validate skills/, generate skills-index.json (preserves hand-crafted summaries + curated ordering), optionally git-tag | | v0.9.0 | shipped | + init command: scaffold a single skill or full pack from embedded templates. Output validates against the spec on first run | | v0.10.0 | shipped | + signed-tag verification at sync time. Closes the SECURITY.md tag-tampering threat model | | v0.11.0 | shipped | + update command: re-resolve subscribed refs, re-sync only on movement, GC orphan files from old SHAs, per-skill diff | | v0.12.0 | shipped | + per-tenant audit scoping (--tenant <id> on exec/query/bench); intent-conditional rerank filters by tenant; SPEC §4.5.1 | | v0.13.0 | shipped | Cleanup + correctness: listSkills caching (perf), remove deprecated rerank_applied field, fix update.ts GC for multi-subscription edge case | | v0.13.1 | shipped | + listAudit caching (closes the last perf debt from the post-v0.11 code review). 363/363 tests; 8/10 review issues fixed | | v0.13.2 | shipped | Docs alignment patch: stale install/version refs across READMEs, init scaffolded CI workflow now pins v0.13.1, BENCHMARK.md updated. Plus a hotfix for a CI type-check break introduced in v0.13.1 | | v0.13.3 | shipped | + gc_protected counter in update results (visibility into multi-sub GC); narrowed Subscription.source_type from "git" \| "url" to "git". 10/10 review issues fixed | | v0.14.0 | shipped | + Sigstore signature detection (structural via PEM header — gpg vs sigstore / gitsign). Full Rekor inclusion-proof verification (Level 4 enforcement) was queued at the time; later parked at v1.0 pending external pull. Cross-impl parity (TS + Python) maintained in CI | | v0.15.0 | shipped | + SSH-tag detection (-----BEGIN SSH SIGNATURE----- → "ssh") — fixes a v0.14 oversight. Spec patch v0.3.2: documents the Sigstore-on-host trap | | v0.16.0 | shipped | + Sigstore identity extraction: hand-rolled CMS/ASN.1 walker pulls the OIDC subject from the Fulcio cert's SAN and the issuer from extension 1.3.6.1.4.1.57264.1.1 (/.1.8). Surfaced as provenance.signature_identity = { subject, subject_type, issuer }. Zero new deps. Cross-impl parity validated against the real sigstore/[email protected] payload. Spec patch v0.3.3 | | v0.17.0 | shipped | + Rekor entry parsing + public-instance pinning + lookup by UUID (parseRekorEntry, fetchRekorEntry, RekorEntry types). Real fixture committed for offline testing. Spec patch v0.4.0: formalizes the Level 4 verification contract (§5.4) — what banks must do, not how | | v0.17.1 | shipped | + computeGitsignRekorLookupHash (CMS SignerInfo.SignedAttrs marshaled-for-verification per RFC 5652 §5.4 / gitsign source) + findRekorEntryByHash wrapper for Rekor's index/retrieve. Validated structurally via the messageDigest invariant. Spec v0.4.1 §5.4.2 step 3 codifies the framing math + the Rekor shard-rotation caveat | | v0.18.0 | tagged | Intended npm publication; tagged but never published — package.json declared scope @mauricioperera which doesn't exist on npm under the available auth. v0.18.1 corrects to @rckflr | | v0.18.1 | shipped | First npm publication as @rckflr/agent-skills-cli. Removes the install friction (git clone && npm link → npm install -g). Adoption blocker resolved without waiting for v1.0. No code changes vs v0.17.1 | | v0.18.2 | tagged-only | + release.yml GitHub Action (OIDC trusted publisher + provenance attestation). Tag exists in git, workflow fired, but final PUT to npm registry returned 404 because npm-side Trusted Publisher config isn't in place yet. Kept as historical marker; provenance attestation for this attempt lives at Rekor logIndex 1398755825 | | v0.18.3 | shipped | Published via granular-token-in-~/.npmrc (the standard pattern most solo npm publishers use; token rotates every 90 days). OIDC pipeline (release.yml) is in place but ran into an unresolved interaction between Trusted Publisher config and account-level 2FA-for-writes. Future releases retry OIDC first | | v0.19.0 | shipped | API stability tiers formalized. New STABILITY.md with the breaking-change policy per tier (stable / experimental / internal). Public exports in src/index.ts reorganized by tier with section headers. No API removals — purely annotations + policy | | v1.0.0 | shipped | Stable API freeze. MIGRATION.md. Companion spec also bumped to v1.0. From here: STABLE exports protected by semver + 6-month deprecation window | | v2.0.0 | shipped | Runtime + storage migrated to spec-conformant stack. Pre-v2 the CLI ran a parallel non-conformant runtime (Node child_process to host shell, JSON files on disk); v2 closes that divergence by following IMPLEMENTATION.md literally — execution via sandboxed just-bash, storage via just-bash-data db + vec. Skills now execute with FS scratch dir, network allowlist enforcement, and env-var scoping per SPEC §4.4. Breaking changes: existing banks need re-sync; skills that wrap host binaries (gh, aws, kubectl, …) require CustomCommand wrappers. See MIGRATION.md | | post-2.0 | reactive | Item-by-item, driven by external pull: (a) Level 4 Rekor verification crypto if a Sigstore-signing publisher appears; (b) CustomCommand wrappers for popular host CLIs (gh, etc.) so the public pack works under the sandbox; (c) ecosystem growth via second/third external pack publishers (OUTREACH.md). None blocked on the maintainer |

Continuous validation

Two workflows guard against regression:

ci.yml — type-check, build, test on every push and PR. Fast (under 2 minutes), no external dependencies.
e2e.yml — ecosystem coherence. Two jobs:
1. cross-impl-parity: clones this CLI + agent-skills-py-proof + agent-skills-pack, installs Ollama with all-minilm, runs bench against the same truth file with both implementations, fails if their numerical results diverge. Catches silent breakage when a sister repo evolves in a way that drifts retrieval.
2. author-roundtrip: init a new pack, publish --check-only (must validate clean), resolve a scaffolded skill (must produce echo 'hello world'). Validates that the author tooling produces output the consumer tooling can ingest.

Runs on every push, on PR, and weekly via cron — the cron picks up sister-repo changes within 7 days even when this repo is quiet.

Sister projects

agent-skills — the canonical specification (v0.3.0).
agent-skills-pack — example skill pack with 7 production-ready skills + bench-truth.jsonl. Integration test corpus for this CLI.
agent-skills-py-proof — 510-line Python implementation that produces bit-identical retrieval scores to this CLI. Cross-implementation validation of the spec.
just-bash-data — alternative storage runtime providing db (document store) + vec (vector search) primitives.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

agent-skills-cli

What this is

Status

Install

Or install from a GitHub release tag

Publishing your own skills?

Library API stability

Upgrading between major versions

Contributing / security

End-to-end demo

Option A — Local with Ollama (zero credentials, zero network)

Option B — Cloudflare Workers AI (free tier available)

Option C — OpenAI (or any OpenAI-compatible server)

Option D — transformers.js (in-process, zero credentials, zero server, browser-compatible)

Same flow regardless of provider

Commands

agent-skills validate <file>

agent-skills resolve <file> --args '<json>'

agent-skills exec <skill> --args '<json>' (v0.3.0+)

agent-skills update [<source>] [--dry-run] (v0.11.0+)

agent-skills audit [--limit N] [--skill <id>]

agent-skills bench <truth-file> (v0.7.0+)

agent-skills init <name> [--pack] [--in <dir>] (v0.9.0+)

agent-skills publish [<dir>] (v0.8.0+)

Library usage

Roadmap

Continuous validation

Sister projects

License

`agent-skills-cli`

`agent-skills validate <file>`

`agent-skills resolve <file> --args '<json>'`

`agent-skills exec <skill> --args '<json>'` (v0.3.0+)

`agent-skills update [<source>] [--dry-run]` (v0.11.0+)

`agent-skills audit [--limit N] [--skill <id>]`

`agent-skills bench <truth-file>` (v0.7.0+)

`agent-skills init <name> [--pack] [--in <dir>]` (v0.9.0+)

`agent-skills publish [<dir>]` (v0.8.0+)