@cristobalme/skill-test

v0.1.0

Test agent Skills (SKILL.md): static lint, activation triggering, and behavioral grading. Zero-config, offline-capable, CI-first.

skill-test

Test agent Skills (SKILL.md files) before you ship them. skill-test validates skills across three layers:

  1. Static lint — offline, deterministic checks against the live Agent Skills spec: frontmatter, naming rules, description length, body size, broken file references, and risky instruction patterns.
  2. Triggering — does the agent actually load your skill for the prompts it should (and skip the ones it shouldn't)? Measured as precision/recall over a labeled corpus.
  3. Behavioral — does the skill produce correct output on real tasks? (graded, sandboxed)
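To make the triggering metric concrete, here is a minimal sketch of how precision, recall, and F1 could be computed from labeled prompts. It is an illustration, not the tool's actual code: the names `TriggerMetrics` and `score_triggering` are hypothetical, and scoring an empty denominator as 1.0 is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class TriggerMetrics:
    precision: float
    recall: float
    f1: float


def score_triggering(results: list[tuple[bool, bool]]) -> TriggerMetrics:
    """Score (expected, actual) pairs, one per labeled prompt.

    expected: should the skill load for this prompt?
    actual:   did the agent decide to load it?
    """
    tp = sum(1 for expected, actual in results if expected and actual)
    fp = sum(1 for expected, actual in results if not expected and actual)
    fn = sum(1 for expected, actual in results if expected and not actual)

    # Assumed convention: an empty denominator scores 1.0 (nothing to get wrong).
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return TriggerMetrics(precision, recall, f1)
```

Low precision means the skill loads for prompts it should skip; low recall means it fails to load for prompts it should handle.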

No telemetry. No phone-home. The static layer needs no network and no API key.

Quick start

npx @cristobalme/skill-test lint ./my-skill

The package is published under the @cristobalme scope; the binary it installs is named skill-test.

Commands

skill-test lint    <path...>   # Layer 1 — static, offline, deterministic
skill-test trigger <path...>   # Layer 2 — activation precision/recall (needs API key + spec)
skill-test run     <path...>   # Layer 3 — behavioral task grading
skill-test check   <path...>   # Runs every layer available given config/keys

<path> accepts a single SKILL.md, a skill directory, or a directory of many skills (walked recursively).

Global flags

| Flag | Effect |
| ---------------- | --------------------------------------------------------------- |
| --json | Emit machine-readable JSON to stdout |
| --junit <file> | Write a JUnit XML report to <file> (renders in CI dashboards) |
| --cheap | Skip the behavioral (run) layer |
| --quiet | Only print failures |
| --no-color | Disable ANSI color (also auto-off when stdout isn't a TTY) |
| --model <id> | Override the classifier model for the trigger layer |

Exit codes

| Code | Meaning |
| ---- | ---------------------------------------------------------------------- |
| 0 | All checks that ran passed |
| 1 | One or more failures (lint error, or trigger false pos/negatives) |
| 2 | Usage or configuration error (no skill found, trigger without a key) |
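A wrapper script might branch on these codes as in the sketch below. The `ExitCode` enum, `describe`, and `run_check` are hypothetical helper names, not part of the package. Only the three codes and the `check` command come from this README.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    PASSED = 0       # all checks that ran passed
    FAILED = 1       # lint error or trigger false positives/negatives
    USAGE_ERROR = 2  # usage or configuration error


_MESSAGES = {
    ExitCode.PASSED: "all checks that ran passed",
    ExitCode.FAILED: "one or more checks failed",
    ExitCode.USAGE_ERROR: "usage or configuration error",
}


def describe(code: int) -> str:
    """Translate a skill-test exit code into a short message."""
    try:
        return _MESSAGES[ExitCode(code)]
    except ValueError:
        # Codes outside the documented set are reported, not guessed at.
        return f"unexpected exit code {code}"


def run_check(path: str = ".") -> str:
    """Run `skill-test check` through npx and describe the outcome."""
    proc = subprocess.run(["npx", "@cristobalme/skill-test", "check", path])
    return describe(proc.returncode)
```

Treating code 2 separately from code 1 lets a pipeline distinguish a misconfigured job from a skill that genuinely failed its checks.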

Layers degrade gracefully: check always runs lint, and runs the trigger layer only when both an ANTHROPIC_API_KEY and a SKILL.test.yaml are present. If the key or the spec is missing, the trigger layer is skipped rather than failed, so check is safe to drop into any CI pipeline.
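The selection rule can be sketched as follows. This illustrates the behavior described here and is not the tool's implementation: `select_layers` and `detect_layers` are hypothetical names, and looking for SKILL.test.yaml directly inside the skill directory is an assumption based on the co-location advice in this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def select_layers(has_key: bool, has_spec: bool) -> list[str]:
    """Lint always runs; trigger runs only when both key and spec exist."""
    layers = ["lint"]
    if has_key and has_spec:
        layers.append("trigger")
    return layers


def detect_layers(skill_dir: Path, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Inspect the environment and skill directory, then pick the layers."""
    env = os.environ if env is None else env
    has_key = bool(env.get("ANTHROPIC_API_KEY"))
    # Assumption: the spec sits next to SKILL.md in the skill directory.
    has_spec = (skill_dir / "SKILL.test.yaml").is_file()
    return select_layers(has_key, has_spec)
```

Because a missing prerequisite only shortens the list of layers, a run on a fork without secrets still lints and can still pass.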

The SKILL.test.yaml spec

Co-locate a SKILL.test.yaml next to your SKILL.md to enable the trigger layer:

skill: ./SKILL.md
triggering:
  should_activate:
    - "fill out this PDF form"
    - "complete the application pdf"
  should_not_activate:
    - "write me a poem"
    - "summarize this spreadsheet"
tasks: [] # behavioral tasks — Phase 5

The trigger layer asks the model to make the same load/skip decision a host agent makes at startup, using only the skill's name + description (never the body). It reports precision/recall/F1 over the labeled prompts. Results are cached on disk (keyed by model + description + prompt), so reruns are free.
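As an illustration of caching keyed by model, description, and prompt, the sketch below derives a stable key from those three fields. The hashing scheme and the `cache_key` name are assumptions. The README states only which fields make up the key, not how the tool computes it.

```python
import hashlib
import json


def cache_key(model: str, description: str, prompt: str) -> str:
    """Derive a stable key from the three fields that determine a result."""
    # A JSON array keeps field boundaries unambiguous, so ("ab", "c")
    # and ("a", "bc") can never serialize to the same payload.
    payload = json.dumps([model, description, prompt], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under a scheme like this, editing a skill's description changes the key and forces a fresh classification, while editing only the body leaves cached results valid, which matches the trigger layer never reading the body.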

GitHub Action

Test every skill on each PR and get a results comment. Drop this into .github/workflows/skill-test.yml (full copy in examples/skill-test.yml):

name: skill-test
on: [pull_request]
permissions:
  contents: read
  pull-requests: write
jobs:
  skill-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: OWNER/skill-test/action@v1
        with:
          path: .
          # optional — enables the trigger layer; lint runs without it
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Replace OWNER with the org/user the action is published under. Without an API key the Action runs the offline lint layer and still posts a comment. With one, it adds activation precision/recall. It posts (and updates in place) a single PR comment:

| Skill | Lint | Trigger |
| -------------- | ----------- | ------------------ |
| good-skill | ✅ | ✅ P 100% · R 100% |
| broken-skill | ❌ 2 errors | ⏭️ skipped |

Add the badge

[![skills tested](https://img.shields.io/badge/skills-tested-8A2BE2)](https://www.npmjs.com/package/@cristobalme/skill-test)

Privacy

No telemetry, no analytics, no phone-home. The static lint layer runs fully offline. Only trigger and run call the Anthropic API, and only with the metadata/inputs needed for the check.

Status

v0.1.0 ships the static lint layer, the activation trigger layer, the unified check with JSON/JUnit output, and the GitHub Action + badge. The behavioral run layer (sandboxed task grading) is planned for the next release.

License

MIT