@sentry/skillet

v0.28.0

Create, evaluate, and iterate on agent skills

Skillet

Spec-driven authoring of agent skills. Define a structured spec.yaml that captures intent, behaviors, and triggers; skillet generates SKILL.md and eval cases from it, runs them, and iterates by patching the spec until coverage and per-behavior results pass.

Install

npx @sentry/skillet install

This copies the skillet skill into your agent (auto-detects Claude Code, OpenCode, Pi). Your agent then knows how to use skillet when you ask it to create or improve skills.

Usage

Create a new skill from a description

npx @sentry/skillet create "Django N+1 query reviewer"

Generates spec.yaml from the description, derives SKILL.md and eval cases from it, then runs the verify-driven iteration loop until per-behavior checks pass.

Improve an existing skill

npx @sentry/skillet improve ./my-skill

If my-skill/ already has a spec.yaml, the loop iterates from there. If it only has a legacy SKILL.md (no spec), the loop auto-imports first — no separate migration step.

Add a behavior

npx @sentry/skillet add-eval ./my-skill \
  "should flag N+1 queries in loops" \
  "should NOT flag single .get() calls"

Each behavior is appended to spec.yaml, then SKILL.md and the eval files are regenerated. Internally, this is a thin wrapper over spec refine.

Edit the spec via natural language

npx @sentry/skillet spec refine \
  "tighten the N+1 rule to also cover list comprehensions" \
  ./my-skill

The LLM produces structured SpecPatch[] operations, applies them to spec.yaml, and regenerates the derived files.
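To make the idea of structured patch operations concrete, here is a minimal sketch of applying a list of patches to a spec. The real SpecPatch shape is not documented in this README, so `SketchPatch`, its three `op` values, and `applyPatches` are hypothetical names invented for illustration.

```typescript
// Hypothetical shapes for illustration only; skillet's real SpecPatch
// type is not shown in this README.
interface SketchBehavior {
  id: string;
  statement: string;
}

interface SketchSpec {
  name: string;
  behaviors: SketchBehavior[];
}

type SketchPatch =
  | { op: "add_behavior"; behavior: SketchBehavior }
  | { op: "update_behavior"; id: string; statement: string }
  | { op: "remove_behavior"; id: string };

// Apply patches in order and return a new spec; the input is not mutated.
function applyPatches(spec: SketchSpec, patches: SketchPatch[]): SketchSpec {
  let behaviors = spec.behaviors.map((b) => ({ ...b }));
  for (const patch of patches) {
    switch (patch.op) {
      case "add_behavior": {
        const newId = patch.behavior.id;
        if (behaviors.some((b) => b.id === newId)) {
          throw new Error(`duplicate behavior id: ${newId}`);
        }
        behaviors.push({ ...patch.behavior });
        break;
      }
      case "update_behavior": {
        const targetId = patch.id;
        const target = behaviors.find((b) => b.id === targetId);
        if (!target) throw new Error(`unknown behavior id: ${targetId}`);
        target.statement = patch.statement;
        break;
      }
      case "remove_behavior": {
        const removeId = patch.id;
        behaviors = behaviors.filter((b) => b.id !== removeId);
        break;
      }
    }
  }
  return { ...spec, behaviors };
}

const before: SketchSpec = {
  name: "django-perf-review",
  behaviors: [
    { id: "flag-n-plus-one", statement: "Flag N+1 queries in loops over querysets." },
  ],
};

const after = applyPatches(before, [
  {
    op: "update_behavior",
    id: "flag-n-plus-one",
    statement: "Flag N+1 queries in loops and list comprehensions over querysets.",
  },
]);
```

Keeping patches as data rather than free-form edits means each change can be validated (unknown id, duplicate id) before the derived files are regenerated.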

Inspect the spec

npx @sentry/skillet spec show ./my-skill

Pretty-prints the spec with the banner stripped.

Verify a skill

npx @sentry/skillet verify ./my-skill
npx @sentry/skillet verify ./my-skill --semantic   # also runs LLM-judged SKILL.md coverage
npx @sentry/skillet verify ./my-skill --json       # structured output for CI

Four layers, short-circuits on the first failure:

  1. Structural — each file (spec, SKILL.md, evals) parses and has its required fields
  2. Cross-artifact coverage — every behavior has an eval case; no orphans
  3. Per-behavior results — when run data is available, every behavior has a passing case
  4. Semantic (opt-in) — LLM judge confirms SKILL.md actually encodes each behavior

Layers 1–3 make no LLM calls and finish in under a second. verify replaces the older validate command and adds cross-artifact awareness on top.
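The short-circuit behavior of the layers can be sketched as a small runner that stops at the first failing layer. This is an illustration under assumed names (`VerifyLayer`, `runLayers`, `VerifyReport`), not skillet's internal implementation.

```typescript
// Hypothetical types; names are illustrative, not skillet's internals.
interface LayerResult {
  ok: boolean;
  problems: string[];
}

interface VerifyLayer {
  name: string;
  run: () => LayerResult;
}

interface VerifyReport {
  passed: boolean;
  failedLayer: string | null;
  problems: string[];
  layersRun: string[];
}

// Run layers in order and stop at the first one that fails, so later
// (possibly slower) layers are skipped once an earlier one reports problems.
function runLayers(layers: VerifyLayer[]): VerifyReport {
  const layersRun: string[] = [];
  for (const layer of layers) {
    layersRun.push(layer.name);
    const result = layer.run();
    if (!result.ok) {
      return {
        passed: false,
        failedLayer: layer.name,
        problems: result.problems,
        layersRun,
      };
    }
  }
  return { passed: true, failedLayer: null, problems: [], layersRun };
}

const report = runLayers([
  { name: "structural", run: () => ({ ok: true, problems: [] }) },
  {
    name: "coverage",
    run: () => ({ ok: false, problems: ["behavior flag-n-plus-one has no eval case"] }),
  },
  { name: "results", run: () => ({ ok: true, problems: [] }) },
]);
// report.failedLayer is "coverage"; the "results" layer never runs.
```

Ordering the cheap checks first is what lets a broken spec fail fast before any eval results or LLM judging are consulted.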

Run evals once

npx @sentry/skillet eval ./my-skill
npx @sentry/skillet eval ./my-skill --json

Delegates to vitest. Runs whatever evals/*.eval.ts files exist without regenerating them; regeneration happens automatically on spec mutations.

Commands

| Command | Purpose |
|---------|---------|
| create "<description>" [--input <dir>]... | New skill: agentic spec-author loop (read-only tools over --input paths + bundled refs) + regen + improve loop |
| improve [path] | Iterate until per-behavior evals pass; auto-imports legacy |
| spec init "<description>" | Run interactive spec-author loop without the improve loop |
| spec show [path] | Pretty-print the spec (banner stripped) |
| spec refine "<feedback>" [path] | Natural-language patch; auto-regens |
| spec import [path] | Seed a spec from an existing SKILL.md, then run the spec-author loop |
| resume <path> --answer "..." | Resume a paused spec-author session (one --answer per pending question) |
| add-eval [path] "<behavior>" ... | Append behaviors to spec; auto-regens |
| verify [path] [--semantic] [--json] | Layered consistency check (subsumes validate) |
| eval [path] [--json] | Run evals once |
| install [path] | Install skillet skill into your agent |

Credentials

Skillet auto-discovers LLM credentials. No configuration needed when running inside Claude Code, Codex, GitHub Copilot, or any environment with standard API keys set.

Override with SKILLET_MODEL=provider/model-id if needed.
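As a rough illustration of the provider/model-id format, the sketch below splits such a value into its two parts. How skillet itself parses SKILLET_MODEL is not documented here; `parseModelOverride` is a hypothetical helper, and splitting on the first slash only is an assumption.

```typescript
// Hypothetical helper; skillet's own parsing is not shown in this README.
// Splits "provider/model-id" on the first slash only, so a model id that
// itself contains slashes stays intact. Returns null for malformed values.
function parseModelOverride(
  value: string | undefined,
): { provider: string; modelId: string } | null {
  if (!value) return null;
  const slash = value.indexOf("/");
  // Reject a missing slash, an empty provider, or an empty model id.
  if (slash <= 0 || slash === value.length - 1) return null;
  return {
    provider: value.slice(0, slash),
    modelId: value.slice(slash + 1),
  };
}

// Placeholder names, not real provider or model identifiers.
const parsed = parseModelOverride("example-provider/example-model");
```

In a Node script the input would typically come from `process.env.SKILLET_MODEL`.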

How spec-driven authoring works

spec.yaml captures what the skill does — intent, behaviors, must-nots, triggers — as a simple, user-readable document. SKILL.md is derived from it (clobbered on regen; edit the spec to change rules). evals/*.eval.ts are generated initially but durable after that — edit them directly to refine specific test prompts or assertions.

spec.yaml ──► generate ──► SKILL.md + evals/*.eval.ts
                              │
                              ▼
                       run evals (vitest)
                              │
                              ▼
                       verify (4 layers)
                              │
                              ▼
                       tune SKILL.md prose
                              │
                              └──► loop until pass or max iterations
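The flow in the diagram can be sketched as a bounded loop. The step names in `LoopSteps` are hypothetical stand-ins for the stages in the diagram, and the choice to generate once and re-enter the loop at the eval step is an assumption; the diagram does not say where the loop resumes.

```typescript
// Hypothetical step signatures standing in for the diagram's stages.
interface LoopSteps {
  generate: () => void;               // derive SKILL.md + evals from the spec
  runEvals: () => void;               // run the eval suite
  verify: () => string[];             // failing behavior ids; empty means pass
  tune: (failing: string[]) => void;  // adjust based on the failures
}

interface LoopOutcome {
  passed: boolean;
  iterations: number;
  failing: string[];
}

// Generate once, then repeat eval -> verify -> tune until verification
// passes or the iteration budget is used up.
function improveLoop(steps: LoopSteps, maxIterations: number): LoopOutcome {
  steps.generate();
  let failing: string[] = [];
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations += 1;
    steps.runEvals();
    failing = steps.verify();
    if (failing.length === 0) {
      return { passed: true, iterations, failing };
    }
    steps.tune(failing);
  }
  return { passed: false, iterations, failing };
}

// Fake steps: verification fails until tune has run twice.
let tuned = 0;
const outcome = improveLoop(
  {
    generate: () => {},
    runEvals: () => {},
    verify: () => (tuned < 2 ? ["flag-n-plus-one"] : []),
    tune: () => {
      tuned += 1;
    },
  },
  5,
);
// outcome.passed is true after 3 iterations.
```

The iteration cap is what keeps a behavior that never passes from looping forever; the outcome still carries the failing ids so they can be reported.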

A spec.yaml looks like this:

managed_by: skillet
spec_version: 1
name: django-perf-review
intent: |
  Review Django code for performance regressions, focusing on N+1
  queries and queryset misuse.

triggers:
  should:
    - "review django performance"
    - "find N+1 queries"
    - "optimize django"
  should_not:
    - "review this React component"

behaviors:
  - id: flag-n-plus-one
    statement: Flag N+1 queries in loops over querysets.
    rationale: |
      Loops accessing related objects without select_related issue
      one query per iteration in production but pass tests.

must_not:
  - id: dont-flag-single-get
    statement: Don't flag single .get() calls as N+1.
    rationale: A single fetch isn't a query loop.

The spec is intent only — eval prompts, setup scripts, and assertions live in the generated eval file (see below), not here. This keeps the spec readable and lets you edit eval shapes directly without touching the source of truth.
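For readers who think in types, the example spec above maps onto a shape like the following. The field names come from the YAML example; the `SkillSpec` and `SpecEntry` type names and the specific checks in `findSpecProblems` (non-empty name, at least one behavior, ids unique across behaviors and must_not) are assumptions, not skillet's documented structural rules.

```typescript
// Type sketch mirroring the YAML example; type names are hypothetical.
interface SpecEntry {
  id: string;
  statement: string;
  rationale: string;
}

interface SkillSpec {
  managed_by: string;
  spec_version: number;
  name: string;
  intent: string;
  triggers: { should: string[]; should_not: string[] };
  behaviors: SpecEntry[];
  must_not: SpecEntry[];
}

// Assumed structural checks, for illustration only.
function findSpecProblems(spec: SkillSpec): string[] {
  const problems: string[] = [];
  if (spec.name.trim() === "") problems.push("name is empty");
  if (spec.behaviors.length === 0) problems.push("no behaviors defined");
  // Ids act as join keys for eval cases, so treat them as one namespace.
  const seen = new Set<string>();
  for (const entry of [...spec.behaviors, ...spec.must_not]) {
    if (seen.has(entry.id)) problems.push(`duplicate id: ${entry.id}`);
    seen.add(entry.id);
  }
  return problems;
}

const exampleSpec: SkillSpec = {
  managed_by: "skillet",
  spec_version: 1,
  name: "django-perf-review",
  intent: "Review Django code for performance regressions.",
  triggers: {
    should: ["review django performance", "find N+1 queries"],
    should_not: ["review this React component"],
  },
  behaviors: [
    {
      id: "flag-n-plus-one",
      statement: "Flag N+1 queries in loops over querysets.",
      rationale: "Loops accessing related objects issue one query per iteration.",
    },
  ],
  must_not: [
    {
      id: "dont-flag-single-get",
      statement: "Don't flag single .get() calls as N+1.",
      rationale: "A single fetch isn't a query loop.",
    },
  ],
};

const specProblems = findSpecProblems(exampleSpec);
```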

Eval format

Eval files are TypeScript that vitest runs natively. They use the harness-first API mirroring vitest-evals#41 — imported through @sentry/skillet/evals so generated files don't change when vitest-evals 0.9 ships.

import { fileURLToPath } from "node:url";
import { dirname } from "node:path";
import {
  describeEval,
  CriterionJudge,
  SubstringJudge,
  skilletHarness,
} from "@sentry/skillet/evals";

const skillRoot = dirname(fileURLToPath(import.meta.url)).replace(/\/evals$/, "");

describeEval("django-perf-review", {
  data: [
    {
      name: "flag-n-plus-one__loop_over_books",
      tests_behavior: "flag-n-plus-one",
      input: "Review views.py for performance issues",
      expectedContains: "select_related",
      setup: `cat > views.py <<'EOF'
for book in Book.objects.all():
    print(book.author.name)
EOF`,
    },
    {
      name: "dont-flag-single-get__single_call",
      tests_behavior: "dont-flag-single-get",
      input: "Is `User.objects.get(id=1)` an N+1?",
      criteria: "agent does not call this an N+1 issue",
    },
  ],
  harness: skilletHarness({ skill: skillRoot }),
  judges: [SubstringJudge(), CriterionJudge()],
  threshold: 0.75,
});

Each case sets up a workspace (optional setup), sends input to an agent loaded with the skill, and grades the output with the judges. tests_behavior links cases back to spec entries — verification uses this as the join key so failures land on the specific behavior they affect, not on a free-text "something went wrong" signal.
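The join on tests_behavior can be sketched as a grouping step over case results. The `CaseResult` and `JoinReport` shapes and the `joinResults` function are hypothetical; only the tests_behavior field and the spec ids come from the examples above.

```typescript
// Hypothetical result shape; only tests_behavior comes from the eval format.
interface CaseResult {
  name: string;
  tests_behavior: string;
  passed: boolean;
}

interface JoinReport {
  uncovered: string[]; // spec ids with no eval case at all
  failing: string[];   // spec ids with cases, none of which passed
  orphans: string[];   // case names whose tests_behavior matches no spec id
}

// Group case results by tests_behavior and report problems per spec id.
function joinResults(specIds: string[], results: CaseResult[]): JoinReport {
  const known = new Set(specIds);
  const byId = new Map<string, CaseResult[]>();
  const orphans: string[] = [];

  for (const result of results) {
    if (!known.has(result.tests_behavior)) {
      orphans.push(result.name);
      continue;
    }
    const bucket = byId.get(result.tests_behavior) ?? [];
    bucket.push(result);
    byId.set(result.tests_behavior, bucket);
  }

  const uncovered = specIds.filter((id) => !byId.has(id));
  const failing = specIds.filter((id) => {
    const cases = byId.get(id);
    return cases !== undefined && !cases.some((c) => c.passed);
  });
  return { uncovered, failing, orphans };
}

const joined = joinResults(
  ["flag-n-plus-one", "dont-flag-single-get"],
  [
    { name: "flag-n-plus-one__loop_over_books", tests_behavior: "flag-n-plus-one", passed: false },
    { name: "typo-id__case", tests_behavior: "typo-id", passed: true },
  ],
);
// joined.failing is ["flag-n-plus-one"], joined.uncovered is
// ["dont-flag-single-get"], and joined.orphans is ["typo-id__case"].
```

Because every finding is keyed by a spec id, a failure points at the specific behavior to revisit rather than at the suite as a whole.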

License

MIT