@spboyer/sensei

v1.5.0

Published

a month ago

AI-powered frontmatter compliance skill for GitHub Copilot


spboyer

copilot skill frontmatter compliance

Sensei

"A true master teaches not by telling, but by refining." - The Skill Sensei

Sensei automates the improvement of Agent Skills frontmatter compliance using the Ralph loop pattern - iteratively improving skills until they reach Medium-High compliance with all tests passing.

Overview

The Problem

Skills without proper frontmatter lead to skill collision - agents invoking the wrong skill for a given prompt. Common issues include:

- No triggers - Agent doesn't know when to activate the skill
- No anti-triggers - Agent doesn't know when NOT to use the skill
- Brief descriptions - Not enough context for accurate matching
- Token bloat - Oversized skills waste context window

The Solution

Sensei implements the "Ralph Wiggum" technique:

1. Read - Load the skill's current state and token count
2. Score - Evaluate frontmatter compliance
3. Improve - Add triggers, anti-triggers, compatibility
4. Verify - Run tests to ensure changes work
5. Check Tokens - Analyze token usage, gather suggestions
6. Summary - Display before/after with suggestions
7. Prompt - Ask user: Commit, Create Issue, or Skip?
8. Repeat - Until the target score is reached or the per-skill iteration limit (5 by default) is hit

Quick Start

Using with Copilot CLI

Single Skill

Run sensei on my-skill-name

Single Skill (Fast Mode)

Run sensei on my-skill-name --fast

Multiple Skills

Run sensei on skill-a, skill-b, skill-c

All Low-Adherence Skills

Run sensei on all Low-adherence skills

All Skills

Run sensei on all skills

GEPA Mode (Deep Optimization)

Run sensei on my-skill-name --gepa
Run sensei score my-skill-name

Score-only mode (Run sensei score my-skill-name) runs without LLM calls.

External Integration Modes

CLI / npx

npx @spboyer/sensei score .
npx @spboyer/sensei check --root . --config .token-limits.json --strict

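--root sets the directory that paths are resolved against; --config selects the token-limits file (.token-limits.json in the example above). For a global install, run npm install --global @spboyer/sensei.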

GitHub Action

steps:
  - uses: actions/checkout@v4
  - uses: spboyer/sensei@<ref>  # replace <ref> with a release tag or commit SHA
    with:
      command: check
      root: .
      path: .
      config: .token-limits.json
      strict: 'true'

Library API

import { scoreSkillContent } from '@spboyer/sensei/score';
import { parseFrontmatter } from '@spboyer/sensei/parse';
import { checkNameCompliance } from '@spboyer/sensei/checks';

const result = scoreSkillContent(renderedSkillMarkdown, { path: 'skills/my-skill', moduleCount: 2 });
const frontmatter = parseFrontmatter(renderedSkillMarkdown);
checkNameCompliance(frontmatter?.name ?? '');

path is recorded as metadata only; moduleCount defaults to 0 when omitted.

GEPA Commands

python scripts/src/gepa/auto_evaluator.py score --skill my-skill
python scripts/src/gepa/auto_evaluator.py optimize --skill my-skill

Flags

| Flag | Description |
|------|-------------|
| --fast | Skip tests for faster iteration |
| --gepa | Use GEPA evolutionary optimization instead of template-based improvements |
| --skip-integration | Skip integration tests (unit + trigger tests only) |

⚠️ Note: Using --fast speeds up the loop significantly but may miss issues. Run the full test suite before the final commit.

Prerequisites

Required

Node.js 18+ - For running token management scripts
```
node --version
```
Git - For commits and comparisons
```
git --version
```

Optional

Test Framework - Jest, pytest, or similar for trigger tests
Python 3.10+ and GEPA - For evolutionary optimization (pip install gepa)

Installation

Option 1: Install as Copilot CLI Skill (Recommended)

mkdir -p "$HOME/.copilot/skills"
git clone https://github.com/spboyer/sensei.git "$HOME/.copilot/skills/sensei"
cd ~/.copilot/skills/sensei/scripts && npm install

The skill is now available in Copilot CLI. Invoke with:

Run sensei on my-skill-name

Option 2: Install in Project Skills Folder

For project-specific installation:

mkdir -p .github/skills
git clone https://github.com/spboyer/sensei.git .github/skills/sensei
cd .github/skills/sensei/scripts && npm install

Option 3: Install the CLI from npm

For CLI/library use:

npm install --global @spboyer/sensei
sensei check .
npx @spboyer/sensei check .

Verify Installation

cd ~/.copilot/skills/sensei && npm run tokens -- check
npx @spboyer/sensei check .

How It Works

The Ralph Loop

┌─────────────────────────────────────────────────────────┐
│  START: User invokes "Run sensei on {skill-name}"       │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  1. READ: Load skills/{skill-name}/SKILL.md             │
│           Load tests/{skill-name}/ (if exists)          │
│           Count tokens (baseline for comparison)        │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  2. SCORE: Run rule-based compliance check              │
│     • Check description length (> 150 chars?)           │
│     • Check for trigger phrases ("USE FOR:")            │
│     • Check for anti-triggers ("DO NOT USE FOR:")       │
│     • Check for compatibility field                     │
└─────────────────────┬───────────────────────────────────┘
                      ▼
              ┌───────────────┐
              │ Score >= M-H  │──YES──▶ COMPLETE ✓
              │ AND tests pass│        (next skill)
              └───────┬───────┘
                      │ NO
                      ▼
┌─────────────────────────────────────────────────────────┐
│  3. SCAFFOLD: If tests/{skill-name}/ missing:           │
│     Create tests from references/test-templates/        │
│     Creates prompts.md and framework-specific tests     │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  4. IMPROVE FRONTMATTER:                                │
│     • Add "USE FOR:" with trigger phrases               │
│     • Add "DO NOT USE FOR:" with anti-triggers          │
│     • Add compatibility if applicable                   │
│     • Keep description under 1024 chars                 │
│     • OR with --gepa: GEPA evolutionary optimization    │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  5. IMPROVE TESTS:                                      │
│     • Update shouldTriggerPrompts (5+ prompts)          │
│     • Update shouldNotTriggerPrompts (5+ prompts)       │
│     • Match prompts to new frontmatter triggers         │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  6. VERIFY: Run tests for the skill                     │
│     • If tests fail → fix and retry                     │
│     • If tests pass → continue                          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  7. CHECK TOKENS:                                       │
│     npm run tokens count {skill}/SKILL.md               │
│     Verify under 500 token soft limit                   │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  8. SUMMARY: Display before/after comparison            │
│     • Score change (Low → Medium-High)                  │
│     • Token delta (+/- tokens)                          │
│     • Unimplemented suggestions                         │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  9. PROMPT USER: Choose action                          │
│     [C] Commit changes                                  │
│     [I] Create GitHub issue with suggestions            │
│     [S] Skip (discard changes)                          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
              ┌───────────────┐
              │ Iteration < 5 │──YES──▶ Go to step 2
              └───────┬───────┘
                      │ NO
                      ▼
               TIMEOUT (move to next skill)
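
The exit conditions in the diagram can be sketched in a few lines of JavaScript. This is an illustration only: runRalphLoop, the steps object, and its score/verify/improve callbacks are hypothetical names rather than Sensei exports, and the scaffold, token-check, summary, and prompt steps are folded into improve for brevity.

```javascript
// Illustrative sketch of the loop's exit conditions; all names are hypothetical.
const LEVELS = ['Low', 'Medium', 'Medium-High', 'High'];
const MAX_ITERATIONS = 5; // mirrors the "Iteration < 5" gate in the diagram

function runRalphLoop(skill, steps, target = 'Medium-High') {
  let current = skill;
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    const level = steps.score(current);
    const reachedTarget = LEVELS.indexOf(level) >= LEVELS.indexOf(target);
    // Complete only when the score is high enough AND the tests pass.
    if (reachedTarget && steps.verify(current)) {
      return { status: 'complete', iterations: iteration, level };
    }
    current = steps.improve(current);
  }
  // Iteration budget exhausted: report a timeout and move on.
  return { status: 'timeout', iterations: MAX_ITERATIONS, level: steps.score(current) };
}

// Stub steps for demonstration: each improvement pass raises the skill one level.
const stubSteps = {
  score: (s) => LEVELS[s.rank],
  verify: () => true,
  improve: (s) => ({ rank: Math.min(s.rank + 1, LEVELS.length - 1) }),
};
console.log(runRalphLoop({ rank: 0 }, stubSteps));
// { status: 'complete', iterations: 3, level: 'Medium-High' }
```

A skill that never improves would come back with status 'timeout' after five passes, which corresponds to the TIMEOUT branch above.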

Batch Processing

When running on multiple skills:

Skills are processed sequentially
Each skill goes through the full loop
User prompted after each skill: Commit, Create Issue, or Skip
Summary report at the end shows all results
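
As a rough sketch of that behavior, assuming the default "Continue on failure: Yes" setting from the Configuration table: the runBatch function and its processSkill callback below are hypothetical stand-ins for one full loop run per skill, not part of the Sensei API.

```javascript
// Hypothetical batch driver; `processSkill` stands in for one full loop run.
function runBatch(skillNames, processSkill) {
  const results = [];
  // Skills are handled one at a time, in the order given.
  for (const name of skillNames) {
    try {
      results.push({ name, ok: true, outcome: processSkill(name) });
    } catch (error) {
      // A failure is recorded and the batch continues with the next skill.
      const message = error instanceof Error ? error.message : String(error);
      results.push({ name, ok: false, outcome: message });
    }
  }
  return results;
}

// Demonstration with a stub that fails on one skill.
const report = runBatch(['skill-a', 'skill-b', 'skill-c'], (name) => {
  if (name === 'skill-b') throw new Error('tests failed');
  return 'Medium-High';
});
for (const entry of report) {
  console.log(`${entry.name}: ${entry.ok ? entry.outcome : 'FAILED (' + entry.outcome + ')'}`);
}
```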

Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| Skills directory | skills/ or .github/skills/ | Where SKILL.md files live |
| Tests directory | tests/ | Where test files live |
| Max iterations | 5 | Per-skill iteration limit before moving on |
| Target score | Medium-High | Minimum compliance level |
| Token soft limit | 500 | SKILL.md target token count |
| Token hard limit | 5000 | SKILL.md maximum token count |
| User prompt | After each skill | Commit, Create Issue, or Skip |
| Continue on failure | Yes | Process remaining skills if one fails |

Custom Paths

Override defaults in your prompt:

Run sensei on my-skill with skills in src/ai/skills/ and tests in spec/

Scoring Criteria

Adherence Levels

| Level | Description | Criteria |
|-------|-------------|----------|
| Low | Basic description | No explicit triggers, often < 150 chars |
| Medium | Has trigger keywords | Description > 150 chars, implicit or explicit trigger phrases, >60 words |
| Medium-High | Has WHEN: or USE FOR: | "WHEN:" (preferred) or "USE FOR:" with ≤60 words |
| High | Full compliance | Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS) |

Rule-Based Checks

- Name validation
  - Lowercase + hyphens only
  - Matches directory name
  - ≤ 64 characters
- Description length
  - Minimum: 150 characters (effective)
  - Maximum: 1024 characters (spec limit)
- Trigger phrases
  - Contains "WHEN:", "USE FOR:", or "Use this skill when"
  - Lists specific keywords and phrases
- Anti-triggers (optional, context-dependent)
  - "DO NOT USE FOR:" is useful for small skill sets (1-5 skills)
  - ⚠️ Risky in multi-skill environments (10+ skills): causes keyword contamination
- Routing clarity (for High score)
  - Skill type prefix: **WORKFLOW SKILL**, **UTILITY SKILL**, or **ANALYSIS SKILL**
  - INVOKES: lists tools/MCP servers the skill calls
  - FOR SINGLE OPERATIONS: guidance for when to bypass skill
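
To make the name rules concrete, here is a minimal sketch of how they could be checked. validateSkillName is a hypothetical helper written for this README, not the library's checkNameCompliance, and allowing digits alongside lowercase letters is an assumption.

```javascript
// Hypothetical helper (not Sensei's checkNameCompliance): applies the three name rules.
function validateSkillName(name, directoryName) {
  const problems = [];
  // Rule 1: lowercase words separated by single hyphens (digits allowed by assumption).
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(name)) {
    problems.push('name must use lowercase letters and hyphens only');
  }
  // Rule 2: the name must equal the skill's directory name.
  if (name !== directoryName) {
    problems.push('name must match the directory name');
  }
  // Rule 3: at most 64 characters.
  if (name.length > 64) {
    problems.push('name must be 64 characters or fewer');
  }
  return problems;
}

console.log(validateSkillName('pdf-processor', 'pdf-processor')); // []
console.log(validateSkillName('PDF_Processor', 'pdf-processor').length); // 2
```

Returning a list of problems rather than a boolean lets a caller report every violation in one pass.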

Target: Medium-High

To reach Medium-High, a skill must have:

✅ Description > 150 characters
✅ Explicit trigger phrases ("WHEN:" preferred, or "USE FOR:")
✅ Description ≤ 60 words
✅ SKILL.md < 500 tokens (soft limit, monitored)

Target: High (with routing)

To reach High, add routing clarity:

✅ All Medium-High criteria
✅ Skill type prefix (**WORKFLOW SKILL**, etc.)
✅ INVOKES: listing tools/MCP servers used
✅ FOR SINGLE OPERATIONS: bypass guidance
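
The levels and checklists above can be read as a simple decision procedure. The sketch below is one simplified interpretation, not Sensei's scorer: classifyAdherence is a hypothetical function, it ignores the token budget, and the real rule-based check may weigh criteria differently.

```javascript
// Simplified, hypothetical reading of the adherence criteria; not Sensei's scorer.
function classifyAdherence(description) {
  const text = description.trim();
  const wordCount = text === '' ? 0 : text.split(/\s+/).length;
  const longEnough = text.length > 150;
  // "USE FOR:" must not be the tail of "DO NOT USE FOR:".
  const hasExplicitTrigger = /\bWHEN:|(?<!DO NOT )USE FOR:/.test(text);
  const hasImplicitTrigger = /use this skill when/i.test(text);
  const hasRouting =
    /\*\*(WORKFLOW|UTILITY|ANALYSIS) SKILL\*\*/.test(text) &&
    /\bINVOKES:/.test(text) &&
    /FOR SINGLE OPERATIONS:/.test(text);

  if (!longEnough || !(hasExplicitTrigger || hasImplicitTrigger)) return 'Low';
  if (!hasExplicitTrigger || wordCount > 60) return 'Medium';
  return hasRouting ? 'High' : 'Medium-High';
}

console.log(classifyAdherence('Process PDF files for various tasks')); // Low
```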

MCP Integration Checks

When a skill's description contains INVOKES:, Sensei performs additional checks based on the MCP Integration Patterns:

| Check | Purpose |
|-------|---------|
| MCP Tools Used table | Documents tool dependencies in skill body |
| Prerequisites section | Lists required tools and permissions |
| CLI fallback pattern | Provides fallback when MCP unavailable |
| Name collision detection | Warns when skill name matches MCP tool |

MCP Integration Score (0-4 points):

4/4 = Excellent MCP integration
3/4 = Good (minor gaps)
2/4 = Fair (needs improvement)
0-1/4 = Poor (missing key patterns)

See references/mcp-integration.md for detailed patterns.
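
As an illustration of the 0-4 scale, the sketch below assumes each of the four checks contributes one point. That weighting, the mcpIntegrationRating function, and the field names on its argument are assumptions made for this example.

```javascript
// Hypothetical rating helper; assumes one point per passing check.
function mcpIntegrationRating(checks) {
  const score = [
    checks.hasToolsTable,    // "MCP Tools Used" table present
    checks.hasPrerequisites, // Prerequisites section present
    checks.hasCliFallback,   // CLI fallback documented
    checks.noNameCollision,  // skill name does not match an MCP tool
  ].filter(Boolean).length;

  let label;
  if (score === 4) label = 'Excellent';
  else if (score === 3) label = 'Good';
  else if (score === 2) label = 'Fair';
  else label = 'Poor';
  return { score, label };
}

console.log(mcpIntegrationRating({
  hasToolsTable: true,
  hasPrerequisites: true,
  hasCliFallback: false,
  noNameCollision: true,
})); // { score: 3, label: 'Good' }
```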

Token Budget

SKILL.md: < 500 tokens (soft), < 5000 (hard)
references/*.md: < 2000 tokens each
Score skill: npm run tokens -- score [dir]
Check with: npm run tokens -- check
Get suggestions: npm run tokens -- suggest
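
The limits above can be expressed as a small lookup. In this sketch, checkTokenBudget and TOKEN_LIMITS are hypothetical names, the token count is supplied by the caller (for example from the tokens script), and treating the 2000-token reference limit as a warning rather than a failure is an assumption.

```javascript
// Hypothetical budget check; the caller supplies an already-computed token count.
const TOKEN_LIMITS = {
  skill: { soft: 500, hard: 5000 }, // SKILL.md
  reference: { soft: 2000 },        // references/*.md (no hard limit stated)
};

function checkTokenBudget(kind, tokenCount) {
  const limits = TOKEN_LIMITS[kind];
  if (!limits) throw new Error(`unknown file kind: ${kind}`);
  // Limits are written as "< N", so reaching N already counts as over budget.
  if (limits.hard !== undefined && tokenCount >= limits.hard) return 'error';
  if (tokenCount >= limits.soft) return 'warning';
  return 'ok';
}

console.log(checkTokenBudget('skill', 420));      // ok
console.log(checkTokenBudget('skill', 1200));     // warning
console.log(checkTokenBudget('skill', 5000));     // error
console.log(checkTokenBudget('reference', 2500)); // warning
```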

Examples

Before: Low Adherence

---
name: pdf-processor
description: 'Process PDF files for various tasks'
---

Problems:

Only 35 characters (well under the 150-character minimum)
No trigger phrases
No anti-triggers
Agent doesn't know when to activate

After: Medium-High Adherence

---
name: pdf-processor
description: "Extract, rotate, merge, and split PDF files. WHEN: \"extract PDF text\", \"rotate PDF pages\", \"merge PDFs\", \"split PDF\", \"PDF to text\"."
---

Improvements:

~160 characters (informative but under limit)
Clear description of purpose
Explicit WHEN: trigger phrases with distinctive quoted strings

After: High Adherence (with routing)

---
name: azure-deploy
description: |
  **WORKFLOW SKILL** - Orchestrates deployment through preparation, validation,
  and execution phases for Azure applications.
  USE FOR: "deploy to Azure", "azd up", "push to Azure", "publish to Azure".
  DO NOT USE FOR: preparing new apps (use azure-prepare), validating before
  deploy (use azure-validate), Azure Functions specifically (use azure-functions).
  INVOKES: azure-azd MCP (up, deploy, provision), azure-deploy MCP (plan_get).
  FOR SINGLE OPERATIONS: Use azure-azd MCP directly for single azd commands.
---

High score achieved with:

Skill type prefix (**WORKFLOW SKILL**)
INVOKES: lists MCP tools used
FOR SINGLE OPERATIONS: guides when to bypass skill

Test Updates

Before (empty):

const shouldTriggerPrompts = [];
const shouldNotTriggerPrompts = [];

After:

const shouldTriggerPrompts = [
  'Extract text from this PDF',
  'Rotate this PDF 90 degrees',
  'Merge these PDF files together',
  'Split this PDF into pages',
  'Convert PDF to text',
];

const shouldNotTriggerPrompts = [
  'Create a new PDF document',
  'Extract images from this PDF',
  'OCR this scanned document',
  'What is the weather today?',
  'Help me with AWS S3',
];

Troubleshooting

Tests Failing After Improvement

Ensure shouldTriggerPrompts match "USE FOR:" phrases and shouldNotTriggerPrompts match "DO NOT USE FOR:" scenarios.
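
When comparing prompts against the frontmatter, it can help to list the quoted trigger phrases first. The helper below is a hypothetical sketch, not part of Sensei; it assumes triggers are written as double-quoted strings after "WHEN:" or "USE FOR:", as in the examples in this README.

```javascript
// Hypothetical helper: pulls the quoted phrases out of the WHEN: / USE FOR: section.
function extractTriggerPhrases(description) {
  // Locate the first trigger marker, ignoring the "USE FOR:" inside "DO NOT USE FOR:".
  const start = description.search(/\bWHEN:|(?<!DO NOT )USE FOR:/);
  if (start === -1) return [];
  const rest = description.slice(start);
  // The trigger section ends where the next labelled section begins.
  const end = rest.search(/DO NOT USE FOR:|INVOKES:|FOR SINGLE OPERATIONS:/);
  const section = end === -1 ? rest : rest.slice(0, end);
  return Array.from(section.matchAll(/"([^"]+)"/g), (match) => match[1]);
}

const description =
  'Extract, rotate, merge, and split PDF files. WHEN: "extract PDF text", ' +
  '"rotate PDF pages", "merge PDFs", "split PDF", "PDF to text".';
// Prints the five trigger phrases, starting with 'extract PDF text'.
console.log(extractTriggerPhrases(description));
```

Each extracted phrase should have at least one corresponding entry in shouldTriggerPrompts.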

Skill Not Reaching Target Score

Common causes: description > 1024 chars, anti-triggers not using "DO NOT USE FOR:" format, or conflicting triggers with other skills.

Rolling Back Changes

git reset --soft HEAD~1  # Undo the last commit; its changes stay staged

Contributing

Improving the Sensei Skill

Edit SKILL.md for instruction changes
Edit references/*.md for documentation changes
Test tokens: npm run tokens -- check
Test on a sample skill before committing

Adding New Scoring Rules

Document the rule in references/scoring.md
Add examples in references/examples.md
Update scoring criteria in SKILL.md

Adding Test Framework Support

Create template in references/test-templates/{framework}.md
Document usage in references/configuration.md

Waza Trigger Tests

Sensei supports Waza-style trigger accuracy testing. See the Waza test template.

Reporting Issues

Open an issue with skill name, starting state, and git log --oneline -10.

References

Ralph Loop Pattern - Sensei's iterative improvement workflow
Anthropic Skills Documentation - Writing guidance
MCP Integration Patterns - MCP integration best practices
Waza Trigger Test Template - Skill trigger accuracy testing
GEPA - Evolutionary optimization for skills
skill-creator - For creating new skills from scratch

Sensei - "The path to compliance begins with a single trigger." 🥋