@spboyer/sensei
v1.5.0
AI-powered frontmatter compliance skill for GitHub Copilot
Sensei
"A true master teaches not by telling, but by refining." - The Skill Sensei
Sensei automates the improvement of Agent Skills frontmatter compliance using the Ralph loop pattern - iteratively improving skills until they reach Medium-High compliance with all tests passing.
Table of Contents
- Overview
- Quick Start
- Prerequisites
- How It Works
- Configuration
- Scoring Criteria
- Examples
- Troubleshooting
- Contributing
Overview
The Problem
Skills without proper frontmatter lead to skill collision - agents invoking the wrong skill for a given prompt. Common issues include:
- No triggers - Agent doesn't know when to activate the skill
- No anti-triggers - Agent doesn't know when NOT to use the skill
- Brief descriptions - Not enough context for accurate matching
- Token bloat - Oversized skills waste context window
The Solution
Sensei implements the "Ralph Wiggum" technique:
- Read - Load the skill's current state and token count
- Score - Evaluate frontmatter compliance
- Improve - Add triggers, anti-triggers, compatibility
- Verify - Run tests to ensure changes work
- Check Tokens - Analyze token usage, gather suggestions
- Summary - Display before/after with suggestions
- Prompt - Ask user: Commit, Create Issue, or Skip?
- Repeat - Until target score reached
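The steps above can be sketched as a bounded loop. This is a minimal illustration rather than Sensei's implementation: `ralphLoop`, `meetsTarget`, and the `score`, `improve`, and `runTests` callbacks are hypothetical names, and the token check and user prompt steps are left out. The default target and iteration cap mirror the values in the Configuration section.

```javascript
// Hypothetical sketch; none of these names come from Sensei's code.
const LEVELS = ['Low', 'Medium', 'Medium-High', 'High'];

// True when `level` is at or above `target` in the LEVELS ordering.
function meetsTarget(level, target) {
  return LEVELS.indexOf(level) >= LEVELS.indexOf(target);
}

// Score, then improve and re-score until the target is met with passing
// tests, or until maxIterations passes have been used.
function ralphLoop(skill, { score, improve, runTests, target = 'Medium-High', maxIterations = 5 }) {
  let current = skill;
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const level = score(current);
    if (meetsTarget(level, target) && runTests(current)) {
      return { status: 'complete', iterations: iteration, level, skill: current };
    }
    current = improve(current);
  }
  return { status: 'timeout', iterations: maxIterations, level: score(current), skill: current };
}

// Toy callbacks: a skill counts as Medium-High once it has a WHEN: clause.
const result = ralphLoop(
  { description: 'Process PDF files' },
  {
    score: (s) => (s.description.includes('WHEN:') ? 'Medium-High' : 'Low'),
    improve: (s) => ({ ...s, description: `${s.description}. WHEN: "extract PDF text".` }),
    runTests: () => true,
  },
);
console.log(result.status, result.iterations); // complete 2
```

Passing the callbacks in keeps the loop itself independent of how scoring and improvement are actually done.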
Quick Start
Using with Copilot CLI
Single Skill
Run sensei on my-skill-name
Single Skill (Fast Mode)
Run sensei on my-skill-name --fast
Multiple Skills
Run sensei on skill-a, skill-b, skill-c
All Low-Adherence Skills
Run sensei on all Low-adherence skills
All Skills
Run sensei on all skills
GEPA Mode (Deep Optimization)
Run sensei on my-skill-name --gepa
Run sensei score my-skill-name
Score-only mode runs without LLM calls.
External Integration Modes
CLI / npx
npx @spboyer/sensei score .
npx @spboyer/sensei check --root . --config .token-limits.json --strict
--root resolves paths; --config selects limits. Global install: npm install --global @spboyer/sensei.
GitHub Action
steps:
  - uses: actions/checkout@v4
  - uses: spboyer/[email protected]
    with:
      command: check
      root: .
      path: .
      config: .token-limits.json
      strict: 'true'
Library API
import { scoreSkillContent } from '@spboyer/sensei/score';
import { parseFrontmatter } from '@spboyer/sensei/parse';
import { checkNameCompliance } from '@spboyer/sensei/checks';
const result = scoreSkillContent(renderedSkillMarkdown, { path: 'skills/my-skill', moduleCount: 2 });
const frontmatter = parseFrontmatter(renderedSkillMarkdown);
checkNameCompliance(frontmatter?.name ?? '');
path is metadata; moduleCount defaults to 0.
GEPA Commands
python scripts/src/gepa/auto_evaluator.py score --skill my-skill
python scripts/src/gepa/auto_evaluator.py optimize --skill my-skill
Flags
| Flag | Description |
|------|-------------|
| --fast | Skip tests for faster iteration |
| --gepa | Use GEPA evolutionary optimization instead of template-based improvements |
| --skip-integration | Skip integration tests (unit + trigger tests only) |
⚠️ Note: Using --fast speeds up the loop significantly but may miss issues. Consider running the full tests before the final commit.
Prerequisites
Required
Node.js 18+ - For running token management scripts
node --version
Git - For commits and comparisons
git --version
Optional
Test Framework - Jest, pytest, or similar for trigger tests
Python 3.10+ and GEPA - For evolutionary optimization (pip install gepa)
Installation
Option 1: Install as Copilot CLI Skill (Recommended)
mkdir -p "$HOME/.copilot/skills"
git clone https://github.com/spboyer/sensei.git "$HOME/.copilot/skills/sensei"
cd ~/.copilot/skills/sensei/scripts && npm install
The skill is now available in Copilot CLI. Invoke with:
Run sensei on my-skill-name
Option 2: Install in Project Skills Folder
For project-specific installation:
mkdir -p .github/skills
git clone https://github.com/spboyer/sensei.git .github/skills/sensei
cd .github/skills/sensei/scripts && npm install
Option 3: Install the CLI from npm
For CLI/library use:
npm install --global @spboyer/sensei
sensei check .
npx @spboyer/sensei check .
Verify Installation
cd ~/.copilot/skills/sensei && npm run tokens -- check
npx @spboyer/sensei check .
How It Works
The Ralph Loop
┌─────────────────────────────────────────────────────────┐
│ START: User invokes "Run sensei on {skill-name}" │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 1. READ: Load skills/{skill-name}/SKILL.md │
│ Load tests/{skill-name}/ (if exists) │
│ Count tokens (baseline for comparison) │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 2. SCORE: Run rule-based compliance check │
│ • Check description length (> 150 chars?) │
│ • Check for trigger phrases ("USE FOR:") │
│ • Check for anti-triggers ("DO NOT USE FOR:") │
│ • Check for compatibility field │
└─────────────────────┬───────────────────────────────────┘
▼
┌───────────────┐
│ Score >= M-H │──YES──▶ COMPLETE ✓
│ AND tests pass│ (next skill)
└───────┬───────┘
│ NO
▼
┌─────────────────────────────────────────────────────────┐
│ 3. SCAFFOLD: If tests/{skill-name}/ missing: │
│ Create tests from references/test-templates/ │
│ Creates prompts.md and framework-specific tests │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 4. IMPROVE FRONTMATTER: │
│ • Add "USE FOR:" with trigger phrases │
│ • Add "DO NOT USE FOR:" with anti-triggers │
│ • Add compatibility if applicable │
│ • Keep description under 1024 chars │
│ • OR with --gepa: GEPA evolutionary optimization │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 5. IMPROVE TESTS: │
│ • Update shouldTriggerPrompts (5+ prompts) │
│ • Update shouldNotTriggerPrompts (5+ prompts) │
│ • Match prompts to new frontmatter triggers │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 6. VERIFY: Run tests for the skill │
│ • If tests fail → fix and retry │
│ • If tests pass → continue │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 7. CHECK TOKENS: │
│ npm run tokens count {skill}/SKILL.md │
│ Verify under 500 token soft limit │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 8. SUMMARY: Display before/after comparison │
│ • Score change (Low → Medium-High) │
│ • Token delta (+/- tokens) │
│ • Unimplemented suggestions │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 9. PROMPT USER: Choose action │
│ [C] Commit changes │
│ [I] Create GitHub issue with suggestions │
│ [S] Skip (discard changes) │
└─────────────────────┬───────────────────────────────────┘
▼
┌───────────────┐
│ Iteration < 5 │──YES──▶ Go to step 2
└───────┬───────┘
│ NO
▼
TIMEOUT (move to next skill)
Batch Processing
When running on multiple skills:
- Skills are processed sequentially
- Each skill goes through the full loop
- User prompted after each skill: Commit, Create Issue, or Skip
- Summary report at the end shows all results
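A rough sketch of the sequential, continue-on-failure behavior described above. `runBatch` and `processSkill` are hypothetical names; `processSkill` stands in for one full pass of the loop over a single skill. It is kept synchronous for brevity, whereas a real run would wait on tests and on the user's choice after each skill.

```javascript
// Hypothetical batch runner; `processSkill` is supplied by the caller.
function runBatch(skillNames, processSkill) {
  const results = [];
  for (const name of skillNames) {
    try {
      // One skill at a time: the next skill starts only after this one returns.
      results.push({ name, ok: true, outcome: processSkill(name) });
    } catch (error) {
      // Continue on failure: record the error and move to the next skill.
      results.push({ name, ok: false, error: error.message });
    }
  }
  return results;
}

// Toy run in which the second skill fails.
const report = runBatch(['skill-a', 'skill-b', 'skill-c'], (name) => {
  if (name === 'skill-b') throw new Error('tests failed');
  return 'Medium-High';
});
// End-of-run summary, one line per skill.
console.log(report.map((r) => `${r.name}: ${r.ok ? r.outcome : r.error}`).join('\n'));
```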
Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| Skills directory | skills/ or .github/skills/ | Where SKILL.md files live |
| Tests directory | tests/ | Where test files live |
| Max iterations | 5 | Per-skill iteration limit before moving on |
| Target score | Medium-High | Minimum compliance level |
| Token soft limit | 500 | SKILL.md target token count |
| Token hard limit | 5000 | SKILL.md maximum token count |
| User prompt | After each skill | Commit, Create Issue, or Skip |
| Continue on failure | Yes | Process remaining skills if one fails |
Custom Paths
Override defaults in your prompt:
Run sensei on my-skill with skills in src/ai/skills/ and tests in spec/
Scoring Criteria
Adherence Levels
| Level | Description | Criteria |
|-------|-------------|----------|
| Low | Basic description | No explicit triggers, often < 150 chars |
| Medium | Has trigger keywords | Description > 150 chars, implicit or explicit trigger phrases, >60 words |
| Medium-High | Has WHEN: or USE FOR: | "WHEN:" (preferred) or "USE FOR:" with ≤60 words |
| High | Full compliance | Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS) |
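One way to read the table as code is sketched here. `classifyAdherence` is a hypothetical function, not Sensei's scorer, and the real scoring may weigh these signals differently. It assumes that High requires the skill type prefix, INVOKES:, and FOR SINGLE OPERATIONS: together, and that words are whitespace-separated.

```javascript
// Hypothetical classifier based on the adherence table; not Sensei's scorer.
function classifyAdherence(description) {
  const wordCount = description.trim().split(/\s+/).filter(Boolean).length;
  // "USE FOR:" must not be the tail of "DO NOT USE FOR:".
  const hasExplicitTrigger =
    /\bWHEN:/.test(description) || /(?<!DO NOT )\bUSE FOR:/.test(description);
  const hasImplicitTrigger = /use this skill when/i.test(description);
  // Assumption: routing clarity means all three markers are present.
  const hasRouting =
    /\*\*(WORKFLOW|UTILITY|ANALYSIS) SKILL\*\*/.test(description) &&
    /\bINVOKES:/.test(description) &&
    /\bFOR SINGLE OPERATIONS:/.test(description);
  const longEnough = description.length > 150;

  if (longEnough && hasExplicitTrigger && wordCount <= 60) {
    return hasRouting ? 'High' : 'Medium-High';
  }
  if (longEnough && (hasExplicitTrigger || hasImplicitTrigger)) {
    return 'Medium';
  }
  return 'Low';
}

console.log(classifyAdherence('Process PDF files for various tasks')); // Low
```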
Rule-Based Checks
Name validation
- Lowercase + hyphens only
- Matches directory name
- ≤ 64 characters
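The three name rules can be expressed as a small check. `checkSkillName` below is a hypothetical helper, separate from the package's `checkNameCompliance` export, whose return shape is not documented here. It assumes that digits are allowed and that hyphens may not lead, trail, or repeat.

```javascript
// Hypothetical helper; distinct from the package's checkNameCompliance export.
function checkSkillName(name, directoryName) {
  const problems = [];
  // Assumption: digits are allowed, and hyphens only separate segments.
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(name)) {
    problems.push('name must use lowercase letters, digits, and single hyphens only');
  }
  if (name !== directoryName) {
    problems.push(`name "${name}" does not match directory "${directoryName}"`);
  }
  if (name.length > 64) {
    problems.push(`name is ${name.length} characters; the limit is 64`);
  }
  return problems;
}

console.log(checkSkillName('pdf-processor', 'pdf-processor')); // []
console.log(checkSkillName('PDF_Processor', 'pdf-processor').length); // 2
```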
Description length
- Minimum: 150 characters (effective)
- Maximum: 1024 characters (spec limit)
Trigger phrases
- Contains "WHEN:", "USE FOR:", or "Use this skill when"
- Lists specific keywords and phrases
Anti-triggers (optional, context-dependent)
- "DO NOT USE FOR:" — useful for small skill sets (1-5 skills)
- ⚠️ Risky in multi-skill environments (10+ skills) — causes keyword contamination
Routing clarity (for High score)
- Skill type prefix: **WORKFLOW SKILL**, **UTILITY SKILL**, or **ANALYSIS SKILL**
- INVOKES: lists tools/MCP servers the skill calls
- FOR SINGLE OPERATIONS: guidance for when to bypass skill
Target: Medium-High
To reach Medium-High, a skill must have:
- ✅ Description > 150 characters
- ✅ Explicit trigger phrases ("WHEN:" preferred, or "USE FOR:")
- ✅ Description ≤ 60 words
- ✅ SKILL.md < 500 tokens (soft limit, monitored)
Target: High (with routing)
To reach High, add routing clarity:
- ✅ All Medium-High criteria
- ✅ Skill type prefix (**WORKFLOW SKILL**, etc.)
- ✅ INVOKES: listing tools/MCP servers used
- ✅ FOR SINGLE OPERATIONS: bypass guidance
MCP Integration Checks
When a skill's description contains INVOKES:, Sensei performs additional checks based on the MCP Integration Patterns:
| Check | Purpose |
|-------|---------|
| MCP Tools Used table | Documents tool dependencies in skill body |
| Prerequisites section | Lists required tools and permissions |
| CLI fallback pattern | Provides fallback when MCP unavailable |
| Name collision detection | Warns when skill name matches MCP tool |
MCP Integration Score (0-4 points):
- 4/4 = Excellent MCP integration
- 3/4 = Good (minor gaps)
- 2/4 = Fair (needs improvement)
- 0-1/4 = Poor (missing key patterns)
See references/mcp-integration.md for detailed patterns.
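As an illustration of the 0-4 scale, the sketch below awards one point per passing check and maps the total to the labels listed above. One point per check is an assumption, and `mcpIntegrationScore` and its field names are hypothetical.

```javascript
// Hypothetical field names, one per row of the MCP checks table.
const MCP_CHECKS = ['toolsTable', 'prerequisites', 'cliFallback', 'noNameCollision'];

// Assumption: each passing check is worth exactly one point.
function mcpIntegrationScore(checks) {
  const points = MCP_CHECKS.filter((key) => checks[key] === true).length;
  let label = 'Poor';
  if (points === 4) label = 'Excellent';
  else if (points === 3) label = 'Good';
  else if (points === 2) label = 'Fair';
  return { points, label };
}

console.log(mcpIntegrationScore({ toolsTable: true, prerequisites: true, cliFallback: false, noNameCollision: true }));
// { points: 3, label: 'Good' }
```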
Token Budget
- SKILL.md: < 500 tokens (soft), < 5000 (hard)
- references/*.md: < 2000 tokens each
- Score skill: npm run tokens -- score [dir]
- Check with: npm run tokens -- check
- Get suggestions: npm run tokens -- suggest
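To show how the soft and hard limits interact, here is a rough sketch. It does not reproduce how the tokens script counts: `estimateTokens` uses a four-characters-per-token heuristic, which is an assumption, and both function names are hypothetical.

```javascript
// Assumption: roughly four characters per token; the real counter may differ.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Compare an estimate against the SKILL.md limits (under 500 soft, under 5000 hard).
function checkTokenBudget(text, { soft = 500, hard = 5000 } = {}) {
  const tokens = estimateTokens(text);
  if (tokens >= hard) return { tokens, status: 'over-hard-limit' };
  if (tokens >= soft) return { tokens, status: 'over-soft-limit' };
  return { tokens, status: 'ok' };
}

console.log(checkTokenBudget('a'.repeat(400))); // { tokens: 100, status: 'ok' }
```

Passing `{ soft: 2000, hard: 2000 }` would apply the per-file limit listed for references/*.md.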
Examples
Before: Low Adherence
---
name: pdf-processor
description: 'Process PDF files for various tasks'
---
Problems:
- Only 35 characters
- No trigger phrases
- No anti-triggers
- Agent doesn't know when to activate
After: Medium-High Adherence
---
name: pdf-processor
description: "Extract, rotate, merge, and split PDF files. WHEN: \"extract PDF text\", \"rotate PDF pages\", \"merge PDFs\", \"split PDF\", \"PDF to text\"."
---
Improvements:
- ~160 characters (informative but under limit)
- Clear description of purpose
- Explicit WHEN: trigger phrases with distinctive quoted strings
After: High Adherence (with routing)
---
name: azure-deploy
description: |
  **WORKFLOW SKILL** - Orchestrates deployment through preparation, validation,
  and execution phases for Azure applications.
  USE FOR: "deploy to Azure", "azd up", "push to Azure", "publish to Azure".
  DO NOT USE FOR: preparing new apps (use azure-prepare), validating before
  deploy (use azure-validate), Azure Functions specifically (use azure-functions).
  INVOKES: azure-azd MCP (up, deploy, provision), azure-deploy MCP (plan_get).
  FOR SINGLE OPERATIONS: Use azure-azd MCP directly for single azd commands.
---
High score achieved with:
- Skill type prefix (**WORKFLOW SKILL**)
- INVOKES: lists MCP tools used
- FOR SINGLE OPERATIONS: guides when to bypass skill
Test Updates
Before (empty):
const shouldTriggerPrompts = [];
const shouldNotTriggerPrompts = [];
After:
const shouldTriggerPrompts = [
  'Extract text from this PDF',
  'Rotate this PDF 90 degrees',
  'Merge these PDF files together',
  'Split this PDF into pages',
  'Convert PDF to text',
];
const shouldNotTriggerPrompts = [
  'Create a new PDF document',
  'Extract images from this PDF',
  'OCR this scanned document',
  'What is the weather today?',
  'Help me with AWS S3',
];
Troubleshooting
Tests Failing After Improvement
Ensure shouldTriggerPrompts match "USE FOR:" phrases and shouldNotTriggerPrompts match "DO NOT USE FOR:" scenarios.
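A quick sanity check on the two prompt lists can catch some failures before the tests run. The sketch below only checks the 5+ prompt minimum from the loop diagram and flags prompts that appear in both lists; it does not test trigger matching itself. `checkPromptLists` is a hypothetical helper.

```javascript
// Hypothetical helper: structural checks on the two prompt lists only.
function checkPromptLists(shouldTrigger, shouldNotTrigger, minimum = 5) {
  const problems = [];
  if (shouldTrigger.length < minimum) {
    problems.push(`shouldTriggerPrompts has ${shouldTrigger.length} prompts; expected ${minimum}+`);
  }
  if (shouldNotTrigger.length < minimum) {
    problems.push(`shouldNotTriggerPrompts has ${shouldNotTrigger.length} prompts; expected ${minimum}+`);
  }
  // A prompt cannot be both a trigger and a non-trigger.
  const normalize = (prompt) => prompt.trim().toLowerCase();
  const negatives = new Set(shouldNotTrigger.map(normalize));
  for (const prompt of shouldTrigger) {
    if (negatives.has(normalize(prompt))) {
      problems.push(`prompt appears in both lists: ${prompt}`);
    }
  }
  return problems;
}

console.log(checkPromptLists([], [])); // two problems, one per empty list
```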
Skill Not Reaching Target Score
Common causes: description > 1024 chars, anti-triggers not using "DO NOT USE FOR:" format, or conflicting triggers with other skills.
Rolling Back Changes
git reset --soft HEAD~1 # Undo last commit
Contributing
Improving the Sensei Skill
- Edit SKILL.md for instruction changes
- Edit references/*.md for documentation changes
- Test tokens: npm run tokens -- check
- Test on a sample skill before committing
Adding New Scoring Rules
- Document the rule in references/scoring.md
- Add examples in references/examples.md
- Update scoring criteria in SKILL.md
Adding Test Framework Support
- Create template in references/test-templates/{framework}.md
- Document usage in references/configuration.md
Waza Trigger Tests
Sensei supports Waza-style trigger accuracy testing. See the Waza test template.
Reporting Issues
Open an issue with skill name, starting state, and git log --oneline -10.
References
- Ralph Loop Pattern - Sensei's iterative improvement workflow
- Anthropic Skills Documentation - Writing guidance
- MCP Integration Patterns - MCP integration best practices
- Waza Trigger Test Template - Skill trigger accuracy testing
- GEPA - Evolutionary optimization for skills
- skill-creator - For creating new skills from scratch
Sensei - "The path to compliance begins with a single trigger." 🥋
