@stackforgeai/copilot-skills

v1.0.0, published a month ago

Composable skill engineering framework for GitHub Copilot SDK — define, register, compose, and execute reusable AI skills with typed I/O, context awareness, caching, and full observability, all routed through copilot-guard.

Vulnerabilities: 0 high, 0 medium, 0 low

xerrex

copilot github-copilot copilot-sdk copilot-guard skills prompt-engineering skill-registry skill-composition ai-skills vibe-coding prompt-templates token-optimization observability caching


Overview

@stackforgeai/copilot-skills provides a structured, production-ready layer for building reusable AI "skills" — typed prompt recipes with defined inputs, outputs, and execution policies — on top of the @github/copilot-sdk.

Inspired by the emerging trend of skill files and prompt engineering as first-class developer interfaces, this package treats each AI capability as a composable, testable, cacheable unit. Skills can be chained sequentially, run in parallel, gated by approval handlers, and observed with latency percentiles and token metrics.

All LLM calls are routed through @stackforgeai/copilot-guard. Direct SDK access is not permitted. copilot-guard controls token budgets, premium model governance, and execution authorization.

Features

Typed Skill Definitions — Structured descriptors with prompt templates, input schemas, model overrides, and execution policies
SkillRegistry — Central registry for discovering, filtering, and managing skills by name, tag, or search query
SkillRunner — Executes skills through copilot-guard with input validation, approval gates, and caching
SkillComposer — Chain skills sequentially with output routing and conditional step control, or run them in parallel
BuiltinSkills — Pre-built, production-ready skill templates: summarize, classify, extract, translate, review-code, generate-tests, draft-email, analyze-sentiment
SkillCache — TTL-based result cache keyed by (skillName, input) to avoid redundant API calls and reduce token spend
SkillObserver — Per-skill execution metrics with P50/P95/P99 latency percentiles and cache hit tracking
Approval Gates — Per-skill requiresApproval flag with pluggable async approval handlers
Template Engine — Lightweight {{variable}}, {{#if}}, and {{#each}} interpolation without external dependencies
Zero-dependency validation — Schema-based input validation with descriptive error messages
Full TypeScript support — Strict types for all APIs, inputs, outputs, and configuration
copilot-guard enforced — Token budgets, premium model blocking, and usage introspection are always active

Installation

npm install @stackforgeai/copilot-skills @stackforgeai/copilot-guard @github/copilot-sdk

Usage Examples

Basic: Register and run a custom skill

import { CopilotSkills } from '@stackforgeai/copilot-skills';

const skills = new CopilotSkills({
  premiumLimit: 100,   // max billing units for premium model calls (free models cost 0)
  defaultModel: 'gpt-4.1', // valid free Copilot model (billing.multiplier = 0)
  enableCaching: true,
});

skills.register({
  name: 'product-description',
  description: 'Generate e-commerce product copy.',
  systemPrompt: 'You are a professional copywriter. Write persuasively.',
  promptTemplate:
    'Write a product description for {{productName}}.\n' +
    '{{#if features}}Key features: {{features}}\n{{/if}}' +
    'Target audience: {{audience}}',
  outputFormat: 'text',
  cacheable: true,
  inputSchema: {
    fields: [
      { name: 'productName', type: 'string', required: true },
      { name: 'audience', type: 'string', required: true },
      { name: 'features', type: 'string', required: false },
    ],
  },
});

const result = await skills.run('product-description', {
  productName: 'UltraSound Pro Headphones',
  audience: 'audiophiles',
  features: '40-hour battery, ANC, lossless audio',
});

console.log(result.text);        // LLM-generated description
console.log(result.tokensUsed);  // Output tokens consumed
console.log(result.cached);      // false on first call
console.log(result.latencyMs);   // Wall-clock latency
console.log(result.traceId);     // Unique trace ID
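
The promptTemplate above relies on {{variable}} and {{#if}} interpolation. As a rough illustration of what that rendering step involves, here is a minimal sketch. It is not the package's renderTemplate: the name renderSketch, the choice to render missing variables as empty strings, and the lack of nesting or {{#each}} support are all assumptions made for this example.

```typescript
type TemplateInput = Record<string, unknown>;

// Hypothetical helper, not part of @stackforgeai/copilot-skills.
function renderSketch(template: string, input: TemplateInput): string {
  // Resolve {{#if name}}...{{/if}} blocks first: keep the body only when the value is truthy.
  const withoutConditionals = template.replace(
    /\{\{#if (\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_match: string, key: string, body: string) => (input[key] ? body : ''),
  );
  // Then substitute plain {{name}} placeholders; missing values become empty strings.
  return withoutConditionals.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = input[key];
    return value === undefined || value === null ? '' : String(value);
  });
}

const productTemplate =
  'Write a product description for {{productName}}.\n' +
  '{{#if features}}Key features: {{features}}\n{{/if}}' +
  'Target audience: {{audience}}';

// With the optional field present, the conditional line is kept.
const withFeatures = renderSketch(productTemplate, {
  productName: 'UltraSound Pro Headphones',
  audience: 'audiophiles',
  features: '40-hour battery, ANC, lossless audio',
});

// Without it, the whole {{#if}} block disappears from the prompt.
const withoutFeatures = renderSketch(productTemplate, {
  productName: 'UltraSound Pro Headphones',
  audience: 'audiophiles',
});
```

The point of resolving conditionals before variables is that an omitted optional field removes its entire line instead of leaving a dangling "Key features:" label in the prompt.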

Built-in Skill Templates

import { CopilotSkills, BuiltinSkills } from '@stackforgeai/copilot-skills';

const skills = new CopilotSkills({ premiumLimit: 100 });

skills.registerAll([
  BuiltinSkills.summarize(),
  BuiltinSkills.classify(),
  BuiltinSkills.extract(),
  BuiltinSkills.translate(),
  BuiltinSkills.reviewCode(),
  BuiltinSkills.generateTests(),
  BuiltinSkills.draftEmail(),
  BuiltinSkills.analyzeSentiment(),
]);

// Summarize
const summary = await skills.run('summarize', {
  text: 'Long article text...',
  maxBullets: 5,
});

// Translate
const translation = await skills.run('translate', {
  text: 'Hello, world!',
  targetLanguage: 'Japanese',
});

// Analyze sentiment
const sentiment = await skills.run('analyze-sentiment', {
  text: 'This product is absolutely fantastic!',
});
// sentiment.text → {"sentiment":"positive","confidence":0.97,"reasoning":"..."}

Override any built-in field when registering:

skills.register(
  BuiltinSkills.summarize({ model: 'gpt-4.1', cacheTTLMs: 3_600_000 }),
  true, // override
);

Sequential Composition

Chain skills where each step's output feeds the next:

const result = await skills.compose(
  [
    // Step 1: summarize
    { skill: 'summarize' },

    // Step 2: translate the summary — maps previous output to next input
    {
      skill: 'translate',
      inputMapper: (previousOutput, originalInput) => ({
        text: String(previousOutput),
        targetLanguage: originalInput.targetLanguage as string,
      }),
    },

    // Step 3: classify sentiment — skipped if text is too short
    {
      skill: 'analyze-sentiment',
      condition: (prev) => String(prev).length > 20,
      inputMapper: (prev) => ({ text: String(prev) }),
    },
  ],
  { text: 'Long document...', targetLanguage: 'French' },
);

console.log(result.finalOutput);       // output of last executed step
console.log(result.totalTokens);       // tokens across all steps
console.log(result.steps.length);      // number of steps that ran
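
The control flow behind condition and inputMapper can be sketched without any LLM calls. The example below is an illustration under stated assumptions, not the package's SkillComposer: composeSketch, StepSketch, and the stand-in skill functions are hypothetical, the stand-ins are synchronous for readability, and it assumes a step without an inputMapper receives the original input.

```typescript
type SkillInput = Record<string, unknown>;
// Stand-in for a registered skill: a plain function instead of an LLM call.
type SkillFn = (input: SkillInput) => string;

interface StepSketch {
  skill: string;
  condition?: (previousOutput: unknown, originalInput: SkillInput) => boolean;
  inputMapper?: (previousOutput: unknown, originalInput: SkillInput) => SkillInput;
}

interface ExecutedStep {
  skillName: string;
  output: string;
}

function composeSketch(
  skillTable: Record<string, SkillFn>,
  steps: StepSketch[],
  originalInput: SkillInput,
): { finalOutput: unknown; steps: ExecutedStep[] } {
  const executed: ExecutedStep[] = [];
  let previousOutput: unknown = undefined;
  for (const step of steps) {
    // A failing condition skips the step and leaves previousOutput untouched.
    if (step.condition && !step.condition(previousOutput, originalInput)) continue;
    const fn = skillTable[step.skill];
    if (!fn) throw new Error(`Skill '${step.skill}' is not registered`);
    // Assumption: without an inputMapper, the step receives the original input.
    const stepInput = step.inputMapper
      ? step.inputMapper(previousOutput, originalInput)
      : originalInput;
    const output = fn(stepInput);
    previousOutput = output;
    executed.push({ skillName: step.skill, output });
  }
  return { finalOutput: previousOutput, steps: executed };
}

const fakeSkills: Record<string, SkillFn> = {
  summarize: (input) => String(input.text).slice(0, 10),
  translate: (input) => `[${String(input.targetLanguage)}] ${String(input.text)}`,
  'analyze-sentiment': () => 'positive',
};

const composed = composeSketch(
  fakeSkills,
  [
    { skill: 'summarize' },
    {
      skill: 'translate',
      inputMapper: (prev, original) => ({
        text: String(prev),
        targetLanguage: original.targetLanguage,
      }),
    },
    {
      skill: 'analyze-sentiment',
      condition: (prev) => String(prev).length > 20,
      inputMapper: (prev) => ({ text: String(prev) }),
    },
  ],
  { text: 'Long document about skills', targetLanguage: 'French' },
);
```

Here the translated text is 19 characters long, so the third step's condition fails: composed.steps holds two entries and composed.finalOutput is the translation.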

Parallel Execution

Run multiple independent skills at the same time:

const result = await skills.parallel(
  ['summarize', 'classify', 'analyze-sentiment'],
  { text: 'Product review text...', categories: 'positive, neutral, negative' },
);

for (const step of result.steps) {
  console.log(step.skillName, '→', step.result.text);
}
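
Conceptually, a parallel run fans the same input out to every named skill and waits for all of them. The sketch below shows one way to do that with Promise.all. It is only an illustration: parallelSketch and the fake skills are hypothetical, and how the package handles a failing skill is not documented here (Promise.all rejects as soon as any one call rejects).

```typescript
type ParallelInput = Record<string, unknown>;
type AsyncSkillFn = (input: ParallelInput) => Promise<string>;

interface ParallelStepSketch {
  skillName: string;
  result: { text: string };
}

// Hypothetical helper: every skill receives the same input and all calls start together.
async function parallelSketch(
  skillTable: Record<string, AsyncSkillFn>,
  skillNames: string[],
  input: ParallelInput,
): Promise<{ steps: ParallelStepSketch[] }> {
  const steps = await Promise.all(
    skillNames.map(async (skillName): Promise<ParallelStepSketch> => {
      const fn = skillTable[skillName];
      if (!fn) throw new Error(`Skill '${skillName}' is not registered`);
      return { skillName, result: { text: await fn(input) } };
    }),
  );
  // Promise.all preserves the order of skillNames, regardless of which call finishes first.
  return { steps };
}

// Stand-ins for real skills; they resolve immediately instead of calling a model.
const asyncFakes: Record<string, AsyncSkillFn> = {
  summarize: async (input) => `summary of ${String(input.text).length} chars`,
  classify: async () => 'positive',
};

const parallelDemo = parallelSketch(asyncFakes, ['summarize', 'classify'], {
  text: 'Product review text...',
  categories: 'positive, neutral, negative',
});
```

Because the steps come back in request order, a caller can zip them with the list of skill names it passed in.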

Approval Gates

Require human (or automated) sign-off before a skill executes:

// Option 1: Global handler in config
const skills = new CopilotSkills({
  globalApprovalHandler: async (skillName, input) => {
    console.log(`Approve "${skillName}"?`, input);
    return true; // Replace with real user prompt or policy check
  },
});

skills.register({
  name: 'delete-records',
  description: 'Generate a delete SQL statement',
  promptTemplate: 'Generate DELETE SQL for table {{table}} where {{condition}}',
  requiresApproval: true,
});

// Option 2: Per-call handler
// confirmWithHuman is a placeholder for your own async function that resolves to a boolean.
const result = await skills.run(
  'delete-records',
  { table: 'users', condition: 'inactive = true' },
  { approvalHandler: async () => confirmWithHuman() },
);
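
The way the two handler options interact can be sketched as a small decision function. This is an illustration based on the behaviour described in this README (the global handler is used when no per-call handler is provided, and a denial stops execution); checkApprovalSketch and onlyUsersTable are hypothetical names, and the real runner may differ in details.

```typescript
type ApprovalInput = Record<string, unknown>;
type ApprovalHandler = (skillName: string, input: ApprovalInput) => Promise<boolean>;

// Hypothetical helper: a per-call handler wins, the global handler is the fallback,
// and a gated skill with no handler at all is an error.
async function checkApprovalSketch(
  skillName: string,
  input: ApprovalInput,
  requiresApproval: boolean,
  perCallHandler?: ApprovalHandler,
  globalHandler?: ApprovalHandler,
): Promise<void> {
  if (!requiresApproval) return; // ungated skills run without asking anyone
  const handler = perCallHandler ?? globalHandler;
  if (!handler) {
    throw new Error(`Skill '${skillName}' requires approval but no approvalHandler was provided`);
  }
  const approved = await handler(skillName, input);
  if (!approved) {
    throw new Error('Execution rejected by the approval handler');
  }
}

// Example policy: only allow deletes that target the 'users' table.
const onlyUsersTable: ApprovalHandler = async (_skillName, input) => input.table === 'users';
```

A policy handler like onlyUsersTable can stand in for a human prompt in automated environments; resolving to false produces the rejection error listed under Troubleshooting.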

Observability

// Guard token usage
console.log(skills.getUsage());
// { premiumTokensUsed: 340, premiumLimit: 50000, remaining: 49660 }

// Per-skill metrics with latency percentiles
const metrics = skills.getMetrics('summarize');
console.log(metrics);
// {
//   skillName: 'summarize',
//   totalRuns: 12,
//   cacheHits: 7,
//   totalTokens: 250,
//   avgLatencyMs: 430,
//   p50LatencyMs: 400,
//   p95LatencyMs: 820,
//   p99LatencyMs: 1050,
// }

// All skills at once
const all = skills.getAllMetrics();

// Invalidate cache for a specific skill
skills.invalidateCache('summarize');

// Clear entire cache
skills.clearCache();
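
For readers unfamiliar with latency percentiles, the sketch below computes them with the nearest-rank method over a list of recorded latencies. The method is an assumption chosen for simplicity; how SkillObserver actually computes P50/P95/P99 is not documented here, and percentileSketch and the sample values are made up for illustration.

```typescript
// Nearest-rank percentile: sort the samples, then take the value at rank ceil(p/100 * n).
function percentileSketch(latenciesMs: number[], percentile: number): number {
  if (latenciesMs.length === 0) return 0;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((percentile / 100) * sorted.length);
  // Clamp so that percentile 0 and percentile 100 both map to valid positions.
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Ten made-up latency samples in milliseconds, including two slow outliers.
const samplesMs = [400, 380, 820, 410, 1050, 395, 405, 390, 415, 400];

const p50 = percentileSketch(samplesMs, 50); // 400
const p95 = percentileSketch(samplesMs, 95); // 1050
const avgMs = samplesMs.reduce((sum, ms) => sum + ms, 0) / samplesMs.length; // 506.5
```

The two outliers pull the average well above the median, which is why percentile metrics are usually more informative than avgLatencyMs alone.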

Registry Management

// Search by name, description, or tag
const results = skills.searchSkills('code');      // finds 'review-code', 'generate-tests'
const codingSkills = skills.listSkillsByTag('engineering');

// Check if registered
if (skills.hasSkill('summarize')) {
  console.log('Ready to summarize.');
}

// Remove a skill
skills.unregister('temp-skill');

// Fluent registration
skills
  .register(BuiltinSkills.summarize())
  .register(BuiltinSkills.translate())
  .register(myCustomSkill);
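
A search over name, description, and tags could look like the sketch below. It assumes case-insensitive substring matching, as the Troubleshooting section suggests; searchSketch, SkillSummary, and the catalog descriptions are hypothetical and do not reflect the built-in skills' actual metadata.

```typescript
interface SkillSummary {
  name: string;
  description: string;
  tags?: string[];
}

// Hypothetical search: case-insensitive substring match over name, description, and tags.
function searchSketch(skillList: SkillSummary[], query: string): SkillSummary[] {
  const needle = query.toLowerCase();
  return skillList.filter((skill) =>
    [skill.name, skill.description, ...(skill.tags ?? [])].some((field) =>
      field.toLowerCase().includes(needle),
    ),
  );
}

// Made-up catalog entries for illustration only.
const catalog: SkillSummary[] = [
  { name: 'review-code', description: 'Review a snippet for bugs', tags: ['engineering'] },
  { name: 'generate-tests', description: 'Write unit tests for code', tags: ['engineering'] },
  { name: 'draft-email', description: 'Draft a polite email', tags: ['writing'] },
];

const codeMatches = searchSketch(catalog, 'CODE').map((skill) => skill.name);
const engineeringMatches = searchSketch(catalog, 'engineering').map((skill) => skill.name);
```

In this sketch 'generate-tests' matches the query 'CODE' through its description rather than its name, which is one way a search for 'code' can return skills whose names do not contain the word.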

Configuration

const skills = new CopilotSkills({
  // All options are optional. The type of each option is shown in the comment above it.

  // guard?: IGuard
  // Inject a custom guard (or a mock for testing). Default: a new CopilotGuard is created.
  // guard: myGuard,

  // premiumLimit?: number
  // Max cumulative billing units for the default CopilotGuard instance. Default: 100.
  // Free models (billing.multiplier = 0) cost 0 per call; premium models cost their multiplier per call.
  premiumLimit: 100,

  // defaultModel?: string
  // Model used when a skill does not specify one. Default: 'gpt-4.1' (free model, multiplier = 0).
  // Must be a valid Copilot SDK model ID. Call guard.loadAvailableModels() to enumerate valid IDs.
  defaultModel: 'gpt-4.1',

  // enableCaching?: boolean
  // Set to false to disable result caching globally. Default: true.
  enableCaching: true,

  // globalApprovalHandler?: (skillName: string, input: SkillInput) => Promise<boolean>
  // Called for every skill with requiresApproval: true when no per-call handler is provided.
  globalApprovalHandler: async (skillName, input) => true,
});

SkillDefinition fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✅ | Unique registry key |
| description | string | ✅ | Human-readable summary |
| promptTemplate | string | ✅ | Template with {{var}}, {{#if}}, {{#each}} syntax |
| version | string | — | Semantic version (e.g. "1.0.0") |
| model | string | — | Override default model for this skill |
| tags | string[] | — | Categorization tags for search and filtering |
| systemPrompt | string | — | System role prompt; enables messages format |
| outputFormat | 'text'\|'json'\|'markdown'\|'list' | — | Injects format instructions prefix |
| inputSchema | SkillSchema | — | Field definitions for pre-execution validation |
| requiresApproval | boolean | — | If true, approval handler must return true before execution |
| cacheable | boolean | — | If true, identical inputs return cached output |
| cacheTTLMs | number | — | Cache time-to-live in milliseconds (default: 300_000) |
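
To show what pre-execution validation against an inputSchema might involve, here is a minimal sketch. It mirrors the { name, type, required } field shape used in the basic example, but validateSketch, the set of supported types, and the error wording are assumptions, not the package's validator.

```typescript
// Assumed set of field types; the README only shows 'string' in its examples.
type FieldType = 'string' | 'number' | 'boolean';

interface FieldSketch {
  name: string;
  type: FieldType;
  required: boolean;
}

interface SchemaSketch {
  fields: FieldSketch[];
}

// Hypothetical validator: returns a list of human-readable problems (empty means valid).
function validateSketch(schema: SchemaSketch, input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of schema.fields) {
    const value = input[field.name];
    if (value === undefined || value === null) {
      // Absent optional fields are fine; absent required fields are reported.
      if (field.required) problems.push(`Missing required field '${field.name}'`);
      continue;
    }
    if (typeof value !== field.type) {
      problems.push(`Field '${field.name}' should be ${field.type}, got ${typeof value}`);
    }
  }
  return problems;
}

const productSchema: SchemaSketch = {
  fields: [
    { name: 'productName', type: 'string', required: true },
    { name: 'audience', type: 'string', required: true },
    { name: 'features', type: 'string', required: false },
  ],
};

// 'audience' is missing and 'features' has the wrong type, so two problems are reported.
const validationProblems = validateSketch(productSchema, {
  productName: 'UltraSound Pro Headphones',
  features: 42,
});
```

Collecting every problem before failing lets a caller fix all input issues in one pass instead of discovering them one error at a time.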

Architecture Overview

┌─────────────────────────────────────────────────────┐
│                   CopilotSkills                      │
│  (Unified facade — registry + runner + composer)     │
└──────────┬───────────────────────┬───────────────────┘
           │                       │
    ┌──────▼──────┐        ┌───────▼───────┐
    │SkillRegistry│        │SkillComposer  │
    │  register() │        │  compose()    │
    │  search()   │        │  parallel()   │
    │  listByTag()│        └───────┬───────┘
    └─────────────┘                │
                                   │
                          ┌────────▼────────┐
                          │   SkillRunner   │
                          │  1. Validate    │
                          │  2. Approve     │
                          │  3. Cache check │
                          │  4. Render tmpl │
                          │  5. guard.send()│  ←── ALL LLM calls
                          │  6. Cache store │
                          │  7. Record obs  │
                          └────────┬────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │       copilot-guard          │
                    │  Token budget enforcement    │
                    │  Premium model blocking      │
                    │  Usage introspection         │
                    └──────────────┬──────────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │     @github/copilot-sdk      │
                    └─────────────────────────────┘

Supporting components:
  SkillCache    — TTL-based result cache (keyed by skillName + input)
  SkillObserver — Per-skill metrics with P50/P95/P99 latency percentiles
  BuiltinSkills — Pre-built skill definition factory (8 templates)
  renderTemplate — {{var}} / {{#if}} / {{#each}} template engine
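
As a rough picture of a TTL cache keyed by skill name and input, consider the sketch below. It is not the package's SkillCache: TtlCacheSketch, the JSON.stringify key, and the injectable clock are assumptions for illustration. The 300_000 ms default mirrors the cacheTTLMs default listed in the SkillDefinition fields table.

```typescript
interface CacheEntrySketch {
  skillName: string;
  value: string;
  expiresAt: number;
}

// Hypothetical TTL cache keyed by skill name plus serialized input.
class TtlCacheSketch {
  private readonly entries = new Map<string, CacheEntrySketch>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Assumption: JSON.stringify is a good enough key. It is sensitive to property order.
  private keyFor(skillName: string, input: Record<string, unknown>): string {
    return JSON.stringify([skillName, input]);
  }

  get(skillName: string, input: Record<string, unknown>): string | undefined {
    const key = this.keyFor(skillName, input);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(skillName: string, input: Record<string, unknown>, value: string, ttlMs = 300_000): void {
    const key = this.keyFor(skillName, input);
    this.entries.set(key, { skillName, value, expiresAt: this.now() + ttlMs });
  }

  // Remove every cached result that belongs to one skill.
  invalidate(skillName: string): void {
    this.entries.forEach((entry, key) => {
      if (entry.skillName === skillName) this.entries.delete(key);
    });
  }
}

let clockMs = 0;
const cacheDemo = new TtlCacheSketch(() => clockMs);
cacheDemo.set('summarize', { text: 'hello' }, 'cached summary', 1_000);
const freshHit = cacheDemo.get('summarize', { text: 'hello' }); // 'cached summary'
clockMs = 1_000; // advance the fake clock to the expiry time
const afterExpiry = cacheDemo.get('summarize', { text: 'hello' }); // undefined
```

Keying on the serialized input means two calls only share a cache entry when their inputs serialize identically, so callers that build input objects in varying property order would see misses with this particular key scheme.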

Data flow for a single skill execution

1. CopilotSkills.run(name, input) → SkillRunner.run()
2. Input validated against inputSchema
3. Approval gate checked (if requiresApproval)
4. Cache lookup, returning immediately on a hit
5. renderTemplate(promptTemplate, input) produces final prompt
6. copilot-guard.sendAndWait(model, prompt), the ONLY path to the AI model
7. Result stored in SkillCache (if cacheable)
8. Observation recorded in SkillObserver
9. SkillResult returned to caller
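
The execution steps above can be sketched end to end with stand-ins for each component. This is a simplified illustration, not the package's SkillRunner: runSketch, SkillSketch, and RunDeps are hypothetical, validation only checks required fields, the template step only handles {{var}}, and send stands in for the guarded model call.

```typescript
type RunInput = Record<string, unknown>;

interface SkillSketch {
  name: string;
  promptTemplate: string;
  requiredFields?: string[]; // simplified stand-in for inputSchema
  requiresApproval?: boolean;
  cacheable?: boolean;
}

interface RunDeps {
  send: (prompt: string) => Promise<string>; // stand-in for the guarded LLM call
  approve?: (skillName: string, input: RunInput) => Promise<boolean>;
  cache: Map<string, string>;
  observations: { skillName: string; cached: boolean; latencyMs: number }[];
}

async function runSketch(
  skill: SkillSketch,
  input: RunInput,
  deps: RunDeps,
): Promise<{ text: string; cached: boolean }> {
  const startedAt = Date.now();
  // Validate: fail fast before spending any tokens.
  for (const field of skill.requiredFields ?? []) {
    if (input[field] === undefined) throw new Error(`Input validation failed: missing '${field}'`);
  }
  // Approve: gated skills need a handler that resolves to true.
  if (skill.requiresApproval) {
    if (!deps.approve) {
      throw new Error(`Skill '${skill.name}' requires approval but no approvalHandler was provided`);
    }
    if (!(await deps.approve(skill.name, input))) {
      throw new Error('Execution rejected by the approval handler');
    }
  }
  // Cache check: a hit skips rendering and the model call entirely.
  const cacheKey = JSON.stringify([skill.name, input]);
  const hit = skill.cacheable ? deps.cache.get(cacheKey) : undefined;
  let text: string;
  if (hit !== undefined) {
    text = hit;
  } else {
    // Render the prompt, send it, then store the result if the skill is cacheable.
    const prompt = skill.promptTemplate.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) =>
      String(input[key] ?? ''),
    );
    text = await deps.send(prompt);
    if (skill.cacheable) deps.cache.set(cacheKey, text);
  }
  // Record an observation for every run, cached or not.
  const cached = hit !== undefined;
  deps.observations.push({ skillName: skill.name, cached, latencyMs: Date.now() - startedAt });
  return { text, cached };
}
```

Placing validation and approval before the cache lookup means a denied or malformed request never returns a cached answer, which matches the order listed above.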

Troubleshooting

[SkillRunner] Skill '...' is not registered → Call skills.register(definition) or skills.registerAll([...]) before skills.run().

[SkillRunner] Input validation failed → Check that all required fields in inputSchema are present in the input object and have the correct types.

[SkillRunner] Skill '...' requires approval but no approvalHandler was provided → Either set globalApprovalHandler in the constructor config or pass approvalHandler in SkillRunOptions.

[CopilotGuard] Blocked: premium token limit reached → Increase premiumLimit in the constructor, or switch to a non-premium model. Check usage with skills.getUsage().

@github/copilot-sdk is not installed → Run npm install @github/copilot-sdk in your project.

[SkillRunner] Execution rejected by the approval handler → Your approvalHandler returned false. This is expected behavior when an approval gate denies execution.

Skills not found in search() or listByTag() → Check that tags are defined on the skill definition and that your query actually appears in them. Matching is a case-insensitive substring match, so the query must be contained in the name, description, or tag.

Cached result returned when fresh data is needed → Call skills.invalidateCache(skillName) to clear the cache for that skill, or pass { bypassCache: true } in SkillRunOptions.

DISCLAIMER AND LIMITATION OF LIABILITY

IMPORTANT: THIS SOFTWARE IS PROVIDED STRICTLY ON AN "AS IS" AND "AS AVAILABLE" BASIS.

BY USING THIS SOFTWARE, YOU ACKNOWLEDGE AND AGREE THAT:

THE SOFTWARE MAY CONTAIN BUGS, DEFECTS, DESIGN FLAWS, LOGIC ERRORS, SECURITY ISSUES, OR INCOMPLETE FEATURES
THE SOFTWARE MAY FAIL TO LIMIT OR PREVENT TOKEN USAGE, API REQUESTS, COST OVERRUNS, OR BILLING EVENTS
SKILL CACHING, INPUT VALIDATION, APPROVAL GATES, AND SAFETY FEATURES MAY BE INACCURATE, INCOMPLETE, OR NON-FUNCTIONAL
THE SOFTWARE MAY PRODUCE UNEXPECTED OR INCORRECT OUTPUTS FROM AI MODELS
THE SOFTWARE MAY NOT BE SUITABLE FOR PRODUCTION ENVIRONMENTS WITHOUT ADDITIONAL SAFEGUARDS
THE SOFTWARE MAY NOT PREVENT EXCESSIVE CHARGES FROM AI PROVIDERS OR CLOUD SERVICES
APPROVAL GATES MAY BE BYPASSED, FAIL, OR BEHAVE UNEXPECTEDLY

THIS SOFTWARE DOES NOT GUARANTEE:

COST SAVINGS
BILLING PROTECTION
TOKEN ACCURACY
FINANCIAL PROTECTION
REQUEST SAFETY
SYSTEM STABILITY
SECURITY
RELIABILITY
FITNESS FOR ANY PARTICULAR PURPOSE
CORRECTNESS OF AI MODEL OUTPUTS
EFFECTIVENESS OF APPROVAL GATES
CACHE ACCURACY OR FRESHNESS

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW:

THE AUTHORS, CONTRIBUTORS, MAINTAINERS, COPYRIGHT HOLDERS, AFFILIATES, AND DISTRIBUTORS SHALL NOT BE LIABLE FOR ANY CLAIMS, DAMAGES, LOSSES, LIABILITIES, OR EXPENSES OF ANY KIND, INCLUDING BUT NOT LIMITED TO:

API FEES
TOKEN CHARGES
CLOUD COMPUTE COSTS
INFRASTRUCTURE COSTS
FINANCIAL LOSSES
LOST PROFITS
BUSINESS INTERRUPTION
SERVICE OUTAGES
DATA LOSS
DATA CORRUPTION
SECURITY INCIDENTS
INDIRECT DAMAGES
INCIDENTAL DAMAGES
CONSEQUENTIAL DAMAGES
SPECIAL DAMAGES
PUNITIVE DAMAGES
MISUSE OF THE SOFTWARE
FAILURE OF SAFETY FEATURES
FAILURE OF APPROVAL GATES
FAILURE OF TOKEN LIMITS
STALE OR INCORRECT CACHED RESULTS
FAILED REQUEST BLOCKING
ERRORS IN COST ESTIMATION
EXCESSIVE BILLING EVENTS
PRODUCTION FAILURES

USE OF THIS SOFTWARE IS ENTIRELY AT YOUR OWN RISK.

YOU ARE SOLELY RESPONSIBLE FOR:

VERIFYING ALL AI MODEL OUTPUTS BEFORE USE
MONITORING API USAGE AND TOKEN CONSUMPTION
MONITORING BILLING AND IMPLEMENTING PROVIDER-SIDE BUDGET CONTROLS
IMPLEMENTING ADDITIONAL SAFEGUARDS APPROPRIATE TO YOUR ENVIRONMENT
TESTING IN YOUR OWN ENVIRONMENT BEFORE PRODUCTION DEPLOYMENT
CONFIGURING APPROPRIATE LIMITS AND APPROVAL POLICIES
VALIDATING ALL EXECUTION LOGIC AND SKILL BEHAVIOR
MAINTAINING BACKUPS AND RECOVERY PROCEDURES

THIS PROJECT SHOULD NOT BE USED AS THE SOLE OR PRIMARY MECHANISM FOR COST CONTROL, BILLING GOVERNANCE, SECURITY, COMPLIANCE, OR PRODUCTION SAFETY.

ALWAYS IMPLEMENT INDEPENDENT PROVIDER-SIDE BILLING ALERTS, RATE LIMITS, BUDGET CONTROLS, AND MONITORING SYSTEMS.

IF YOU DO NOT AGREE WITH THESE TERMS, DO NOT USE THIS SOFTWARE.

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
