@stackforgeai/copilot-genkit
v1.0.0
Production-grade AI orchestration framework for the GitHub Copilot SDK — flows, prompts, tools, structured output, middleware, and observability. All LLM calls are routed through @stackforgeai/copilot-guard for token budget enforcement.
Overview
copilot-genkit is a production-grade AI orchestration framework purpose-built for the GitHub Copilot SDK. It delivers:
- Flows — typed, observable async functions with automatic tracing
- Prompts — Handlebars-style templates with role markers and variable interpolation
- Tools — type-safe function definitions for model-driven tool calling
- Structured Output — schema-enforced JSON output with automatic validation and retry
- Middleware — composable request/response hooks (retry, cache, logging, custom)
- Registry — centralized action discovery for flows, prompts, and tools
- Observability — span-based tracing with latency percentiles and JSON export
All calls are guarded by copilot-guard — direct @github/copilot-sdk access is never used.
Features
| Feature | Description |
|---|---|
| defineFlow() | Create typed, traced async flows that can call generate() internally |
| definePrompt() | Handlebars-style templates with {{role "system"}}, {{#if}}, {{#each}} |
| defineTool() | Register tools for LLM function calling with type-safe handlers |
| generate() | Core generation with optional schema validation, tools, and middleware |
| Schema Validation | Lightweight schema enforcement with auto-repair on failure |
| Middleware | Composable retry(), cache(), logging(), and createMiddleware() |
| Registry | List, lookup, and discover all registered actions |
| Tracing | Automatic span creation, trace hierarchy, latency P50/P95/P99 |
| Guard Integration | All LLM calls routed through IGuard for token budget control |
Installation
```bash
npm install @stackforgeai/copilot-genkit @stackforgeai/copilot-guard @github/copilot-sdk
```

Or from the monorepo:

```bash
cd copilot-genkit
npm install
npm run build
```

Usage Examples
Below are several hands-on examples that show common copilot-genkit workflows. Each example includes:
- Purpose: what the example demonstrates and when to use it.
- Who it's for: suggested audience level (Beginner / Intermediate / Advanced).
- Code: runnable snippet showing the API usage.
- Notes & tips: background knowledge, pitfalls, and next steps.
Quick Start
Purpose: A minimal end-to-end example showing how to create a CopilotGenkit instance and make a simple generation call. Use this to verify your environment and SDK authentication.
Who it's for: Beginner — get something working quickly.
```typescript
import { CopilotGenkit } from '@stackforgeai/copilot-genkit';

const ai = new CopilotGenkit({
  // Let copilot-guard select a safe default free model; avoid hardcoding premium models.
  premiumLimit: 100,
});

const response = await ai.generate({
  prompt: 'Explain event-driven architecture in 3 bullets.',
});

console.log(response.text);
```

Notes & tips:
- Background: `CopilotGenkit` routes requests through `copilot-guard`, which enforces the token budget and validates model names. If you see model validation errors, call `guard.loadAvailableModels()` in advance.
- Next steps: Try a prompt template, or add `output.schema` (see Structured Output) to enforce structured responses.
Flows
Purpose: Demonstrates how to create a reusable, typed async flow that calls generate() internally and is instrumented with tracing.
Who it's for: Intermediate — building composable app logic around LLM calls.
```typescript
// Reuses the `ai` instance created in Quick Start.
const summarizer = ai.defineFlow(
  { name: 'summarize' },
  async (input: { text: string }, ctx) => {
    const response = await ai.generate({
      prompt: `Summarize: ${input.text}`,
    });
    return response.text;
  },
);

const result = await summarizer.run({ text: 'Long document...' });
console.log(result.output);    // Summarized text
console.log(result.traceId);   // Trace ID for observability
console.log(result.latencyMs); // Execution time in milliseconds
```

Notes & tips:
- Background: Flows encapsulate business logic and make it easier to test and observe behavior. They are ideal for higher-level application features like summarization, extraction, or multi-step pipelines.
- Instrumentation: Traces include a `traceId` and a span hierarchy, so you can correlate downstream tool calls and retries.
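To make the `output` / `traceId` / `latencyMs` result shape more concrete, here is a minimal, self-contained sketch of how a traced flow wrapper could be built. It is not copilot-genkit's implementation. `defineTracedFlow`, `FlowRunResult`, the in-memory `spans` array, and the counter-based trace IDs are all hypothetical choices made for illustration.

```typescript
// Hypothetical sketch: not copilot-genkit's actual flow implementation.
interface Span {
  traceId: string;
  name: string;
  startMs: number;
  endMs: number;
  status: 'ok' | 'error';
}

interface FlowRunResult<O> {
  output: O;
  traceId: string;
  latencyMs: number;
}

// In-memory span store standing in for a real tracer backend.
const spans: Span[] = [];
let traceCounter = 0;

function defineTracedFlow<I, O>(name: string, fn: (input: I) => Promise<O>) {
  return {
    async run(input: I): Promise<FlowRunResult<O>> {
      // Simple sequential ID; a real tracer would generate random IDs.
      const traceId = `trace-${++traceCounter}`;
      const startMs = Date.now();
      try {
        const output = await fn(input);
        const endMs = Date.now();
        spans.push({ traceId, name, startMs, endMs, status: 'ok' });
        return { output, traceId, latencyMs: endMs - startMs };
      } catch (err) {
        // Record the failed span, then rethrow so the caller still sees the error.
        spans.push({ traceId, name, startMs, endMs: Date.now(), status: 'error' });
        throw err;
      }
    },
  };
}

// Usage: a stand-in flow that needs no LLM call.
const shout = defineTracedFlow('shout', async (input: { text: string }) =>
  input.text.toUpperCase(),
);
```

Recording the span in both the success and failure paths means failed runs still show up in trace listings. That is usually what you want when debugging.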
Prompt Templates
Purpose: Show how to author reusable role-based prompt templates with role markers and variable interpolation.
Who it's for: Beginner → Intermediate — useful for consistent system/user messaging patterns.
```typescript
const greeting = ai.definePrompt(
  { name: 'greeting' },
  `{{role "system"}}
You are a friendly assistant who speaks {{language}}.
{{role "user"}}
Hello, my name is {{name}}. Tell me about {{topic}}.`,
);

// Render to messages (no API call)
const messages = greeting.render({ language: 'English', name: 'Ada', topic: 'AI' });

// Generate a response (routed through copilot-guard)
const response = await greeting.generate({ language: 'English', name: 'Ada', topic: 'AI' });
```

Notes & tips:
- Background: Prompt templates help maintain consistent system instructions (tone, persona) and reduce prompt duplication across services.
- Best practice: Put safety-critical or high-level context in the system role. Keep user prompts concise and data-driven.
- Advanced: Use conditional blocks and loops to render lists or optional fields.
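The role markers and `{{variable}}` interpolation described above can be sketched in a few lines. This is an illustrative stand-in, not the library's `parseRoleTemplate()`. The `renderRoleTemplate` name and the `{ role, content }` message shape are assumptions, and `{{#if}}` / `{{#each}}` blocks are deliberately left out.

```typescript
// Hypothetical sketch of role-marker parsing; not the library's parseRoleTemplate().
interface Message {
  role: string;
  content: string;
}

// Replace {{name}} placeholders; unknown variables become empty strings.
function interpolate(text: string, vars: Record<string, string>): string {
  return text.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => vars[key] ?? '');
}

function renderRoleTemplate(template: string, vars: Record<string, string>): Message[] {
  const marker = /\{\{role\s+"(\w+)"\}\}/g;
  const messages: Message[] = [];
  let currentRole: string | null = null;
  let bodyStart = 0;
  let match: RegExpExecArray | null;

  // Each role marker closes the previous message and opens a new one.
  // Text before the first marker is ignored in this sketch.
  while ((match = marker.exec(template)) !== null) {
    if (currentRole !== null) {
      messages.push({ role: currentRole, content: template.slice(bodyStart, match.index) });
    }
    currentRole = match[1];
    bodyStart = marker.lastIndex;
  }
  if (currentRole !== null) {
    messages.push({ role: currentRole, content: template.slice(bodyStart) });
  }

  // Interpolate after splitting, so variable values cannot inject role markers.
  return messages.map((m) => ({ role: m.role, content: interpolate(m.content, vars).trim() }));
}
```

Interpolating only after the split is a small but useful design choice. A user-supplied value containing `{{role "system"}}` stays plain text instead of opening a new system message.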
Structured Output
Purpose: Show how to request schema-enforced JSON output and let copilot-genkit validate and auto-repair model output when possible.
Who it's for: Intermediate → Advanced — useful when your application requires reliable, machine-readable responses.
```typescript
const result = await ai.generate({
  prompt: 'List 3 REST API endpoints for a task app.',
  output: {
    schema: {
      isArray: true,
      fields: [
        { name: 'method', type: 'string' },
        { name: 'path', type: 'string' },
        { name: 'description', type: 'string' },
      ],
    },
    format: 'json',
  },
});

console.log(result.output); // Validated JSON array
```

Notes & tips:
- Background: The schema system accepts a declarative description of the expected fields. `CopilotGenkit` injects format instructions into the prompt and validates the returned JSON.
- Failure handling: If parsing or validation fails, the library may make a single repair attempt by asking the model to reformat its output to match the schema. In production, validate on your side as well.
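Because the notes recommend validating on your side as well, here is a minimal, dependency-free sketch of a validator for the field-list schema shape used in the example (`isArray` plus `fields` entries with `name` / `type`). The `ObjectSchema` / `FieldSchema` types and the `validate` / `parseAndValidate` functions are hypothetical. The library's own `SchemaValidator` may behave differently, for example in how it treats extra fields or attempts repair.

```typescript
// Hypothetical types mirroring the schema literal in the example above.
type FieldType = 'string' | 'number' | 'boolean';

interface FieldSchema {
  name: string;
  type: FieldType;
}

interface ObjectSchema {
  isArray?: boolean;
  fields: FieldSchema[];
}

// Returns human-readable errors; an empty list means the value is valid.
function validate(value: unknown, schema: ObjectSchema): string[] {
  if (schema.isArray && !Array.isArray(value)) return ['expected an array'];
  const items: unknown[] = schema.isArray ? (value as unknown[]) : [value];
  const errors: string[] = [];

  items.forEach((item, i) => {
    if (typeof item !== 'object' || item === null || Array.isArray(item)) {
      errors.push(`item ${i}: expected an object`);
      return;
    }
    const record = item as Record<string, unknown>;
    for (const field of schema.fields) {
      if (!(field.name in record)) {
        errors.push(`item ${i}: missing field "${field.name}"`);
      } else if (typeof record[field.name] !== field.type) {
        errors.push(`item ${i}: field "${field.name}" should be ${field.type}`);
      }
    }
  });
  return errors;
}

// Parse raw model text first; JSON.parse throws on malformed output.
function parseAndValidate(raw: string, schema: ObjectSchema): { ok: boolean; errors: string[] } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ['output is not valid JSON'] };
  }
  const errors = validate(parsed, schema);
  return { ok: errors.length === 0, errors };
}
```

Returning a list of errors rather than throwing on the first one gives you enough detail to build a repair prompt, or to log exactly which fields the model got wrong.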
Tools / Function Calling
Purpose: Demonstrates registering a tool (function) and enabling the model to request a tool call; the host executes the tool and returns results to the application.
Who it's for: Advanced — building tool-enabled agents or safe function-calling flows.
```typescript
const calculator = ai.defineTool(
  {
    name: 'calculator',
    description: 'Perform arithmetic',
    inputSchema: {
      fields: [
        { name: 'operation', type: 'string' },
        { name: 'a', type: 'number' },
        { name: 'b', type: 'number' },
      ],
    },
  },
  (input) => {
    if (input.operation === 'add') return input.a + input.b;
    if (input.operation === 'multiply') return input.a * input.b;
    // Fail loudly instead of returning a misleading 0 for unknown operations.
    throw new Error(`Unsupported operation: ${input.operation}`);
  },
);

// Use in generation
const result = await ai.generate({
  prompt: 'Calculate 15 * 23',
  tools: [calculator.definition],
});

// The model only requests the call; the host executes the tool.
if (result.finishReason === 'tool_call') {
  const { tool, input } = result.output;
  const toolResult = await ai.executeToolCall(tool, input);
  console.log('Result:', toolResult);
}
```

Notes & tips:
- Background: Tools let you expose safe, deterministic functionality (calculators, DB lookups, internal APIs) to the model without giving it direct system access. The model can request a tool call by returning a structured tool-call object.
- Safety: Validate tool inputs before executing, and enforce timeouts and resource limits on tool execution.
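The safety advice above can be sketched as a small wrapper. It checks input types against a tool's field list, then races the handler against a timeout. `runToolSafely`, `ToolField`, and the 2-second default are hypothetical and independent of `executeToolCall()`. Note that `Promise.race` only stops *waiting* for a slow handler; it does not cancel the work the handler is doing.

```typescript
// Hypothetical helper; not part of copilot-genkit's API.
interface ToolField {
  name: string;
  type: 'string' | 'number' | 'boolean';
}

type ToolHandler = (input: Record<string, unknown>) => unknown;

async function runToolSafely(
  fields: ToolField[],
  handler: ToolHandler,
  input: Record<string, unknown>,
  timeoutMs = 2_000,
): Promise<unknown> {
  // 1. Validate input types before the handler ever sees them.
  for (const field of fields) {
    if (typeof input[field.name] !== field.type) {
      throw new Error(`Invalid input: "${field.name}" must be a ${field.type}`);
    }
  }

  // 2. Race the handler against a timer, and clear the timer either way.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([Promise.resolve(handler(input)), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

For handlers that do real I/O, passing an `AbortSignal` into the handler is a sturdier way to make the timeout actually stop the underlying work.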
Middleware
Purpose: Illustrate common middleware patterns (retry, caching, logging) and how to compose them for resilient LLM calls.
Who it's for: Intermediate — useful for production reliability and observability.
```typescript
import { retry, cache, logging, createMiddleware } from '@stackforgeai/copilot-genkit';

const result = await ai.generate({
  prompt: 'Resilient call',
  use: [
    logging(),
    retry({ maxRetries: 3, initialDelayMs: 500 }),
    cache({ ttlMs: 60_000 }),
  ],
});
```

Notes & tips:
- Background: Middleware composes cross-cutting behaviors. `retry()` helps with transient errors, `cache()` reduces cost for repeated calls, and `logging()` captures requests and responses for debugging.
- Cost awareness: Use caching and deterministic prompts to minimize repeated premium model calls.
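The composition idea behind `use: [...]` can be sketched as an onion of `(request, next) => Promise<response>` functions. `Middleware`, `compose`, `retrySketch`, and `tracing` below are hypothetical stand-ins, not the library's `createMiddleware()` contract. The backoff doubles `initialDelayMs` on each attempt, which is one common form of exponential backoff.

```typescript
// Hypothetical middleware contract; copilot-genkit's real signature may differ.
interface Req { prompt: string }
interface Res { text: string }
type Next = (req: Req) => Promise<Res>;
type Middleware = (req: Req, next: Next) => Promise<Res>;

// The first middleware in the list runs outermost and wraps all later ones.
function compose(middlewares: Middleware[], handler: Next): Next {
  return middlewares.reduceRight<Next>(
    (next, mw) => (req) => mw(req, next),
    handler,
  );
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries failed calls, waiting initialDelayMs, then 2x, 4x, ... between attempts.
function retrySketch(maxRetries: number, initialDelayMs: number): Middleware {
  return async (req, next) => {
    for (let attempt = 0; ; attempt++) {
      try {
        return await next(req);
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        await sleep(initialDelayMs * 2 ** attempt);
      }
    }
  };
}

// Records when a layer is entered and exited, to make the nesting order visible.
function tracing(label: string, log: string[]): Middleware {
  return async (req, next) => {
    log.push(`${label}:before`);
    const res = await next(req);
    log.push(`${label}:after`);
    return res;
  };
}
```

In this sketch, list order equals nesting order. A logger placed before a retry therefore sees one call per request, while the retry underneath it handles the individual attempts. Check whether copilot-genkit uses the same ordering before relying on it.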
Observability
Purpose: Show how to inspect traces and latency percentiles produced by CopilotGenkit instrumentation.
Who it's for: Intermediate → Advanced — monitoring and performance tuning.
```typescript
// List recent traces
const traces = ai.listTraces(10);
for (const t of traces) {
  console.log(`${t.traceId} | ${t.status} | spans: ${t.spans.length}`);
}

// Get latency percentiles
const perf = ai.getLatencyPercentiles('generate');
console.log(`P50: ${perf.p50}ms, P95: ${perf.p95}ms, P99: ${perf.p99}ms`);

// Export all traces as JSON
const data = ai.exportTraces();
```

Notes & tips:
- Background: Tracing correlates flows, prompts, tool calls, and middleware. Use a `traceId` to examine a single end-to-end run in detail.
- Production: Export traces to your observability backend and correlate them with request metadata (user ID, tenant, environment).
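For intuition about what P50/P95/P99 mean, here is a nearest-rank percentile sketch over a list of span latencies. The README does not state which percentile method the library uses, so `percentile` and `latencySummary` here are assumptions built on the nearest-rank definition. Interpolating methods can give slightly different values on small samples.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; multiplying before dividing avoids floating-point drift for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

function latencySummary(latenciesMs: number[]) {
  return {
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
  };
}
```

With fewer than 100 samples, nearest-rank P99 is simply the slowest observed latency. Treat tail percentiles from small windows with caution.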
Architecture Overview
```
┌────────────────────────────────────────────────────────────┐
│                       CopilotGenkit                        │
│                                                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │  Flows   │  │ Prompts  │  │  Tools   │  │  Registry  │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────────────┘  │
│       │             │             │                        │
│       └──────┬──────┘             │                        │
│              ▼                    │                        │
│   ┌──────────────────────┐        │                        │
│   │      generate()      │◄───────┘                        │
│   │  + Schema Validator  │                                 │
│   │  + Middleware Chain  │                                 │
│   └──────────┬───────────┘                                 │
│              │                                             │
│   ┌──────────▼───────────┐  ┌───────────────────────────┐  │
│   │        Tracer        │  │         Middleware        │  │
│   │  (Spans, Traces,     │  │  retry / cache / logging  │  │
│   │   Percentiles)       │  │  + custom                 │  │
│   └──────────────────────┘  └───────────────────────────┘  │
│                                                            │
└──────────────────────┬─────────────────────────────────────┘
                       │
                       ▼
            ┌─────────────────────┐
            │    copilot-guard    │
            │ (IGuard interface)  │
            │    Token budget     │
            │  Model validation   │
            └──────────┬──────────┘
                       │
                       ▼
            ┌─────────────────────┐
            │ @github/copilot-sdk │
            │  (peer dependency)  │
            └─────────────────────┘
```

Key design principles:
- All LLM calls go through `copilot-guard`; there is no direct SDK access
- Dependency injection via the `IGuard` interface for testing
- Composable middleware chain for cross-cutting concerns
- Automatic span-based tracing for all operations
- Lightweight schema validation without external dependencies (no Zod)
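The dependency-injection principle can be illustrated with a hypothetical guard interface and an in-memory stub for tests. `GuardLike`, `BudgetedStubGuard`, `summarize`, and the `complete` / `usage` methods are invented for this sketch. The real `IGuard` interface in copilot-guard may expose different methods, so consult its type definitions before writing a test double.

```typescript
// Hypothetical stand-in for IGuard; the real interface's methods may differ.
interface GuardLike {
  complete(prompt: string): Promise<string>;
  usage(): { requests: number; limit: number };
}

// Test double: returns canned text and enforces a simple request budget.
class BudgetedStubGuard implements GuardLike {
  private requests = 0;
  private readonly limit: number;
  private readonly reply: string;
  lastPrompt = '';

  constructor(limit: number, reply: string) {
    this.limit = limit;
    this.reply = reply;
  }

  async complete(prompt: string): Promise<string> {
    if (this.requests >= this.limit) {
      throw new Error('Guard blocked: request budget exhausted');
    }
    this.requests++;
    this.lastPrompt = prompt; // captured so tests can assert on the exact prompt
    return this.reply;
  }

  usage() {
    return { requests: this.requests, limit: this.limit };
  }
}

// Code under test depends only on the interface, never on the SDK directly.
async function summarize(guard: GuardLike, text: string): Promise<string> {
  return guard.complete(`Summarize: ${text}`);
}
```

Because `summarize` only sees the interface, tests run offline and deterministically, and budget-exhaustion paths can be exercised without spending real tokens.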
API Reference
CopilotGenkit
| Method | Description |
|---|---|
| generate(req) | Send a generation request through the guard with middleware/schema support |
| defineFlow(config, fn) | Define a typed, traced async flow |
| definePrompt(config, template) | Define a reusable prompt template |
| defineTool(options, handler) | Define a tool for function calling |
| executeToolCall(name, input) | Execute a registered tool by name |
| renderTemplate(template, vars) | Render a template string (no API call) |
| listActions(type?) | List registered actions (flows, prompts, tools) |
| hasAction(type, name) | Check if an action is registered |
| getTrace(traceId) | Get a trace record by ID |
| listTraces(limit?) | List recent traces |
| getLatencyPercentiles(type, name?) | Get P50/P95/P99 latency metrics |
| exportTraces() | Export all traces as JSON |
| getUsage() | Get current guard token usage stats |
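Taken together, `listActions` / `hasAction` and the "Action already registered" error listed under Troubleshooting suggest a registry keyed by action type and name. The following is a hypothetical sketch of that idea (`SketchRegistry` and `ActionType` are invented names), not the library's `ActionRegistry`.

```typescript
// Hypothetical sketch; copilot-genkit's ActionRegistry may differ in detail.
type ActionType = 'flow' | 'prompt' | 'tool';

class SketchRegistry {
  // One map per type, so names only need to be unique within a type.
  private readonly actions = new Map<ActionType, Map<string, unknown>>();

  register(type: ActionType, name: string, action: unknown): void {
    let byName = this.actions.get(type);
    if (!byName) {
      byName = new Map();
      this.actions.set(type, byName);
    }
    if (byName.has(name)) {
      throw new Error(`Action already registered: ${type}/${name}`);
    }
    byName.set(name, action);
  }

  has(type: ActionType, name: string): boolean {
    return this.actions.get(type)?.has(name) ?? false;
  }

  // Lists "type/name" keys in registration order, optionally filtered by type.
  list(type?: ActionType): string[] {
    const keys: string[] = [];
    this.actions.forEach((byName, t) => {
      if (type && t !== type) return;
      byName.forEach((_action, name) => keys.push(`${t}/${name}`));
    });
    return keys;
  }
}
```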
Middleware Functions
| Function | Description |
|---|---|
| retry(opts) | Exponential backoff retry with configurable max retries |
| cache(opts) | In-memory response cache with TTL and max entries |
| logging(opts) | Request/response logging with custom logger |
| createMiddleware(name, fn) | Create a custom named middleware |
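`cache(opts)` is described as an in-memory cache with a TTL and a maximum number of entries. A minimal sketch of those two rules follows. `TtlCache`, its constructor parameters, and the oldest-first eviction policy are assumptions for illustration. The clock is injectable so expiry can be tested without waiting.

```typescript
// Hypothetical sketch of a TTL + max-entries cache; not the library's cache().
interface Entry<V> {
  value: V;
  expiresAt: number;
}

class TtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired entries are dropped lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-insert so this key counts as the newest
    if (this.entries.size >= this.maxEntries) {
      // Map preserves insertion order, so the first key is the oldest entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

For LLM responses, the cache key would typically be derived from everything that affects the output (model, prompt, schema, tools). Otherwise two different requests can collide on one entry.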
Utility Exports
| Export | Description |
|---|---|
| renderTemplate(template, vars) | Standalone template rendering |
| parseRoleTemplate(template, vars) | Parse role-based templates to messages |
| createTool(options, handler) | Create a tool action |
| formatToolsForPrompt(tools) | Format tool definitions for prompt injection |
| parseToolCall(text) | Parse tool call JSON from model response |
| SchemaValidator | Standalone schema validation class |
| ActionRegistry | Standalone action registry |
| Tracer | Standalone tracer |
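`parseToolCall(text)` extracts a tool-call request from model text. The exact JSON shape it expects is not documented here. The sketch below assumes `{ "tool": string, "input": object }`, matching the `{ tool, input }` destructuring in the Tools example, and tolerates prose around the JSON. `parseToolCallSketch` and `ToolCall` are hypothetical names.

```typescript
// Hypothetical sketch; the assumed shape is { "tool": string, "input": object }.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

function parseToolCallSketch(text: string): ToolCall | null {
  // Take the outermost {...} span so leading and trailing prose is ignored.
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // not JSON: treat the reply as a normal text answer
  }

  if (typeof parsed !== 'object' || parsed === null) return null;
  const candidate = parsed as { tool?: unknown; input?: unknown };
  if (typeof candidate.tool !== 'string') return null;
  if (typeof candidate.input !== 'object' || candidate.input === null || Array.isArray(candidate.input)) {
    return null;
  }
  return { tool: candidate.tool, input: candidate.input as Record<string, unknown> };
}
```

The outermost-brace heuristic breaks if the model emits two separate JSON objects in one reply. A stricter protocol, such as requiring the whole reply to be JSON, avoids that ambiguity.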
Troubleshooting
Build Errors
| Error | Solution |
|---|---|
| Cannot find module '@stackforgeai/copilot-guard' | Build copilot-guard first: cd ../copilot-guard && npm run build |
| Cannot find module 'node:crypto' | Upgrade to Node.js >=20 |
| ERR_UNKNOWN_FILE_EXTENSION | Install tsx: npm install --save-dev tsx |
Runtime Errors
| Error | Solution |
|---|---|
| Guard blocked or request failed | Increase premiumLimit or check token usage with getUsage() |
| Schema validation failed | Check that the model output matches your schema definition |
| Tool not found | Ensure the tool is registered with defineTool() before calling executeToolCall() |
| Action already registered | Each action name must be unique within its type |
DISCLAIMER AND LIMITATION OF LIABILITY
THIS SOFTWARE IS PROVIDED STRICTLY ON AN "AS IS" AND "AS AVAILABLE" BASIS.
BY USING THIS SOFTWARE, YOU ACKNOWLEDGE AND AGREE THAT:
- THE SOFTWARE MAY CONTAIN BUGS, DEFECTS, DESIGN FLAWS, LOGIC ERRORS, SECURITY ISSUES, OR INCOMPLETE FEATURES.
- THE SOFTWARE MAY FAIL TO LIMIT OR PREVENT TOKEN USAGE, API REQUESTS, COST OVERRUNS, OR BILLING EVENTS.
- TOKEN ESTIMATION, RATE LIMITING, LOOP DETECTION, THROTTLING, AND SAFETY FEATURES MAY BE INACCURATE, INCOMPLETE, OR NON-FUNCTIONAL.
- SCHEMA VALIDATION, STRUCTURED OUTPUT ENFORCEMENT, AND TOOL CALL PARSING MAY PRODUCE INCORRECT OR INCOMPLETE RESULTS.
- FLOW EXECUTION, MIDDLEWARE CHAINS, AND OBSERVABILITY TRACING MAY NOT FUNCTION AS EXPECTED UNDER ALL CONDITIONS.
THE AUTHORS, CONTRIBUTORS, MAINTAINERS, COPYRIGHT HOLDERS, AFFILIATES, AND DISTRIBUTORS SHALL NOT BE LIABLE FOR ANY CLAIMS, DAMAGES, LOSSES, LIABILITIES, OR EXPENSES OF ANY KIND, INCLUDING BUT NOT LIMITED TO:
- API FEES, TOKEN CHARGES, CLOUD COMPUTE COSTS, OR OTHER FINANCIAL LOSSES
- DATA LOSS, CORRUPTION, OR SECURITY INCIDENTS
- INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES
- LOSS OF PROFITS, REVENUE, GOODWILL, OR BUSINESS OPPORTUNITIES
USE OF THIS SOFTWARE IS ENTIRELY AT YOUR OWN RISK.
YOU ARE SOLELY RESPONSIBLE FOR:
- VERIFYING ALL OUTPUTS AND GENERATED CONTENT
- MONITORING API USAGE, TOKEN CONSUMPTION, AND BILLING
- IMPLEMENTING ADDITIONAL SAFEGUARDS AND VALIDATION
- TESTING MIDDLEWARE, FLOWS, AND TOOL INTEGRATIONS THOROUGHLY
THIS PROJECT SHOULD NOT BE USED AS THE SOLE OR PRIMARY MECHANISM FOR COST CONTROL, BILLING GOVERNANCE, SECURITY, OR PRODUCTION SAFETY.
ALWAYS IMPLEMENT INDEPENDENT PROVIDER-SIDE BILLING ALERTS, RATE LIMITS, BUDGET CONTROLS, AND MONITORING SYSTEMS.
License
MIT License. See package.json for details.
