@stackforgeai/copilot-context
v1.0.0
Context management, hierarchical chunking, retrieval pipelines, schema-enforced structured output, and persistent session memory for GitHub Copilot SDK — all calls guarded by @stackforgeai/copilot-guard.
Overview
@stackforgeai/copilot-context is a production-grade module that solves the core challenges of working with large-context LLM workflows:
- Context windows grow unbounded — this module manages context budgets with auto-compaction
- RAG retrieval lacks structure — hierarchical chunking with parent-child expansion and multi-stage retrieval pipelines
- LLM outputs are unpredictable — schema-enforced structured output with validation and retry
- Conversation history explodes token costs — session memory with auto-compaction preserves continuity within budget
- Latency is invisible — P50/P95/P99 latency tracking across all operations
All LLM calls are routed through @stackforgeai/copilot-guard for token budget enforcement; the module never calls @github/copilot-sdk directly.
Features
ContextManager
- Bounded context window with configurable token budget
- Priority-based entry retention during compaction
- Auto-compaction when utilization exceeds threshold
- LLM-powered summarization of older entries via the guard
- High-priority entries (≥5) are protected from compaction
- Real-time utilization tracking
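The retention rules above can be pictured as a small pure function. This is only a sketch, not the package's implementation: the `SketchEntry` shape, the `selectCompactable` name, and the default priority of 0 are hypothetical, and it assumes "recent" means the last `keepRecentCount` entries in insertion order.

```typescript
// Hypothetical entry shape; the real ContextManager internals are not shown in this README.
interface SketchEntry {
  text: string;
  priority: number; // assumed to default to 0 when the caller passes none
}

// Entries at or above this priority are protected, per the feature list above.
const PROTECTED_PRIORITY = 5;

// Returns the entries that may be summarized: everything except the most
// recent `keepRecentCount` entries and any high-priority entry.
function selectCompactable(entries: SketchEntry[], keepRecentCount: number): SketchEntry[] {
  const cutoff = Math.max(0, entries.length - keepRecentCount);
  return entries.slice(0, cutoff).filter((e) => e.priority < PROTECTED_PRIORITY);
}

const sketchEntries: SketchEntry[] = [
  { text: "System instructions", priority: 5 },
  { text: "Old question", priority: 0 },
  { text: "Old answer", priority: 0 },
  { text: "Recent question", priority: 0 },
  { text: "Recent answer", priority: 0 },
];

// With keepRecentCount = 2, only the two old low-priority entries qualify.
const compactable = selectCompactable(sketchEntries, 2);
```

Under these assumptions the system entry survives because of its priority, and the last two entries survive because of their position.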
ChunkingEngine
- Hierarchical parent-child document chunking
- Configurable chunk size and overlap
- Parent chunks group children for expansion during retrieval
- Flat (non-hierarchical) mode available
- Metadata propagation to all chunks
- Parent/child lookup helpers
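To make the parent-child layout concrete, here is a minimal sketch of overlapping chunking with grouping. It is an illustration under stated assumptions, not the ChunkingEngine's code: `SketchChunk` and `chunkHierarchically` are hypothetical names, whitespace-separated words stand in for tokens, and a parent's text is simply its children joined together (so overlapping words repeat).

```typescript
interface SketchChunk {
  id: string;
  parentId: string | null; // null marks a parent chunk
  text: string;
}

// Splits text into overlapping child chunks, then groups every
// `childrenPerParent` consecutive children under one parent chunk.
function chunkHierarchically(
  docId: string,
  text: string,
  chunkSize: number,
  overlap: number,
  childrenPerParent: number,
): SketchChunk[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = Math.max(1, chunkSize - overlap); // guard against a zero or negative step
  const children: SketchChunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    const parentIndex = Math.floor(children.length / childrenPerParent);
    children.push({
      id: docId + "-c" + children.length,
      parentId: docId + "-p" + parentIndex,
      text: words.slice(start, start + chunkSize).join(" "),
    });
    if (start + chunkSize >= words.length) break; // this window already reached the end
  }
  const parents: SketchChunk[] = [];
  for (let i = 0; i < children.length; i += childrenPerParent) {
    const group = children.slice(i, i + childrenPerParent);
    parents.push({
      id: docId + "-p" + i / childrenPerParent,
      parentId: null,
      text: group.map((c) => c.text).join(" "),
    });
  }
  return parents.concat(children); // parents first, then children
}

// Ten words, windows of 4 with 1 word of overlap, 2 children per parent:
// children "a b c d", "d e f g", "g h i j" grouped under two parents.
const demoChunks = chunkHierarchically("doc-1", "a b c d e f g h i j", 4, 1, 2);
```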
RetrievalPipeline
- 4-stage pipeline: embed → retrieve → rerank → stitch
- Term-based retrieval (no external vector DB required)
- LLM-powered reranking via the guard for relevance scoring
- Parent-chunk expansion (child match → full parent context)
- Per-stage latency instrumentation
- Configurable retrieval count and top-K
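The retrieve and parent-expansion steps can be sketched without any vector store. The code below is a simplified illustration, not the RetrievalPipeline itself: it skips the LLM rerank stage entirely, scores by counting shared terms (the real term weighting is not documented here), and the names `IndexedChunk`, `termScore`, and `retrieveWithParentExpansion` are hypothetical.

```typescript
interface IndexedChunk {
  id: string;
  parentId: string | null; // null marks a parent chunk
  text: string;
}

function tokenizeTerms(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Scores a chunk by how many query terms occur in it.
function termScore(queryTerms: string[], chunkText: string): number {
  const chunkTerms = tokenizeTerms(chunkText);
  let score = 0;
  for (const term of queryTerms) {
    if (chunkTerms.indexOf(term) !== -1) score += 1;
  }
  return score;
}

// Ranks child chunks by term overlap, then swaps each match for its parent
// (deduplicated) so the caller receives the wider surrounding context.
function retrieveWithParentExpansion(
  query: string,
  chunks: IndexedChunk[],
  topK: number,
): IndexedChunk[] {
  const queryTerms = tokenizeTerms(query);
  const ranked = chunks
    .filter((c) => c.parentId !== null)
    .map((c) => ({ chunk: c, score: termScore(queryTerms, c.text) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  const expanded: IndexedChunk[] = [];
  for (const r of ranked) {
    const parentChunk = chunks.filter((c) => c.id === r.chunk.parentId)[0];
    const target = parentChunk !== undefined ? parentChunk : r.chunk;
    if (expanded.indexOf(target) === -1) expanded.push(target);
  }
  return expanded;
}

const corpus: IndexedChunk[] = [
  { id: "p0", parentId: null, text: "Login uses OAuth2 tokens. Tokens expire hourly." },
  { id: "c0", parentId: "p0", text: "Login uses OAuth2 tokens." },
  { id: "c1", parentId: "p0", text: "Tokens expire hourly." },
  { id: "p1", parentId: null, text: "Billing runs monthly." },
  { id: "c2", parentId: "p1", text: "Billing runs monthly." },
];

// Both matching children share parent p0, so a single parent chunk comes back.
const hits = retrieveWithParentExpansion("How do OAuth2 tokens expire?", corpus, 2);
```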
SchemaEnforcer
- Schema-driven structured JSON output from LLMs
- Type validation: string, number, boolean, array, object
- Required/optional field support
- Automatic retry with error feedback on validation failure
- JSON extraction from markdown fences and preamble
- Array and object schemas supported
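As a rough picture of what extraction and type validation involve, the sketch below pulls JSON out of a fenced reply and checks field types. It is not the SchemaEnforcer's implementation: `extractJson`, `validateObject`, and `SketchField` are hypothetical names, and treating a field as required unless `required` is explicitly `false` is an assumption.

```typescript
type FieldType = "string" | "number" | "boolean" | "array" | "object";

interface SketchField {
  name: string;
  type: FieldType;
  required?: boolean; // assumed to mean "required" when omitted
}

// Three backticks, written as escapes so this example stays readable inside a fence.
const fence = "\u0060\u0060\u0060";

// Pulls the first JSON payload out of a model reply that may wrap it in a
// markdown fence or precede it with prose.
function extractJson(reply: string): unknown {
  const fenced = new RegExp(fence + "(?:json)?\\s*([\\s\\S]*?)" + fence).exec(reply);
  const candidate = fenced ? fenced[1] : reply;
  const start = candidate.search(/[\[{]/);
  if (start === -1) throw new Error("No JSON found in reply");
  const closer = candidate[start] === "{" ? "}" : "]";
  const end = candidate.lastIndexOf(closer);
  return JSON.parse(candidate.slice(start, end + 1));
}

// Returns human-readable problems; an empty list means the object is valid.
function validateObject(value: unknown, fields: SketchField[]): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected an object"];
  }
  const record = value as { [key: string]: unknown };
  const errors: string[] = [];
  for (const field of fields) {
    const v = record[field.name];
    if (v === undefined) {
      if (field.required !== false) errors.push("missing field: " + field.name);
      continue;
    }
    const actual = Array.isArray(v) ? "array" : v === null ? "null" : typeof v;
    if (actual !== field.type) {
      errors.push(field.name + ": expected " + field.type + ", got " + actual);
    }
  }
  return errors;
}

const modelReply =
  "Here you go:\n" + fence + "json\n{\"method\": \"GET\", \"path\": 42}\n" + fence;
const parsed = extractJson(modelReply);
const problems = validateObject(parsed, [
  { name: "method", type: "string" },
  { name: "path", type: "string" },
  { name: "description", type: "string", required: false },
]);
```

Here `problems` holds a single message about `path` being a number; error strings of this kind are what a retry prompt could feed back to the model.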
SessionMemory
- Token-budgeted conversation memory
- Auto-compaction via LLM summarization
- Snapshot/restore for session persistence
- Configurable auto-compact toggle
- Render as prompt-ready context string
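Snapshot/restore amounts to serializing the session state and loading it into a fresh instance. The sketch below shows that round trip with a toy class. `SketchMemory`, its snapshot shape, and the `role: content` render format are all hypothetical; the real SessionMemory snapshot layout is not documented in this README.

```typescript
interface SketchTurn {
  role: "user" | "assistant" | "system";
  content: string;
}

interface SketchSnapshot {
  sessionId: string;
  turns: SketchTurn[];
}

class SketchMemory {
  private sessionId: string;
  private turns: SketchTurn[] = [];

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  addTurn(role: SketchTurn["role"], content: string): void {
    this.turns.push({ role: role, content: content });
  }

  // Copies each turn so later mutations do not leak into the snapshot.
  getSnapshot(): SketchSnapshot {
    return {
      sessionId: this.sessionId,
      turns: this.turns.map((t) => ({ role: t.role, content: t.content })),
    };
  }

  // Replaces the current state with the snapshot's state.
  restore(snapshot: SketchSnapshot): void {
    this.sessionId = snapshot.sessionId;
    this.turns = snapshot.turns.map((t) => ({ role: t.role, content: t.content }));
  }

  render(): string {
    return this.turns.map((t) => t.role + ": " + t.content).join("\n");
  }
}

const original = new SketchMemory("project-alpha");
original.addTurn("user", "I need a REST API for user management.");
original.addTurn("assistant", "I'll design endpoints for CRUD operations.");

// Round-trip through JSON, as when writing the snapshot to a file or database.
const serialized = JSON.stringify(original.getSnapshot());
const revived = new SketchMemory("temporary-id");
revived.restore(JSON.parse(serialized) as SketchSnapshot);
```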
ContextObserver
- Per-operation latency recording
- P50/P95/P99 percentile calculations
- Aggregated summary across operations
- JSON export for observability backends
- time() helper for automatic latency measurement
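For readers unfamiliar with percentile latency, the sketch below shows one common way to compute P50/P95/P99 (the nearest-rank method) and a timing wrapper. Which percentile method ContextObserver actually uses is not stated, so treat this as an assumption; `percentile` and `timeSync` are hypothetical names, and the wrapper is synchronous for brevity.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Runs fn, records its duration under `operation` (even if fn throws),
// and returns fn's result.
function timeSync<T>(
  store: { [operation: string]: number[] },
  operation: string,
  fn: () => T,
): T {
  const startedAt = Date.now();
  try {
    return fn();
  } finally {
    const bucket = store[operation] || (store[operation] = []);
    bucket.push(Date.now() - startedAt);
  }
}

const latencies = [12, 15, 11, 240, 14, 13, 18, 16, 17, 19];
const p50 = percentile(latencies, 50); // 15
const p95 = percentile(latencies, 95); // 240, the single slow outlier
const p99 = percentile(latencies, 99); // 240

const latencyStore: { [operation: string]: number[] } = {};
const answer = timeSync(latencyStore, "retrieve", () => 6 * 7);
```

The example shows why tail percentiles matter: the median hides the one 240 ms call that P95 and P99 expose.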
Installation
npm install @stackforgeai/copilot-context @stackforgeai/copilot-guard @github/copilot-sdk
Requirements:
- Node.js ≥ 20
- @github/copilot-sdk must be installed as a peer dependency
- @stackforgeai/copilot-guard is a direct dependency
Usage Examples
Context Management with Auto-Compaction
import { ContextManager } from "@stackforgeai/copilot-context";
const manager = new ContextManager({
  maxTokens: 4_000,
  compactionModel: "gpt-4o-mini",
  compactionThreshold: 0.8, // Auto-compact at 80% utilization
  keepRecentCount: 2, // Always keep the 2 most recent entries
});
// Add context entries with priority
await manager.add("System instructions for the assistant.", "system", 5); // High priority
await manager.add("User asked about API design.", "user");
await manager.add("Suggested REST endpoints.", "assistant");
// Check utilization
console.log(`${manager.getTotalTokens()} tokens used (${(manager.getUtilization() * 100).toFixed(0)}%)`);
// Render for inclusion in a prompt
const context = manager.render();
// Manual compaction (also happens automatically on add)
const result = await manager.compact();
console.log(`Compacted ${result.entriesCompacted} entries, saved ${result.tokensBefore - result.tokensAfter} tokens`);
Hierarchical Chunking + Retrieval
import { ChunkingEngine, RetrievalPipeline } from "@stackforgeai/copilot-context";
// Chunk a document (documentText is the raw document string, loaded by the caller)
const engine = new ChunkingEngine({
  chunkSize: 256, // tokens per chunk
  overlap: 32, // overlap between chunks
  hierarchical: true, // create parent-child groups
  childrenPerParent: 4,
});
const chunks = engine.chunk("doc-1", documentText);
console.log(`${engine.getParents(chunks).length} parents, ${engine.getChildren(chunks).length} children`);
// Build retrieval pipeline
const pipeline = new RetrievalPipeline({
  model: "gpt-4o-mini",
  retrievalCount: 20,
  rerankTopK: 5,
  expandToParent: true, // Expand child matches to parent for richer context
});
pipeline.index(chunks);
const result = await pipeline.retrieve({ query: "How does authentication work?", topK: 5 });
console.log(`Retrieved ${result.chunks.length} chunks in ${result.totalDurationMs}ms`);
console.log("Context:", result.context);
// Check per-stage latency
for (const stage of result.stages) {
  console.log(`${stage.stage}: ${stage.durationMs}ms`);
}
Schema-Enforced Structured Output
import { SchemaEnforcer } from "@stackforgeai/copilot-context";
const enforcer = new SchemaEnforcer({
  model: "gpt-4o-mini",
  maxRetries: 2,
});
const schema = {
  name: "APIEndpoint",
  fields: [
    { name: "method", type: "string", description: "HTTP method" },
    { name: "path", type: "string", description: "URL path" },
    { name: "description", type: "string", description: "What the endpoint does" },
  ],
  isArray: true,
};
const endpoints = await enforcer.enforce(
  "Design 5 REST API endpoints for a user management service.",
  schema,
);
// endpoints is guaranteed to be a validated array of objects
console.log(endpoints);
Session Memory with Persistence
import { SessionMemory } from "@stackforgeai/copilot-context";
const memory = new SessionMemory({
  maxTokens: 2_000,
  compactionModel: "gpt-4o-mini",
  sessionId: "project-alpha",
  autoCompact: true,
});
await memory.addTurn("user", "I need a REST API for user management.");
await memory.addTurn("assistant", "I'll design endpoints for CRUD operations.");
await memory.addTurn("user", "Add OAuth2 authentication.");
// Render for inclusion in a prompt
const context = memory.render();
// Save session state
const snapshot = memory.getSnapshot();
// Store snapshot to file/DB...
// Restore in a new instance
const restored = new SessionMemory({ maxTokens: 2_000, compactionModel: "gpt-4o-mini" });
restored.restore(snapshot);
Configuration
ContextManager
| Option | Type | Default | Description |
|---|---|---|---|
| maxTokens | number | — | Maximum token budget for the context window |
| compactionModel | string | — | Model ID for LLM-powered compaction |
| compactionThreshold | number | 0.8 | Utilization ratio (0–1) that triggers auto-compaction |
| keepRecentCount | number | 2 | Minimum entries to keep uncompacted |
ChunkingEngine
| Option | Type | Default | Description |
|---|---|---|---|
| chunkSize | number | 512 | Target chunk size in tokens |
| overlap | number | 64 | Overlap tokens between adjacent chunks |
| hierarchical | boolean | true | Whether to create parent-child groups |
| childrenPerParent | number | 4 | Number of child chunks per parent |
RetrievalPipeline
| Option | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model for LLM-powered reranking |
| retrievalCount | number | 20 | Candidates to retrieve before reranking |
| rerankTopK | number | 5 | Top results after reranking |
| expandToParent | boolean | true | Expand child matches to parent chunks |
SchemaEnforcer
| Option | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model for structured output generation |
| maxRetries | number | 2 | Retry attempts on validation failure |
| timeout | number | 60000 | Guard call timeout in ms |
SessionMemory
| Option | Type | Default | Description |
|---|---|---|---|
| maxTokens | number | — | Maximum token budget for memory |
| compactionModel | string | — | Model for compaction summarization |
| sessionId | string | auto | Session identifier |
| autoCompact | boolean | true | Auto-compact when budget exceeded |
Architecture Overview
┌────────────────────────────────────────────────────────┐
│             @stackforgeai/copilot-context              │
│                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ContextManager│  │ChunkingEngine│  │SchemaEnforcer│  │
│  │ (compaction) │  │(parent/child)│  │  (validate)  │  │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
│         │                 │                 │          │
│  ┌──────┴───────┐  ┌──────┴───────┐  ┌──────┴───────┐  │
│  │SessionMemory │  │  Retrieval   │  │   Context    │  │
│  │ (persistence)│  │   Pipeline   │  │   Observer   │  │
│  └──────┬───────┘  └──────┬───────┘  └──────────────┘  │
│         │                 │                            │
│         └────────┬────────┘                            │
│                  ▼                                     │
│         ┌──────────────────┐                           │
│         │   IGuard (DI)    │                           │
│         └────────┬─────────┘                           │
└──────────────────┼─────────────────────────────────────┘
                   ▼
          ┌──────────────────┐
          │  @stackforgeai/  │
          │  copilot-guard   │
          │  (token budget)  │
          └────────┬─────────┘
                   ▼
          ┌──────────────────┐
          │     @github/     │
          │   copilot-sdk    │
          └──────────────────┘
Key architectural decisions:
- All LLM calls flow through IGuard interface → CopilotGuard → copilot-sdk
- No direct copilot-sdk imports in any source file except via the guard
- Dependency injection via IGuard enables unit testing with mocks
- ContextObserver is embedded in each component for latency tracking
- Hierarchical chunking follows the parent-child expansion pattern from production RAG
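The dependency-injection point can be illustrated with a test double. The IGuard interface itself is not shown in this README, so everything below is hypothetical: `GuardLike`, its `complete` method, and `EntrySummarizer` are invented for the sketch. Only the `outputTokens` field echoes something the README mentions about guard responses.

```typescript
// Hypothetical stand-in for the guard interface; the real IGuard may differ.
interface GuardLike {
  complete(request: { model: string; prompt: string }): Promise<{ text: string; outputTokens: number }>;
}

// A component that depends on the interface rather than on a concrete guard.
class EntrySummarizer {
  private guard: GuardLike;

  constructor(guard: GuardLike) {
    this.guard = guard;
  }

  async summarize(model: string, entries: string[]): Promise<string> {
    const prompt = "Summarize the following context entries:\n" + entries.join("\n");
    const response = await this.guard.complete({ model: model, prompt: prompt });
    return response.text;
  }
}

// Test double: records every request and returns a canned reply,
// so no network call or token spend happens in unit tests.
class MockGuard implements GuardLike {
  calls: { model: string; prompt: string }[] = [];

  complete(request: { model: string; prompt: string }) {
    this.calls.push(request);
    return Promise.resolve({ text: "canned summary", outputTokens: 2 });
  }
}

const mockGuard = new MockGuard();
const pendingSummary = new EntrySummarizer(mockGuard).summarize("gpt-4o-mini", [
  "User asked about API design.",
  "Suggested REST endpoints.",
]);
```

Because the component only sees the interface, a test can assert on `mockGuard.calls` to check which model and prompt were sent.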
Troubleshooting
"Could not find a declaration file for module '@stackforgeai/copilot-guard'"
Ensure @stackforgeai/copilot-guard is installed and has been built (npm run build in the guard package). The dist/ folder must contain .d.ts files.
"@github/copilot-sdk is not installed"
Install the peer dependency:
npm install @github/copilot-sdk
"All N attempts failed validation for schema"
The LLM consistently returned output that did not match the schema. Try:
- Simplifying the schema (fewer fields, simpler types)
- Using a more capable model (e.g., gpt-4.1 instead of gpt-4o-mini)
- Increasing maxRetries
- Adding more context to the task prompt
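To see why these adjustments help, it may be useful to picture the retry loop. The sketch below is one plausible shape for retry with error feedback, not the SchemaEnforcer's actual code: `enforceWithRetry`, `Generate`, and `Validate` are hypothetical, and counting one initial attempt plus `maxRetries` retries is an assumption.

```typescript
type Generate = (prompt: string) => Promise<string>;
type Validate = (raw: string) => string[]; // validation errors; empty when valid

// Calls the model, validates the reply, and on failure re-prompts with the
// validation errors appended so the model can correct itself.
async function enforceWithRetry(
  task: string,
  generate: Generate,
  validate: Validate,
  maxRetries: number,
): Promise<string> {
  const totalAttempts = maxRetries + 1; // assumption: initial attempt + retries
  let prompt = task;
  for (let attempt = 1; attempt <= totalAttempts; attempt++) {
    const raw = await generate(prompt);
    const errors = validate(raw);
    if (errors.length === 0) return raw;
    prompt =
      task +
      "\n\nYour previous reply was rejected:\n- " +
      errors.join("\n- ") +
      "\nReturn corrected JSON only.";
  }
  throw new Error("All " + totalAttempts + " attempts failed validation");
}

// Fake model for demonstration: fails once, then returns valid JSON.
const seenPrompts: string[] = [];
const fakeGenerate: Generate = async (prompt) => {
  seenPrompts.push(prompt);
  return seenPrompts.length === 1 ? "not json" : "{\"ok\": true}";
};
const isJson: Validate = (raw) => {
  try {
    JSON.parse(raw);
    return [];
  } catch (e) {
    return ["reply is not valid JSON"];
  }
};

const retried = enforceWithRetry("Describe the endpoint as JSON.", fakeGenerate, isJson, 2);
```

In a loop like this, a simpler schema produces fewer errors to feed back, and a higher `maxRetries` gives the model more chances to converge, at the cost of extra guarded calls.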
Auto-compaction not triggering
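Check that compactionThreshold is set (default 0.8). Auto-compaction triggers only when both conditions hold: adding the new entry would push utilization above the threshold, and the manager holds more entries than keepRecentCount. Entries with priority ≥ 5 are never compacted, so a context made up mostly of high-priority entries may stay above the threshold.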
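The trigger condition can be written as a small predicate, which makes it easy to check your own numbers. This is a sketch of the rule as described, not the library's code; `shouldAutoCompact` is a hypothetical name, and it assumes `entryCount` counts the entries already stored before the new one is added.

```typescript
function shouldAutoCompact(
  currentTokens: number,
  incomingTokens: number,
  entryCount: number,
  options: { maxTokens: number; compactionThreshold: number; keepRecentCount: number },
): boolean {
  // Utilization the window would reach if the new entry were added as-is.
  const projectedUtilization = (currentTokens + incomingTokens) / options.maxTokens;
  return (
    projectedUtilization > options.compactionThreshold &&
    entryCount > options.keepRecentCount
  );
}

const compactionOptions = { maxTokens: 4000, compactionThreshold: 0.8, keepRecentCount: 2 };

// (3000 + 300) / 4000 = 0.825 > 0.8 with 5 entries: compaction would trigger.
const triggers = shouldAutoCompact(3000, 300, 5, compactionOptions);
// (3000 + 100) / 4000 = 0.775: below the threshold, no compaction.
const belowThreshold = shouldAutoCompact(3000, 100, 5, compactionOptions);
// Over the threshold but only 2 entries, all within keepRecentCount: no compaction.
const tooFewEntries = shouldAutoCompact(3900, 300, 2, compactionOptions);
```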
Token estimation is approximate
Token counts use a chars / 4 heuristic. Actual token counts depend on the model's tokenizer. For precise budgeting, track the outputTokens from guard responses.
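For reference, a chars / 4 heuristic looks roughly like the sketch below. Rounding up is an assumption (the README does not say how fractions are handled), and `estimateTokens` and `estimationRatio` are hypothetical helper names.

```typescript
// Approximates token count as one token per four characters, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Ratio of actual to estimated tokens; values far from 1 suggest the
// heuristic is a poor fit for this model or this kind of text.
function estimationRatio(text: string, actualTokens: number): number {
  const estimated = estimateTokens(text);
  return estimated === 0 ? 0 : actualTokens / estimated;
}

// 38 characters / 4 = 9.5, rounded up to 10.
const estimated = estimateTokens("System instructions for the assistant.");
```

Comparing the estimate against the `outputTokens` reported by guard responses, as suggested above, gives a running sense of how far the heuristic drifts for your content.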
DISCLAIMER AND LIMITATION OF LIABILITY
IMPORTANT: THIS SOFTWARE IS PROVIDED STRICTLY ON AN "AS IS" AND "AS AVAILABLE" BASIS.
BY USING THIS SOFTWARE, YOU ACKNOWLEDGE AND AGREE THAT:
- THE SOFTWARE MAY CONTAIN BUGS, DEFECTS, DESIGN FLAWS, LOGIC ERRORS, SECURITY ISSUES, OR INCOMPLETE FEATURES
- THE SOFTWARE MAY FAIL TO LIMIT OR PREVENT TOKEN USAGE, API REQUESTS, COST OVERRUNS, OR BILLING EVENTS
- TOKEN ESTIMATION, CONTEXT COMPACTION, CHUNKING, RETRIEVAL, SCHEMA VALIDATION, AND MEMORY MANAGEMENT FEATURES MAY BE INACCURATE, INCOMPLETE, OR NON-FUNCTIONAL
- THE SOFTWARE MAY PRODUCE UNEXPECTED RESULTS
- THE SOFTWARE MAY NOT BE SUITABLE FOR PRODUCTION ENVIRONMENTS
- THE SOFTWARE MAY NOT PREVENT EXCESSIVE CHARGES FROM AI PROVIDERS OR CLOUD SERVICES
THIS SOFTWARE DOES NOT GUARANTEE:
- COST SAVINGS
- BILLING PROTECTION
- TOKEN ACCURACY
- FINANCIAL PROTECTION
- RETRIEVAL ACCURACY
- SCHEMA COMPLIANCE
- CONTEXT PRESERVATION
- SESSION CONTINUITY
- SYSTEM STABILITY
- SECURITY
- RELIABILITY
- FITNESS FOR ANY PARTICULAR PURPOSE
TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW:
THE AUTHORS, CONTRIBUTORS, MAINTAINERS, COPYRIGHT HOLDERS, AFFILIATES, AND DISTRIBUTORS SHALL NOT BE LIABLE FOR ANY CLAIMS, DAMAGES, LOSSES, LIABILITIES, OR EXPENSES OF ANY KIND, INCLUDING BUT NOT LIMITED TO:
- API FEES
- TOKEN CHARGES
- CLOUD COMPUTE COSTS
- INFRASTRUCTURE COSTS
- FINANCIAL LOSSES
- LOST PROFITS
- BUSINESS INTERRUPTION
- SERVICE OUTAGES
- DATA LOSS
- DATA CORRUPTION
- SECURITY INCIDENTS
- INDIRECT DAMAGES
- INCIDENTAL DAMAGES
- CONSEQUENTIAL DAMAGES
- SPECIAL DAMAGES
- PUNITIVE DAMAGES
- MISUSE OF THE SOFTWARE
- FAILURE OF SAFETY FEATURES
- FAILURE OF TOKEN LIMITS
- FAILURE OF CONTEXT COMPACTION
- FAILURE OF RETRIEVAL ACCURACY
- FAILURE OF SCHEMA VALIDATION
- FAILURE OF SESSION MEMORY
- ERRORS IN TOKEN ESTIMATION
- EXCESSIVE BILLING EVENTS
- PRODUCTION FAILURES
USE OF THIS SOFTWARE IS ENTIRELY AT YOUR OWN RISK.
YOU ARE SOLELY RESPONSIBLE FOR:
- VERIFYING ALL OUTPUTS
- MONITORING API USAGE
- MONITORING TOKEN CONSUMPTION
- MONITORING BILLING
- IMPLEMENTING ADDITIONAL SAFEGUARDS
- TESTING IN YOUR OWN ENVIRONMENT
- CONFIGURING APPROPRIATE LIMITS
- VALIDATING ALL EXECUTION LOGIC
- MAINTAINING BACKUPS AND RECOVERY PROCEDURES
THIS PROJECT SHOULD NOT BE USED AS THE SOLE OR PRIMARY MECHANISM FOR COST CONTROL, BILLING GOVERNANCE, SECURITY, OR PRODUCTION SAFETY.
ALWAYS IMPLEMENT INDEPENDENT PROVIDER-SIDE BILLING ALERTS, RATE LIMITS, BUDGET CONTROLS, AND MONITORING SYSTEMS.
IF YOU DO NOT AGREE WITH THESE TERMS, DO NOT USE THIS SOFTWARE.
License
MIT License
Copyright (c) 2026 StackForgeAI
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
For full license text, see the LICENSE file.
