@stackforgeai/copilot-context

v1.0.0

Published

12 days ago

Context management, hierarchical chunking, retrieval pipelines, schema-enforced structured output, and persistent session memory for GitHub Copilot SDK — all calls guarded by @stackforgeai/copilot-guard.

Overview

@stackforgeai/copilot-context is a production-grade module that solves the core challenges of working with large-context LLM workflows:

Context windows grow unbounded — this module manages context budgets with auto-compaction
RAG retrieval lacks structure — hierarchical chunking with parent-child expansion and multi-stage retrieval pipelines
LLM outputs are unpredictable — schema-enforced structured output with validation and retry
Conversation history explodes token costs — session memory with auto-compaction preserves continuity within budget
Latency is invisible — P50/P95/P99 latency tracking across all operations

All LLM calls are routed through @stackforgeai/copilot-guard for token budget enforcement. The module never calls @github/copilot-sdk directly.

Features

ContextManager

Bounded context window with configurable token budget
Priority-based entry retention during compaction
Auto-compaction when utilization exceeds threshold
LLM-powered summarization of older entries via the guard
High-priority entries (≥5) are protected from compaction
Real-time utilization tracking
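
The retention rules above can be sketched as a filter over the entry list. This is an illustration only: the `Entry` shape, the `selectCompactable` helper, and the default priority of 0 are assumptions, not the package's internals. Only the priority floor of 5 and the keep-recent behavior come from the feature list.

```typescript
// Hypothetical entry shape; the package's internal representation may differ.
interface Entry {
  id: number;
  content: string;
  priority: number; // assumed to default to 0 when not supplied
}

const PROTECTED_PRIORITY = 5; // entries at or above this priority are never compacted

// Returns the entries eligible for summarization: everything except
// high-priority entries and the `keepRecentCount` most recent entries.
function selectCompactable(entries: Entry[], keepRecentCount: number): Entry[] {
  const cutoff = Math.max(0, entries.length - keepRecentCount);
  return entries.filter(
    (entry, index) => index < cutoff && entry.priority < PROTECTED_PRIORITY,
  );
}

const history: Entry[] = [
  { id: 1, content: "System instructions", priority: 5 },
  { id: 2, content: "Old question", priority: 0 },
  { id: 3, content: "Old answer", priority: 0 },
  { id: 4, content: "Recent question", priority: 0 },
  { id: 5, content: "Recent answer", priority: 0 },
];

// Entries 2 and 3 are eligible: entry 1 is protected, 4 and 5 are recent.
const compactable = selectCompactable(history, 2).map((entry) => entry.id);
```

In this sketch the eligible entries would then be replaced by a single summary entry produced through the guard.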

ChunkingEngine

Hierarchical parent-child document chunking
Configurable chunk size and overlap
Parent chunks group children for expansion during retrieval
Flat (non-hierarchical) mode available
Metadata propagation to all chunks
Parent/child lookup helpers
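
A minimal sketch of parent-child chunking, assuming whitespace-separated words as a stand-in for tokens. The `SketchChunk` shape, the ID format, and `chunkHierarchically` are hypothetical names for illustration; the real engine's output may be structured differently.

```typescript
// Hypothetical chunk shape for illustration; the package's chunk type may differ.
interface SketchChunk {
  id: string;
  kind: "parent" | "child";
  parentId: string | null;
  text: string;
}

// Splits text into overlapping child chunks, then groups consecutive
// children under parents. Whitespace-separated words stand in for tokens.
function chunkHierarchically(
  docId: string,
  text: string,
  chunkSize: number,
  overlap: number,
  childrenPerParent: number,
): SketchChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize || childrenPerParent <= 0) {
    throw new Error("invalid chunking options");
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = chunkSize - overlap;
  const children: SketchChunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    children.push({
      id: `${docId}-c${children.length}`,
      kind: "child",
      parentId: null, // assigned when the child is grouped below
      text: words.slice(start, start + chunkSize).join(" "),
    });
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  const parents: SketchChunk[] = [];
  for (let i = 0; i < children.length; i += childrenPerParent) {
    const group = children.slice(i, i + childrenPerParent);
    const parentId = `${docId}-p${parents.length}`;
    for (const child of group) child.parentId = parentId;
    parents.push({
      id: parentId,
      kind: "parent",
      parentId: null,
      text: group.map((child) => child.text).join("\n"),
    });
  }
  return [...parents, ...children];
}

const sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9";
const sketchChunks = chunkHierarchically("doc-1", sample, 4, 1, 2);
```

With a chunk size of 4 and an overlap of 1, each child window advances by 3 words, so adjacent children share one word.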

RetrievalPipeline

4-stage pipeline: embed → retrieve → rerank → stitch
Term-based retrieval (no external vector DB required)
LLM-powered reranking via the guard for relevance scoring
Parent-chunk expansion (child match → full parent context)
Per-stage latency instrumentation
Configurable retrieval count and top-K
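
The retrieve and stitch stages can be illustrated with a small term-overlap scorer followed by parent expansion. This sketch omits the LLM rerank stage, and its scoring function, `IndexedChunk` shape, and helper names are assumptions rather than the pipeline's actual implementation.

```typescript
// Hypothetical chunk shape; parents have parentId === null.
interface IndexedChunk {
  id: string;
  parentId: string | null;
  text: string;
}

function toTerms(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((term) => term.length > 0);
}

// Counts how many distinct query terms occur in the chunk text.
function termOverlap(queryTerms: string[], chunkText: string): number {
  const chunkTerms = toTerms(chunkText);
  return queryTerms.filter((term) => chunkTerms.indexOf(term) !== -1).length;
}

function retrieveWithExpansion(
  query: string,
  chunks: IndexedChunk[],
  topK: number,
): IndexedChunk[] {
  // Deduplicate query terms so repeated words are not counted twice.
  const queryTerms = toTerms(query).filter((term, i, all) => all.indexOf(term) === i);
  const ranked = chunks
    .filter((chunk) => chunk.parentId !== null) // match against children only
    .map((chunk) => ({ chunk, score: termOverlap(queryTerms, chunk.text) }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  // Stitch: swap each matching child for its parent, skipping duplicates.
  const stitched: IndexedChunk[] = [];
  for (const hit of ranked) {
    const parent = chunks.find((chunk) => chunk.id === hit.chunk.parentId);
    const expanded = parent !== undefined ? parent : hit.chunk;
    if (!stitched.some((chunk) => chunk.id === expanded.id)) stitched.push(expanded);
  }
  return stitched;
}

const corpus: IndexedChunk[] = [
  { id: "p0", parentId: null, text: "Auth tokens are validated on each request. Sessions expire after an hour." },
  { id: "c0", parentId: "p0", text: "Auth tokens are validated on each request." },
  { id: "c1", parentId: "p0", text: "Sessions expire after an hour." },
  { id: "p1", parentId: null, text: "Billing runs monthly." },
  { id: "c2", parentId: "p1", text: "Billing runs monthly." },
];

const hits = retrieveWithExpansion("How are auth tokens validated?", corpus, 5);
```

Here only child `c0` shares terms with the query, and expansion returns its parent `p0`, which also carries the neighboring sentence about sessions.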

SchemaEnforcer

Schema-driven structured JSON output from LLMs
Type validation: string, number, boolean, array, object
Required/optional field support
Automatic retry with error feedback on validation failure
JSON extraction from markdown fences and preamble
Array and object schemas supported
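
As a rough sketch of the extraction and validation steps, the code below pulls a JSON value out of a model reply that may include a markdown fence or preamble, then checks field types. `FieldSpec`, `extractJson`, `validateFields`, and `buildRetryPrompt` are hypothetical names, and treating fields as required unless marked otherwise is an assumption.

```typescript
type FieldType = "string" | "number" | "boolean" | "array" | "object";

// Hypothetical field spec; `required` is assumed to default to true.
interface FieldSpec {
  name: string;
  type: FieldType;
  required?: boolean;
}

const FENCE = "`".repeat(3); // markdown code fence, built here to keep this block readable
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Finds the JSON payload inside a fence if present, otherwise between the
// first opening and last closing bracket of the raw reply.
function extractJson(raw: string): unknown {
  const fenced = FENCED_BLOCK.exec(raw);
  const candidate = fenced !== null && fenced[1] !== undefined ? fenced[1] : raw;
  const start = candidate.search(/[\[{]/);
  const end = Math.max(candidate.lastIndexOf("}"), candidate.lastIndexOf("]"));
  if (start === -1 || end <= start) throw new Error("no JSON value found in model output");
  return JSON.parse(candidate.slice(start, end + 1));
}

// Returns a list of human-readable validation errors (empty when valid).
function validateFields(value: unknown, fields: FieldSpec[]): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected a JSON object"];
  }
  const record = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const field of fields) {
    const fieldValue = record[field.name];
    if (fieldValue === undefined) {
      if (field.required !== false) errors.push(`missing required field "${field.name}"`);
      continue;
    }
    const actual = Array.isArray(fieldValue) ? "array" : fieldValue === null ? "null" : typeof fieldValue;
    if (actual !== field.type) {
      errors.push(`field "${field.name}" should be ${field.type}, got ${actual}`);
    }
  }
  return errors;
}

// Feeds validation errors back into the next attempt's prompt.
function buildRetryPrompt(task: string, errors: string[]): string {
  return `${task}\n\nYour previous answer was invalid:\n- ${errors.join("\n- ")}\nReturn only corrected JSON.`;
}
```

A retry loop would call the model, run `extractJson` and `validateFields`, and resend `buildRetryPrompt(task, errors)` until the error list is empty or the retry limit is reached.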

SessionMemory

Token-budgeted conversation memory
Auto-compaction via LLM summarization
Snapshot/restore for session persistence
Configurable auto-compact toggle
Render as prompt-ready context string
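
To make the snapshot/restore idea concrete, here is a toy memory class whose snapshot survives a JSON round trip. The `SketchSnapshot` shape, the `role: content` render format, and the class itself are assumptions for illustration; the package's real snapshot format is not documented here.

```typescript
interface SketchTurn {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical snapshot shape, assumed to be plain JSON-serializable data.
interface SketchSnapshot {
  sessionId: string;
  turns: SketchTurn[];
}

class SketchMemory {
  private sessionId: string;
  private turns: SketchTurn[] = [];

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  addTurn(role: SketchTurn["role"], content: string): void {
    this.turns.push({ role, content });
  }

  // Renders the history as a prompt-ready string, one turn per line.
  render(): string {
    return this.turns.map((turn) => `${turn.role}: ${turn.content}`).join("\n");
  }

  // Copies the state so later mutations do not leak into the snapshot.
  getSnapshot(): SketchSnapshot {
    return { sessionId: this.sessionId, turns: this.turns.map((turn) => ({ ...turn })) };
  }

  restore(snapshot: SketchSnapshot): void {
    this.sessionId = snapshot.sessionId;
    this.turns = snapshot.turns.map((turn) => ({ ...turn }));
  }
}

const originalMemory = new SketchMemory("project-alpha");
originalMemory.addTurn("user", "I need a REST API for user management.");
originalMemory.addTurn("assistant", "I'll design endpoints for CRUD operations.");

// Serialize to a string that could be written to a file or database row.
const serialized = JSON.stringify(originalMemory.getSnapshot());

const revivedMemory = new SketchMemory("temporary");
revivedMemory.restore(JSON.parse(serialized) as SketchSnapshot);
```

After the round trip, `revivedMemory.render()` produces the same text as `originalMemory.render()`.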

ContextObserver

Per-operation latency recording
P50/P95/P99 percentile calculations
Aggregated summary across operations
JSON export for observability backends
time() helper for automatic latency measurement
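
For reference, percentiles over recorded latencies can be computed with the nearest-rank method, sketched below. This is one common definition; the observer may use a different interpolation, and the function names here are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples recorded");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1] as number;
}

function summarizeLatencies(samplesMs: number[]): { count: number; p50: number; p95: number; p99: number } {
  return {
    count: samplesMs.length,
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}

// 100 synthetic samples: 1 ms, 2 ms, ..., 100 ms.
const latencies = Array.from({ length: 100 }, (_, i) => i + 1);
const latencySummary = summarizeLatencies(latencies);
// latencySummary is { count: 100, p50: 50, p95: 95, p99: 99 }
```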

Installation

npm install @stackforgeai/copilot-context @stackforgeai/copilot-guard @github/copilot-sdk

Requirements:

Node.js ≥ 20
@github/copilot-sdk must be installed as a peer dependency
@stackforgeai/copilot-guard is a direct dependency

Usage Examples

Context Management with Auto-Compaction

import { ContextManager } from "@stackforgeai/copilot-context";

const manager = new ContextManager({
  maxTokens: 4_000,
  compactionModel: "gpt-4o-mini",
  compactionThreshold: 0.8,  // Auto-compact at 80% utilization
  keepRecentCount: 2,        // Always keep the 2 most recent entries
});

// Add context entries with priority
await manager.add("System instructions for the assistant.", "system", 5); // High priority
await manager.add("User asked about API design.", "user");
await manager.add("Suggested REST endpoints.", "assistant");

// Check utilization
console.log(`${manager.getTotalTokens()} tokens used (${(manager.getUtilization() * 100).toFixed(0)}%)`);

// Render for inclusion in a prompt
const context = manager.render();

// Manual compaction (also happens automatically on add)
const result = await manager.compact();
console.log(`Compacted ${result.entriesCompacted} entries, saved ${result.tokensBefore - result.tokensAfter} tokens`);

Hierarchical Chunking + Retrieval

import { ChunkingEngine, RetrievalPipeline } from "@stackforgeai/copilot-context";

// Chunk a document (documentText is a string holding the document's full text, loaded by the caller)
const engine = new ChunkingEngine({
  chunkSize: 256,       // tokens per chunk
  overlap: 32,          // overlap between chunks
  hierarchical: true,   // create parent-child groups
  childrenPerParent: 4,
});

const chunks = engine.chunk("doc-1", documentText);
console.log(`${engine.getParents(chunks).length} parents, ${engine.getChildren(chunks).length} children`);

// Build retrieval pipeline
const pipeline = new RetrievalPipeline({
  model: "gpt-4o-mini",
  retrievalCount: 20,
  rerankTopK: 5,
  expandToParent: true, // Expand child matches to parent for richer context
});

pipeline.index(chunks);

const result = await pipeline.retrieve({ query: "How does authentication work?", topK: 5 });
console.log(`Retrieved ${result.chunks.length} chunks in ${result.totalDurationMs}ms`);
console.log("Context:", result.context);

// Check per-stage latency
for (const stage of result.stages) {
  console.log(`${stage.stage}: ${stage.durationMs}ms`);
}

Schema-Enforced Structured Output

import { SchemaEnforcer } from "@stackforgeai/copilot-context";

const enforcer = new SchemaEnforcer({
  model: "gpt-4o-mini",
  maxRetries: 2,
});

const schema = {
  name: "APIEndpoint",
  fields: [
    { name: "method", type: "string", description: "HTTP method" },
    { name: "path", type: "string", description: "URL path" },
    { name: "description", type: "string", description: "What the endpoint does" },
  ],
  isArray: true,
};

const endpoints = await enforcer.enforce(
  "Design 5 REST API endpoints for a user management service.",
  schema,
);
// On success, endpoints is an array of objects validated against the schema;
// if every attempt fails validation, an error is reported (see Troubleshooting)
console.log(endpoints);

Session Memory with Persistence

import { SessionMemory } from "@stackforgeai/copilot-context";

const memory = new SessionMemory({
  maxTokens: 2_000,
  compactionModel: "gpt-4o-mini",
  sessionId: "project-alpha",
  autoCompact: true,
});

await memory.addTurn("user", "I need a REST API for user management.");
await memory.addTurn("assistant", "I'll design endpoints for CRUD operations.");
await memory.addTurn("user", "Add OAuth2 authentication.");

// Render for inclusion in a prompt
const context = memory.render();

// Save session state
const snapshot = memory.getSnapshot();
// Store snapshot to file/DB...

// Restore in a new instance
const restored = new SessionMemory({ maxTokens: 2_000, compactionModel: "gpt-4o-mini" });
restored.restore(snapshot);

Configuration

ContextManager

| Option | Type | Default | Description |
|---|---|---|---|
| maxTokens | number | none | Maximum token budget for the context window |
| compactionModel | string | none | Model ID for LLM-powered compaction |
| compactionThreshold | number | 0.8 | Utilization ratio (0–1) that triggers auto-compaction |
| keepRecentCount | number | 2 | Minimum entries to keep uncompacted |

ChunkingEngine

| Option | Type | Default | Description |
|---|---|---|---|
| chunkSize | number | 512 | Target chunk size in tokens |
| overlap | number | 64 | Overlap tokens between adjacent chunks |
| hierarchical | boolean | true | Whether to create parent-child groups |
| childrenPerParent | number | 4 | Number of child chunks per parent |

RetrievalPipeline

| Option | Type | Default | Description |
|---|---|---|---|
| model | string | none | Model for LLM-powered reranking |
| retrievalCount | number | 20 | Candidates to retrieve before reranking |
| rerankTopK | number | 5 | Top results after reranking |
| expandToParent | boolean | true | Expand child matches to parent chunks |

SchemaEnforcer

| Option | Type | Default | Description |
|---|---|---|---|
| model | string | none | Model for structured output generation |
| maxRetries | number | 2 | Retry attempts on validation failure |
| timeout | number | 60000 | Guard call timeout in ms |

SessionMemory

| Option | Type | Default | Description |
|---|---|---|---|
| maxTokens | number | none | Maximum token budget for memory |
| compactionModel | string | none | Model for compaction summarization |
| sessionId | string | auto | Session identifier |
| autoCompact | boolean | true | Auto-compact when budget exceeded |

Architecture Overview

┌───────────────────────────────────────────────────────┐
│             @stackforgeai/copilot-context             │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │ContextManager│  │ChunkingEngine│  │SchemaEnforcer│ │
│  │ (compaction) │  │(parent/child)│  │  (validate)  │ │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘ │
│         │                 │                 │         │
│  ┌──────┴───────┐  ┌──────┴───────┐  ┌──────┴───────┐ │
│  │SessionMemory │  │  Retrieval   │  │   Context    │ │
│  │ (persistence)│  │   Pipeline   │  │   Observer   │ │
│  └──────┬───────┘  └──────┬───────┘  └──────────────┘ │
│         │                 │                           │
│         └────────┬────────┘                           │
│                  ▼                                    │
│         ┌──────────────────┐                          │
│         │   IGuard (DI)    │                          │
│         └────────┬─────────┘                          │
└──────────────────┼────────────────────────────────────┘
                   ▼
          ┌──────────────────┐
          │ @stackforgeai/   │
          │ copilot-guard    │
          │ (token budget)   │
          └────────┬─────────┘
                   ▼
          ┌──────────────────┐
          │ @github/         │
          │ copilot-sdk      │
          └──────────────────┘

Key architectural decisions:

All LLM calls flow through IGuard interface → CopilotGuard → copilot-sdk
No source file imports copilot-sdk directly; the SDK is reached only through the guard
Dependency injection via IGuard enables unit testing with mocks
ContextObserver is embedded in each component for latency tracking
Hierarchical chunking follows the parent-child expansion pattern common in production RAG systems
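
The dependency-injection point can be illustrated with a component that depends only on a guard-shaped interface, plus a mock for tests. The `GuardLike` interface, its `complete` method, and the request and response fields are hypothetical stand-ins; the real `IGuard` signature may differ.

```typescript
// Hypothetical guard contract; the package's IGuard may expose different methods.
interface GuardLike {
  complete(request: { model: string; prompt: string }): Promise<{ text: string; outputTokens: number }>;
}

// A component that only knows about the interface, never the SDK.
class EntrySummarizer {
  private readonly guard: GuardLike;
  private readonly model: string;

  constructor(guard: GuardLike, model: string) {
    this.guard = guard;
    this.model = model;
  }

  async summarize(entries: string[]): Promise<string> {
    const prompt = "Summarize the following entries:\n" + entries.join("\n");
    const response = await this.guard.complete({ model: this.model, prompt });
    return response.text;
  }
}

// Test double: records prompts and returns a canned response, no network calls.
class MockGuard implements GuardLike {
  readonly prompts: string[] = [];

  async complete(request: { model: string; prompt: string }): Promise<{ text: string; outputTokens: number }> {
    this.prompts.push(request.prompt);
    return { text: "stub summary", outputTokens: 2 };
  }
}

const mockGuard = new MockGuard();
const summarizer = new EntrySummarizer(mockGuard, "gpt-4o-mini");
const pendingSummary = summarizer.summarize(["first entry", "second entry"]);
```

Because the component receives its guard through the constructor, a unit test can assert on `mockGuard.prompts` without any LLM traffic.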

Troubleshooting

"Could not find a declaration file for module '@stackforgeai/copilot-guard'"

Ensure @stackforgeai/copilot-guard is installed and has been built (npm run build in the guard package). The dist/ folder must contain .d.ts files.

"@github/copilot-sdk is not installed"

Install the peer dependency:

npm install @github/copilot-sdk

"All N attempts failed validation for schema"

The LLM consistently returned output that did not match the schema. Try:

Simplifying the schema (fewer fields, simpler types)
Using a more capable model (e.g., gpt-4.1 instead of gpt-4o-mini)
Increasing maxRetries
Adding more context to the task prompt

Auto-compaction not triggering

Check that compactionThreshold is set (default 0.8). Compaction only triggers when adding a new entry would push utilization above the threshold AND there are more entries than keepRecentCount.
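
The trigger condition described above can be written as a single predicate. This is a sketch under the stated rule only; `shouldCompact` and its parameter names are hypothetical, and the real check may count tokens differently.

```typescript
interface CompactionOptions {
  maxTokens: number;
  compactionThreshold: number; // utilization ratio between 0 and 1
  keepRecentCount: number;
}

// True when adding `incomingTokens` would push utilization above the
// threshold AND there are more entries than must be kept uncompacted.
function shouldCompact(
  currentTokens: number,
  incomingTokens: number,
  entryCount: number,
  options: CompactionOptions,
): boolean {
  const projectedUtilization = (currentTokens + incomingTokens) / options.maxTokens;
  return (
    projectedUtilization > options.compactionThreshold &&
    entryCount > options.keepRecentCount
  );
}

const options: CompactionOptions = { maxTokens: 4000, compactionThreshold: 0.8, keepRecentCount: 2 };

const overBudget = shouldCompact(3000, 300, 5, options);   // 82.5% utilization, 5 entries
const tooFewEntries = shouldCompact(3000, 300, 2, options); // over threshold, but nothing to compact
const underBudget = shouldCompact(1000, 100, 5, options);  // 27.5% utilization
```

If compaction never fires, the second case is the usual culprit: a small number of very large entries can exceed the threshold while leaving nothing eligible to summarize.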

Token estimation is approximate

Token counts use a chars / 4 heuristic. Actual token counts depend on the model's tokenizer. For precise budgeting, track the outputTokens from guard responses.
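
The heuristic amounts to the following arithmetic. Rounding up is an assumption here (the package may round differently), and the helper names are illustrative.

```typescript
const CHARS_PER_TOKEN = 4;

// Approximates the token count as characters divided by four, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimated share of the budget consumed by a set of entries.
function estimateUtilization(entries: string[], maxTokens: number): number {
  const used = entries.reduce((sum, entry) => sum + estimateTokens(entry), 0);
  return used / maxTokens;
}

const eightChars = estimateTokens("abcdefgh"); // 8 / 4 = 2
const fiveChars = estimateTokens("abcde");     // ceil(5 / 4) = 2
const share = estimateUtilization(["abcdefgh", "abcd"], 6); // (2 + 1) / 6 = 0.5
```

Text with many short words, code, or non-Latin scripts can deviate noticeably from this ratio, which is why the estimate should be treated as a budget guide rather than a measurement.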

DISCLAIMER AND LIMITATION OF LIABILITY

IMPORTANT: THIS SOFTWARE IS PROVIDED STRICTLY ON AN "AS IS" AND "AS AVAILABLE" BASIS.

BY USING THIS SOFTWARE, YOU ACKNOWLEDGE AND AGREE THAT:

THE SOFTWARE MAY CONTAIN BUGS, DEFECTS, DESIGN FLAWS, LOGIC ERRORS, SECURITY ISSUES, OR INCOMPLETE FEATURES
THE SOFTWARE MAY FAIL TO LIMIT OR PREVENT TOKEN USAGE, API REQUESTS, COST OVERRUNS, OR BILLING EVENTS
TOKEN ESTIMATION, CONTEXT COMPACTION, CHUNKING, RETRIEVAL, SCHEMA VALIDATION, AND MEMORY MANAGEMENT FEATURES MAY BE INACCURATE, INCOMPLETE, OR NON-FUNCTIONAL
THE SOFTWARE MAY PRODUCE UNEXPECTED RESULTS
THE SOFTWARE MAY NOT BE SUITABLE FOR PRODUCTION ENVIRONMENTS
THE SOFTWARE MAY NOT PREVENT EXCESSIVE CHARGES FROM AI PROVIDERS OR CLOUD SERVICES

THIS SOFTWARE DOES NOT GUARANTEE:

COST SAVINGS
BILLING PROTECTION
TOKEN ACCURACY
FINANCIAL PROTECTION
RETRIEVAL ACCURACY
SCHEMA COMPLIANCE
CONTEXT PRESERVATION
SESSION CONTINUITY
SYSTEM STABILITY
SECURITY
RELIABILITY
FITNESS FOR ANY PARTICULAR PURPOSE

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW:

THE AUTHORS, CONTRIBUTORS, MAINTAINERS, COPYRIGHT HOLDERS, AFFILIATES, AND DISTRIBUTORS SHALL NOT BE LIABLE FOR ANY CLAIMS, DAMAGES, LOSSES, LIABILITIES, OR EXPENSES OF ANY KIND, INCLUDING BUT NOT LIMITED TO:

API FEES
TOKEN CHARGES
CLOUD COMPUTE COSTS
INFRASTRUCTURE COSTS
FINANCIAL LOSSES
LOST PROFITS
BUSINESS INTERRUPTION
SERVICE OUTAGES
DATA LOSS
DATA CORRUPTION
SECURITY INCIDENTS
INDIRECT DAMAGES
INCIDENTAL DAMAGES
CONSEQUENTIAL DAMAGES
SPECIAL DAMAGES
PUNITIVE DAMAGES
MISUSE OF THE SOFTWARE
FAILURE OF SAFETY FEATURES
FAILURE OF TOKEN LIMITS
FAILURE OF CONTEXT COMPACTION
FAILURE OF RETRIEVAL ACCURACY
FAILURE OF SCHEMA VALIDATION
FAILURE OF SESSION MEMORY
ERRORS IN TOKEN ESTIMATION
EXCESSIVE BILLING EVENTS
PRODUCTION FAILURES

USE OF THIS SOFTWARE IS ENTIRELY AT YOUR OWN RISK.

YOU ARE SOLELY RESPONSIBLE FOR:

VERIFYING ALL OUTPUTS
MONITORING API USAGE
MONITORING TOKEN CONSUMPTION
MONITORING BILLING
IMPLEMENTING ADDITIONAL SAFEGUARDS
TESTING IN YOUR OWN ENVIRONMENT
CONFIGURING APPROPRIATE LIMITS
VALIDATING ALL EXECUTION LOGIC
MAINTAINING BACKUPS AND RECOVERY PROCEDURES

THIS PROJECT SHOULD NOT BE USED AS THE SOLE OR PRIMARY MECHANISM FOR COST CONTROL, BILLING GOVERNANCE, SECURITY, OR PRODUCTION SAFETY.

ALWAYS IMPLEMENT INDEPENDENT PROVIDER-SIDE BILLING ALERTS, RATE LIMITS, BUDGET CONTROLS, AND MONITORING SYSTEMS.

IF YOU DO NOT AGREE WITH THESE TERMS, DO NOT USE THIS SOFTWARE.

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.

For full license text, see the LICENSE file.
