@cycgraph/context-engine

v0.1.0

Framework-agnostic context optimization engine — composable compression pipeline that makes every token count.

A composable prompt-compression pipeline for TypeScript LLM stacks. Make every token count.

@cycgraph/context-engine is a composable compression pipeline for LLM prompts. Strip repeated facts, verbose serialisation, and stale reasoning traces from long memory payloads before they leave your code path — without losing what the model actually needs. Works standalone with any LLM framework (Vercel AI SDK, LangChain.js, the OpenAI SDK directly) or drops into @cycgraph/orchestrator via the contextCompressor hook.

Why this exists

You're paying input-token rates on prompts that contain:

  • The same fact repeated three times because the LLM phrased it slightly differently each round
  • Verbose JSON when a one-line tabular row would parse identically
  • Reasoning traces from earlier turns the LLM no longer needs to see
  • Filler stems and rephrased re-stems that contribute nothing

The compression engine catches each of these with a dedicated stage, runs them in order, and stays within a token budget you set.
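To make the "same fact, slightly different phrasing" case concrete, here is a minimal standalone sketch of trigram-based near-duplicate detection. It is not the package's implementation: the function names are invented for illustration, and Jaccard similarity over character trigrams is an assumption about how such a stage might score similarity. The 0.85 default mirrors the `similarity` value used in the custom pipeline example.

```typescript
// Illustrative only: names and scoring are assumptions, not the package's API.

/** Split a string into the set of its character trigrams after light normalisation. */
function trigrams(text: string): Set<string> {
  const normalised = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= normalised.length; i++) {
    grams.add(normalised.slice(i, i + 3));
  }
  return grams;
}

/** Jaccard similarity of two trigram sets: |A ∩ B| / |A ∪ B|, in [0, 1]. */
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(a);
  const gb = trigrams(b);
  if (ga.size === 0 && gb.size === 0) return 1;
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

/** Keep the first occurrence of each fact; drop later ones that are near-duplicates of a kept one. */
function fuzzyDedup(facts: string[], threshold = 0.85): string[] {
  const kept: string[] = [];
  for (const fact of facts) {
    if (!kept.some((k) => trigramSimilarity(k, fact) >= threshold)) kept.push(fact);
  }
  return kept;
}
```

The pairwise comparison is quadratic in the number of facts, which is fine for a sketch but would likely need an index for large memory payloads.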

How it works

  • Composable stages — mix and match: format compression, exact / fuzzy / semantic dedup, CoT distillation, heuristic pruning, self-information pruning, budget allocation. Use the bundled balanced / aggressive / conservative presets or build your own pipeline.
  • No LLM call required at the base tier — tier 0 is pure TypeScript with zod. Higher tiers add a token counter, an embedding provider, or a small local model for additional accuracy.
  • Model-aware format routing — selectFormat() checks the target model's capability profile (supportsTabular, supportsNested, prefersJson) and picks a representation that fits. Custom profiles can be merged in.
  • Cache-aware prefix locking — stabilises the static prompt prefix so provider-side prompt caches (e.g. Anthropic prompt caching) get consistent cache hits across turns.
  • Streaming-friendly — an incremental pipeline (createIncrementalPipeline) supports turn-by-turn compression for long sessions without re-running the whole pipeline each turn.
  • Bring your own LLM stack — the package doesn't import any LLM SDK. Plug into Vercel AI SDK, LangChain.js, the OpenAI / Anthropic SDKs directly, or raw fetch.
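As a rough illustration of model-aware format routing, the sketch below picks a representation from a capability profile. Only the three profile field names come from this README; the `pickFormat` function, the format labels, and the decision rules are hypothetical and may not match what `selectFormat()` actually does.

```typescript
// Hypothetical sketch: field names come from the README, the decision rules do not.
interface CapabilityProfile {
  supportsTabular: boolean;
  supportsNested: boolean;
  prefersJson: boolean;
}

type Format = "json" | "tabular" | "nested-compact" | "flat-object";

/** True when the payload is a non-empty array of plain objects that all share the same keys. */
function isUniformRecordArray(value: unknown): boolean {
  if (!Array.isArray(value) || value.length === 0) return false;
  const keysOf = (v: unknown): string | null =>
    typeof v === "object" && v !== null && !Array.isArray(v)
      ? Object.keys(v).sort().join(",")
      : null;
  const first = keysOf(value[0]);
  return first !== null && value.every((v) => keysOf(v) === first);
}

/** Choose a format: honour a JSON preference first, then tabular for uniform rows, then nested. */
function pickFormat(payload: unknown, profile: CapabilityProfile): Format {
  if (profile.prefersJson) return "json";
  if (profile.supportsTabular && isUniformRecordArray(payload)) return "tabular";
  if (profile.supportsNested) return "nested-compact";
  return "flat-object";
}

// Merging a custom profile over a default one (both profiles are invented for this example).
const defaultProfile: CapabilityProfile = { supportsTabular: true, supportsNested: true, prefersJson: false };
const customProfile: CapabilityProfile = { ...defaultProfile, supportsTabular: false };
```

With these rules, an array of same-shaped records routes to a table under `defaultProfile` and to the nested form under `customProfile`.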

Capability tiers

The pipeline runs at the tier you supply. Higher tiers add capabilities without changing the API.

| Tier | What you provide | Reduction range |
|---|---|---|
| 0 | Nothing — pure TypeScript with zod | 15–45% |
| 1 | A token counter (tiktoken or a custom adapter) | +5–10% (exact budgeting) |
| 2 | An embedding provider | +10–20% (semantic dedup) |
| 3 | A small local model (GPT-2 / Phi-2) | +30–50% (perplexity-based pruning) |

A tiktoken adapter ships with the package. Embedding and local-model adapters are user-provided through three small interfaces.
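For a sense of scale, a custom token counter can be very small. The sketch below is hypothetical: the `TokenCounter` interface and its `count` method are assumptions made for illustration, so check the package's exported types for the real contract. The four-characters-per-token rule is a rough heuristic for English prose, not an exact count.

```typescript
// Hypothetical interface: the package's real adapter contract may differ.
interface TokenCounter {
  count(text: string): number;
}

/** Rough fallback: about four characters per token for English prose. Not exact. */
const approximateCounter: TokenCounter = {
  count: (text) => Math.ceil(text.length / 4),
};

/** Wrap any counter with a cache so a repeated segment is only counted once. */
function cached(counter: TokenCounter): TokenCounter {
  const seen = new Map<string, number>();
  return {
    count(text) {
      const hit = seen.get(text);
      if (hit !== undefined) return hit;
      const tokens = counter.count(text);
      seen.set(text, tokens);
      return tokens;
    },
  };
}
```

Caching matters most when the underlying counter is a real tokenizer, since static segments such as the system prompt recur on every turn.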

Install

npm install @cycgraph/context-engine

Zero runtime dependencies except zod.

Quick taste

The simplest entry point — pick a preset, compress segments to fit a budget:

import { createOptimizedPipeline } from '@cycgraph/context-engine';

const { pipeline } = createOptimizedPipeline({ preset: 'balanced' });

// `largeMemoryObject` stands for whatever memory payload your application holds.
const result = pipeline.compress({
  segments: [
    { id: 'system', content: 'You are a research assistant.', role: 'system', priority: 1 },
    { id: 'memory', content: JSON.stringify(largeMemoryObject), role: 'memory', priority: 1 },
    { id: 'user', content: 'Summarise the findings.', role: 'user', priority: 1 },
  ],
  budget: { maxTokens: 8_192, outputReserve: 1_024 },
});

console.log(result.metrics);
// {
//   totalTokensIn: 12450,
//   totalTokensOut: 4870,
//   reductionPercent: 60.9,
//   stages: [ { name: 'format', tokensIn: 12450, tokensOut: 8200, durationMs: 2 }, ... ],
// }

// Send `result.segments` to your LLM however you normally would.

Presets: aggressive, balanced, conservative. Or build your own pipeline stage by stage.
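The numbers in the sample metrics are consistent with a plain "fraction of input tokens removed" calculation. The helper below is a sketch under that assumption; the function name and the one-decimal rounding are guesses, not confirmed behaviour of the package.

```typescript
/** Percentage of input tokens removed, rounded to one decimal place. */
function reductionPercent(tokensIn: number, tokensOut: number): number {
  if (tokensIn <= 0) return 0;
  return Math.round(((tokensIn - tokensOut) / tokensIn) * 1000) / 10;
}

const overall = reductionPercent(12450, 4870);     // 60.9, matching the sample output above
const formatStage = reductionPercent(12450, 8200); // 34.1 for the format stage on its own
```

Comparing the per-stage figure with the overall one shows how much of the saving comes from format compression and how much from the later stages.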

Pipeline architecture

Input segments (system, memory, tools, history, user)
  ↓  Cache-Aware Prefix Locking      ← stabilises prompt prefix for provider caching
  ↓  Hierarchy / Graph Formatting    ← memory payloads → compact representation
  ↓  Model-Aware Format Selection    ← per-target-model optimization (Claude vs GPT vs Haiku)
  ↓  Format Compression              ← JSON → tabular / flat object / nested compact
  ↓  Exact Deduplication             ← hash-based
  ↓  Fuzzy Deduplication             ← trigram similarity
  ↓  Semantic Deduplication          ← embedding-based (tier 2+)
  ↓  CoT Distillation                ← reasoning-trace eviction
  ↓  Self-Information Pruning        ← perplexity-based (tier 3+)
  ↓  Heuristic Pruning               ← rule-based
  ↓  Budget Allocation               ← priority-weighted, within token cap
Output segments (compressed, within budget)

Each stage is independent and composable. Use the full pipeline, a single stage, or your own ordering.
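The idea of independent stages run in sequence can be sketched in a few lines. Everything here is hypothetical: the `Segment`, `Stage`, and `StageMetric` shapes and the `runStages` helper are invented for illustration, and character counts stand in for token counts. The exact-dedup stage relies on a `Set` lookup; a real implementation might hash content to a fixed-size digest instead.

```typescript
// Hypothetical shapes for illustration; not the package's internal types.
interface Segment { id: string; content: string; }
interface Stage { name: string; run(segments: Segment[]): Segment[]; }
interface StageMetric { name: string; charsIn: number; charsOut: number; }

const totalChars = (segments: Segment[]): number =>
  segments.reduce((sum, s) => sum + s.content.length, 0);

/** Collapse runs of whitespace inside each segment. */
const squeezeWhitespace: Stage = {
  name: "squeeze-whitespace",
  run: (segments) =>
    segments.map((s) => ({ ...s, content: s.content.replace(/\s+/g, " ").trim() })),
};

/** Exact dedup: drop any segment whose content has already been seen. */
const exactDedup: Stage = {
  name: "exact-dedup",
  run(segments) {
    const seen = new Set<string>();
    return segments.filter((s) => {
      if (seen.has(s.content)) return false;
      seen.add(s.content);
      return true;
    });
  },
};

/** Run stages in order, recording a size measurement before and after each one. */
function runStages(
  stages: Stage[],
  input: Segment[],
): { segments: Segment[]; metrics: StageMetric[] } {
  const metrics: StageMetric[] = [];
  let segments = input;
  for (const stage of stages) {
    const charsIn = totalChars(segments);
    segments = stage.run(segments);
    metrics.push({ name: stage.name, charsIn, charsOut: totalChars(segments) });
  }
  return { segments, metrics };
}
```

Because each stage only sees the output of the previous one, ordering matters: normalising whitespace first lets the exact-dedup stage catch segments that differ only in spacing.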

Custom pipelines

When the presets don't fit, build the pipeline directly:

import {
  createPipeline,
  createFormatStage,
  createExactDedupStage,
  createFuzzyDedupStage,
  createAllocatorStage,
} from '@cycgraph/context-engine';

const pipeline = createPipeline({
  stages: [
    createFormatStage({ strategy: 'auto' }),      // pick best format per shape
    createExactDedupStage(),                       // hash-based exact match
    createFuzzyDedupStage({ similarity: 0.85 }),   // trigram near-match
    createAllocatorStage({ strategy: 'priority' }),// fit within budget
  ],
});

// `segments` and `budget` have the same shape as in the preset example above.
const result = pipeline.compress({ segments, budget });
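One way a priority-weighted allocator could divide a budget is sketched below. This is not the package's algorithm. It assumes that a higher `priority` earns a larger share, that the usable budget is `maxTokens` minus `outputReserve`, and that a segment never receives more than it needs; the `Sized` type and `allocate` function are invented for the example.

```typescript
interface Sized { id: string; tokens: number; priority: number; }

/**
 * Split `available` tokens across segments in proportion to priority.
 * A segment never receives more than it needs; unused share flows to the rest.
 */
function allocate(segments: Sized[], available: number): Map<string, number> {
  const grant = new Map<string, number>();
  segments.forEach((s) => grant.set(s.id, 0));
  let pending = segments.filter((s) => s.priority > 0);
  let remaining = Math.max(0, available);

  while (pending.length > 0 && remaining > 0) {
    const weight = pending.reduce((sum, s) => sum + s.priority, 0);
    const share = (s: Sized): number => (remaining * s.priority) / weight;
    // Segments whose full need fits inside their proportional share are granted in full.
    const satisfied = pending.filter((s) => s.tokens <= share(s));
    if (satisfied.length === 0) {
      // Every remaining segment wants more than its share: hand out the shares and stop.
      for (const s of pending) grant.set(s.id, Math.floor(share(s)));
      break;
    }
    for (const s of satisfied) {
      grant.set(s.id, s.tokens);
      remaining -= s.tokens;
    }
    pending = pending.filter((s) => satisfied.indexOf(s) === -1);
  }
  return grant;
}
```

With `maxTokens: 8_192` and `outputReserve: 1_024`, a short system prompt and user message are granted in full and the large memory segment absorbs whatever is left.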

Use cases

  • You're paying for redundant content in your input tokens. Drop the engine in front of your existing prompt build step. The actual reduction depends on data shape and which tiers you wire up — see the Capability tiers table.
  • You want to extend usable context-window real estate. Same model, same budget — fit more relevant content.
  • You're building an agent framework or RAG system. The format-compression stage alone often pays for itself on serialised knowledge graphs.
  • You want provider-side prompt caching to hit consistently. The cache-aware prefix locking stage stabilises the static prefix so caches stop churning on small turn-over-turn differences.
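The prefix-stability idea can be illustrated without any provider SDK. The sketch below assumes that system and tool segments form the static prefix, serialises objects with sorted keys so logically equal values yield identical text, and fingerprints the prefix to detect drift between turns. All names here (`canonicalJson`, `lockPrefix`, `fingerprint`, `STATIC_ROLES`) are hypothetical, and the package's locking stage may work differently.

```typescript
interface PromptSegment { id: string; role: string; content: string; }

// Assumption: these roles rarely change between turns, so they form the cacheable prefix.
const STATIC_ROLES = ["system", "tools"];

/** Serialise a JSON value with object keys sorted, so equal values always produce equal text. */
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(canonicalJson).join(",") + "]";
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const body = Object.keys(record)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalJson(record[k]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value);
}

/** Static segments first, in a fixed id order; dynamic segments keep their original order. */
function lockPrefix(segments: PromptSegment[]): { prefix: PromptSegment[]; rest: PromptSegment[] } {
  const isStatic = (s: PromptSegment): boolean => STATIC_ROLES.indexOf(s.role) !== -1;
  const prefix = segments
    .filter(isStatic)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const rest = segments.filter((s) => !isStatic(s));
  return { prefix, rest };
}

/** FNV-1a style 32-bit hash over UTF-16 code units, used only to detect prefix drift. */
function fingerprint(prefix: PromptSegment[]): string {
  const text = prefix.map((s) => s.id + "\u0000" + s.content).join("\u0001");
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}
```

Logging the fingerprint per turn is a cheap way to spot the turn where a prefix changed and a provider cache would have missed.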

Memory-payload formatting

Memory payloads (facts, entities, themes from a knowledge graph) often dominate token cost. Dedicated formatters compress them into compact representations:

| Input shape | Formatter | Output style |
|---|---|---|
| Hierarchical memory (xMemory) | formatHierarchy | Theme: X / Facts: ... / Entities: ... indented block |
| Knowledge graph (entities + edges) | serializeGraph | Markdown adjacency table |
| Community summaries (GraphRAG) | formatCommunities | Theme rollups with delta-encoded membership |

A selectFormat() helper picks among these based on the target model's capability profile (supportsTabular, supportsNested, prefersJson). Built-in profiles cover common model families; custom profiles can be merged in.
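To show why an adjacency table is more compact than raw JSON, here is a small standalone sketch that renders graph edges as a Markdown table. The `Edge` shape, the `toAdjacencyTable` name, and the exact column layout are assumptions; the output of `serializeGraph` may look different.

```typescript
interface Edge { from: string; relation: string; to: string; }

/** Render edges as a Markdown table with one row per source node, grouping its outgoing edges. */
function toAdjacencyTable(edges: Edge[]): string {
  const bySource = new Map<string, string[]>();
  for (const e of edges) {
    const row = bySource.get(e.from) ?? [];
    row.push(`${e.relation} → ${e.to}`);
    bySource.set(e.from, row);
  }
  const lines = ["| Node | Edges |", "|---|---|"];
  bySource.forEach((targets, node) => {
    lines.push(`| ${node} | ${targets.join("; ")} |`);
  });
  return lines.join("\n");
}

const sampleEdges: Edge[] = [
  { from: "Alice", relation: "works_at", to: "Acme" },
  { from: "Alice", relation: "knows", to: "Bob" },
  { from: "Bob", relation: "works_at", to: "Acme" },
];

const table = toAdjacencyTable(sampleEdges);
// | Node | Edges |
// |---|---|
// | Alice | works_at → Acme; knows → Bob |
// | Bob | works_at → Acme |
```

The saving comes from stating each source node once and dropping the repeated `from` / `relation` / `to` keys that JSON spells out for every edge.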

Standalone or as cycgraph's compression layer

Standalone — Build prompts in your own framework, pass segments through the pipeline, render the output however you like. The pipeline doesn't know or care about your LLM client.

With @cycgraph/orchestrator — Pass the pipeline as a contextCompressor to GraphRunnerOptions. The orchestrator calls it before injecting memory into agent and supervisor prompts. See Context Compression in the docs.

import { GraphRunner } from '@cycgraph/orchestrator';
import { createOptimizedPipeline, serialize } from '@cycgraph/context-engine';

const { pipeline } = createOptimizedPipeline({ preset: 'balanced' });

const contextCompressor = (sanitizedMemory, options) => {
  const result = pipeline.compress({
    segments: [{ id: 'memory', content: serialize(sanitizedMemory), role: 'memory', priority: 1 }],
    budget: { maxTokens: options?.maxTokens ?? 8192, outputReserve: 0 },
  });
  return { compressed: result.segments[0].content, metrics: result.metrics };
};

const runner = new GraphRunner(graph, state, { contextCompressor });

Observability

Every compression call returns metrics: per-stage tokensIn / tokensOut / durationMs, total reduction percent, format selection decisions, cache stability diagnostics. Wire to Prometheus or your tracing of choice.

A LatencyTracker + CircuitBreaker pair lets you skip slow stages under load — graceful degradation when a downstream embedding service is flaky.
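The skip-when-slow behaviour can be sketched as a single small class. This is a hypothetical stand-in, not the package's LatencyTracker or CircuitBreaker: the `SimpleBreaker` name, its constructor parameters, and the synchronous `run` method are all assumptions. It trips after a number of consecutive slow runs and returns a fallback until a cooldown has passed.

```typescript
// Hypothetical sketch: the package's LatencyTracker and CircuitBreaker may behave differently.
class SimpleBreaker {
  private slowStreak = 0;
  private openUntil = 0;
  private readonly maxLatencyMs: number;
  private readonly tripAfter: number;
  private readonly cooldownMs: number;

  constructor(maxLatencyMs: number, tripAfter: number, cooldownMs: number) {
    this.maxLatencyMs = maxLatencyMs;
    this.tripAfter = tripAfter;
    this.cooldownMs = cooldownMs;
  }

  /** Run `work` unless the breaker is open; while open, return `fallback` without calling `work`. */
  run<T>(work: () => T, fallback: T, now: () => number = Date.now): T {
    if (now() < this.openUntil) return fallback; // open: skip the slow stage
    const start = now();
    const result = work();
    const elapsed = now() - start;
    if (elapsed > this.maxLatencyMs) {
      this.slowStreak++;
      if (this.slowStreak >= this.tripAfter) {
        // Too many slow runs in a row: open the breaker for the cooldown period.
        this.openUntil = now() + this.cooldownMs;
        this.slowStreak = 0;
      }
    } else {
      this.slowStreak = 0; // a fast run resets the streak
    }
    return result;
  }
}
```

Passing the uncompressed segments as `fallback` gives the graceful degradation described above: when a stage is skipped the prompt is larger but still valid. The injectable `now` clock exists only to make the behaviour testable.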

Contributing

Issues and PRs welcome on GitHub. See CONTRIBUTING.md.

License

Apache 2.0.