
thincontext

Drop-in TypeScript middleware that compresses LLM context before it hits the API.

Every agent re-sends the same file reads, system prompts, and tool outputs on every turn. Thincontext sits in the middle and removes the redundancy — transparently, without changing your message format.

Agent → ContextCompressor.compress(messages) → LLM API

Node.js ≥ 18 · TypeScript · ESM + CJS


Install

npm install thincontext

Quick start

import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()

const { messages, stats } = await compressor.compress(myMessages)

console.log(`${stats.savedTokens} tokens saved (${((1 - stats.compressionRatio) * 100).toFixed(1)}%)`)

Zero configuration is needed for the default hash-based dedup behaviour. Add embed and summarize to unlock the full pipeline.


What it does

Five compression stages can run in sequence on every compress() call:

| Stage | What it does | Requires |
|---|---|---|
| Summarizer | Decays old conversation turns: verbatim → summary → dropped | summarize fn |
| Deduplicator | Skips system/tool content the LLM already saw this session | nothing (hash) or embed fn (semantic) |
| Chunker | Extracts only relevant lines from large code/document context | embed fn |
| ReferenceCompressor | Replaces repeated large blocks with short [ref:...] tokens | nothing |
| BudgetManager | Drops lowest-priority messages to fit a hard token budget | nothing |

Each module only activates when its dependencies are provided — the compressor degrades gracefully.
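
As a minimal sketch of that gradual activation; the embed and summarize function shapes below are assumptions, not the published types:

import { ContextCompressor } from 'thincontext'

// Assumed shapes: embed maps texts to vectors, summarize maps text to shorter text.
const embed = async (texts: string[]): Promise<number[][]> => {
  // call your embedding model here; zero vectors keep the sketch self-contained
  return texts.map(() => new Array(384).fill(0))
}

const summarize = async (text: string): Promise<string> => {
  // call an LLM here; truncation is only a stand-in
  return text.slice(0, 200)
}

// No deps: hash dedup, ReferenceCompressor and BudgetManager are available.
const basic = new ContextCompressor()

// Adding embed also activates semantic dedup and the Chunker.
const withEmbeddings = new ContextCompressor({ embed })

// Adding summarize as well means all five stages can run.
const fullPipeline = new ContextCompressor({ embed, summarize })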

In practice

In a typical coding agent session where the same files are read across multiple turns:

  • first read of a large file: content is normalised and passed through
  • subsequent turns: the full file content can be replaced with a short reference or duplicate marker
  • older conversation turns (if summarize is configured): progressively compressed to short summaries, then dropped
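
A sketch of that repeated-read pattern; the message shape is borrowed from the priorities example below, and which messages qualify for dedup (roles, window) is an assumption:

import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()
const fileBody = 'export const add = (a: number, b: number) => a + b\n'.repeat(200)

// First turn: the file content is normalised and passed through.
const turn1 = await compressor.compress([
  { role: 'user', content: `Read src/math.ts:\n${fileBody}` },
])

// Later turn: the same read can come back as a short reference or duplicate marker,
// because compressor state persists across compress() calls.
const turn2 = await compressor.compress([
  { role: 'user', content: `Read src/math.ts:\n${fileBody}` },
])

console.log(turn1.stats.savedTokens, turn2.stats.savedTokens)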

In real Pi testing, thincontext produced meaningful savings on repeated tool-heavy turns, but not on every turn.


Important: savings are opportunistic, not guaranteed

Thincontext does not guarantee token savings on every turn.

A Pi footer like:

🗜 -0% chars

can be completely normal even when the extension is installed and working.

Helps most when

  • the agent reads the same files repeatedly across turns
  • the agent produces the same or very similar tool output multiple times
  • there are large tool results that exceed the truncation limit
  • repeated outputs are old enough to pass the dedup window

Helps less when

  • most output is new and unique
  • the session is dominated by fresh writes/edits
  • the repeated content is still too recent to deduplicate
  • tool outputs are already short
  • protected modification history must remain visible

Why you may see 0% savings

Some turns consist mostly of:

  • one-off bash output
  • fresh read results
  • recent edit / write operations
  • unique install logs or error logs

In those cases, thincontext may correctly decide that there is little or nothing safe to compress.
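
Callers can detect that case from the stats object shown in the quick start; the sample content here is purely illustrative:

import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()
const oneOffBashOutput = '$ npm ci\nadded 212 packages in 4s'

const { messages, stats } = await compressor.compress([
  { role: 'user', content: oneOffBashOutput },
])

// savedTokens of 0 just means nothing qualified this turn; messages pass through unchanged.
if (stats.savedTokens === 0) console.log('no compression this turn (expected sometimes)')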


Options

new ContextCompressor({
  budget: 8000,
  embed: myEmbedFn,
  summarize: mySummarizeFn,
  countTokens: myTokenFn,

  dedup: {
    strategy: 'hash',
    threshold: 0.92,
    maxVectors: 5000,
  },

  summarization: {
    keepLastFull: 5,
    summarizeBeyond: 10,
  },

  chunking: {
    maxLines: 50,
    contextLines: 5,
    minLines: 100,
  },
})

Adapters

Adapters ship as separate entrypoints — zero impact on the core bundle if unused.

Embedding

import { openaiEmbed } from 'thincontext/embeddings/openai'
import { localEmbed } from 'thincontext/embeddings/local'

const compressor = new ContextCompressor({
  embed: openaiEmbed({ apiKey: process.env.OPENAI_API_KEY! }),
  // or: embed: await localEmbed()
})

Summarization

import { anthropicSummarize } from 'thincontext/summarize/anthropic'
import { openaiSummarize } from 'thincontext/summarize/openai'

const compressor = new ContextCompressor({
  summarize: anthropicSummarize({ apiKey: process.env.ANTHROPIC_API_KEY! }),
})

Message conversion

import { fromOpenAI } from 'thincontext/adapters/openai'
import { fromAnthropic } from 'thincontext/adapters/anthropic'
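
Usage is along these lines; this is a sketch, and the exact adapter signatures are an assumption, not documented here:

import { ContextCompressor } from 'thincontext'
import { fromOpenAI } from 'thincontext/adapters/openai'

const compressor = new ContextCompressor()

// Assumption: fromOpenAI maps an OpenAI-style chat history to thincontext messages.
const { messages } = await compressor.compress(
  fromOpenAI([
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Summarise this repo.' },
  ]),
)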

Message priorities

Tag messages to control how BudgetManager handles token pressure:

const messages = [
  { role: 'system', content: 'You are...', priority: 'critical' },
  { role: 'user', content: ragChunk, priority: 'low' },
  { role: 'assistant', content: lastReply, priority: 'high' },
]

Priorities: 'critical' · 'high' · 'normal' · 'low'
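
Under a hard budget, the lowest-priority messages are the first to go, per the BudgetManager stage above. A sketch, with illustrative content and an assumed drop order among equal priorities:

import { ContextCompressor } from 'thincontext'

const tight = new ContextCompressor({ budget: 500 })

const { messages } = await tight.compress([
  { role: 'system', content: 'You are a coding agent.', priority: 'critical' }, // kept under pressure
  { role: 'user', content: 'retrieved docs...'.repeat(400), priority: 'low' },  // first to be dropped
  { role: 'assistant', content: 'Previous reply.', priority: 'high' },
])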


Session persistence

State (seen hashes, summary cache, ref table) lives in memory and survives across compress() calls.

const snapshot = compressor.export()
const compressor2 = ContextCompressor.restore(snapshot, { budget: 8000 })
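
To carry a session across process restarts, the snapshot can presumably be written to disk; that it is plain, JSON-serialisable data is an assumption:

import { writeFile, readFile } from 'node:fs/promises'
import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()

// Assumption: export() returns plain JSON-safe data.
await writeFile('.thincontext.json', JSON.stringify(compressor.export()))

// Later, in a new process:
const saved = JSON.parse(await readFile('.thincontext.json', 'utf8'))
const restored = ContextCompressor.restore(saved, { budget: 8000 })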

Integrations

Pi agent

Install as a Pi package — the extension is bundled inside the npm package:

pi install npm:thincontext

Or add to your ~/.pi/agent/settings.json:

{
  "packages": ["npm:thincontext"]
}

The extension hooks Pi's context event to compress messages before every LLM call, with tool result deduplication and a live footer:

🗜 -72% chars

Commands inside Pi:

/thincontext on|off|reset|budget <n>|lines <n>|dedup-after <turns>|debug
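
For example, to set the token budget to 8000:

/thincontext budget 8000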

Pi-specific notes

Current defaults are conservative:

  • maxToolLines = 300
  • dedupAfterTurns = 2
  • recent edit/write tool results are protected from budget dropping

Known limitations:

  • bash writes such as sed -i or echo > file are not reliably detected as modification records
  • truncation can hide important information that appears late in very long output
  • token estimates shown by the extension are approximate; Pi's own usage counters are more trustworthy
  • a given turn may show no savings even when the extension is working correctly

Claude Code

No context interception hook exists in Claude Code's interactive CLI — there is no equivalent to Pi's context event that fires before each LLM call.

The thincontext library still works for custom SDK/wrapper workflows, but a true drop-in Claude Code CLI plugin equivalent to the Pi extension is not currently possible with the available integration surface.
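
A hedged sketch of that wrapper route using the Anthropic SDK directly; the model name and the assumption that compressed messages remain valid SDK messages are mine, not the library's:

import Anthropic from '@anthropic-ai/sdk'
import { ContextCompressor } from 'thincontext'

const client = new Anthropic()
const compressor = new ContextCompressor({ budget: 8000 })

async function send(history: { role: 'user' | 'assistant'; content: string }[]) {
  // Compress the running history before every API call.
  const { messages, stats } = await compressor.compress(history)
  console.log(`saved ~${stats.savedTokens} tokens this turn`)
  return client.messages.create({
    model: 'claude-3-5-sonnet-latest', // pick your model
    max_tokens: 1024,
    messages: messages as { role: 'user' | 'assistant'; content: string }[],
  })
}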


Token counting for Claude

The built-in token estimates are based on cl100k_base, GPT-4's tokenizer; Claude's tokenizer differs, so expect some variance. See docs/token-counting.md for guidance on supplying a custom counter.
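
A rough custom counter can be plugged in via the countTokens option shown earlier; the (text: string) => number signature is an assumption:

import { ContextCompressor } from 'thincontext'

// Crude heuristic (~4 characters per token); swap in a real tokenizer for accuracy.
const compressor = new ContextCompressor({
  countTokens: (text: string) => Math.ceil(text.length / 4),
})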


What this is not

  • not an LLM proxy
  • not a RAG system
  • not model-specific
  • not a browser library

Publishing

The repo includes a GitLab pipeline that:

  • runs typecheck/tests on pushes
  • publishes to npm on version tags like v1.0.0

After publish, users can install with:

npm install thincontext

or in Pi:

pi install npm:thincontext

Development

npm ci
npm run typecheck
npm test
npm run build

License

MIT