contextcrunch

v0.1.0, published 18 days ago


Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: arneeshaima

Keywords: logs, compression, llm, agents, tokens, json, observability, context, incident


Compress long-form logs and text into schema-valid JSON capsules for AI agents — cited evidence, massive token reduction, and optional smart LLM refinement only when it helps.

ContextCrunch is a general-purpose Node.js library you embed anywhere (CI, agents, MCP servers, CLIs).

Features

Deterministic compression — template mining, level detection, signal scoring, PII redaction
Cited evidence — every item points to a real 1-based line number; roles like root_cause, trigger, consequence
Schema-valid ContextCapsule — strict JSON designed for tools and agents
Token budgets — set a target maximum capsule size; token estimates are included in the capsule's meta
Smart LLM (optional) — a gate fires only for large, ambiguous incidents; the model relabels evidence roles and never invents log lines
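To make "template mining" concrete, here is a minimal sketch assuming a simple regex-masking approach. `maskLineSketch` and `mineTemplatesSketch` are hypothetical names, not the package's `toTemplate`, and the masking rules are assumptions. Variable parts (key=value pairs, numbers, durations) become placeholders, so repeated lines collapse into one pattern with a count and a 1-based sample line, mirroring the top_templates entries in the capsule.

```ts
// Hypothetical sketch of template mining; not contextcrunch's actual toTemplate.
interface TemplateStat {
  pattern: string;    // masked line, e.g. "INFO request id=<id> path=<path>"
  count: number;      // how many input lines collapsed into this pattern
  sampleLine: number; // 1-based line number of the first occurrence
}

// Replace variable parts of a log line with placeholders (assumed rules).
function maskLineSketch(line: string): string {
  return line
    .replace(/\b([A-Za-z_]+)=\S+/g, (_m: string, key: string) => `${key}=<${key}>`) // key=value pairs
    .replace(/\b0x[0-9a-fA-F]+\b/g, "<hex>") // hex literals
    .replace(/\b\d+(?:\.\d+)?(?:ms|s)?\b/g, "<num>"); // numbers and durations
}

// Group lines by masked pattern; most frequent ("routine") patterns first.
function mineTemplatesSketch(lines: readonly string[]): TemplateStat[] {
  const byPattern = new Map<string, TemplateStat>();
  lines.forEach((line, index) => {
    const pattern = maskLineSketch(line);
    const existing = byPattern.get(pattern);
    if (existing) {
      existing.count += 1;
    } else {
      byPattern.set(pattern, { pattern, count: 1, sampleLine: index + 1 });
    }
  });
  return Array.from(byPattern.values()).sort((a, b) => b.count - a.count);
}

const sample = [
  "INFO request id=a1 path=/users",
  "INFO request id=b2 path=/orders",
  "WARN pool acquire 480ms",
  "INFO request id=c3 path=/users",
];
console.log(mineTemplatesSketch(sample)[0]);
// → pattern "INFO request id=<id> path=<path>", count 3, sampleLine 1
```

Lines that do not collapse into a frequent pattern (like the WARN line here) are the ones worth scoring as candidate evidence.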

Install

```sh
npm install contextcrunch
```

Quick start

```ts
import { readFile } from "node:fs/promises";
import { crunchSync } from "contextcrunch";

const logs = await readFile("app.log", "utf8");

// Fast path: no network, no LLM
const capsule = crunchSync(logs, {
  service: "api",
  targetTokens: 4000,
});

console.log(capsule.compression); // e.g. 847
console.log(capsule.evidence);
```

With optional LLM refinement (crunch calls the model only if the gate approves):

```ts
import { readFile } from "node:fs/promises";
import { crunch, createOpenAiCompleter } from "contextcrunch";

const logs = await readFile("app.log", "utf8");

const capsule = await crunch(logs, {
  service: "api",
  llm: {
    complete: createOpenAiCompleter({
      apiKey: process.env.OPENAI_API_KEY!,
    }),
    // force: true, // override the gate
  },
});
```

Capsule shape

```json
{
  "schema_version": "1.0",
  "service": "api",
  "window": "14:22:11 to 14:22:21",
  "compression": 42,
  "evidence": [
    {
      "role": "root_cause",
      "line": 10,
      "text": "ERROR psycopg2.OperationalError",
      "level": "error"
    },
    {
      "role": "trigger",
      "line": 9,
      "text": "WARN pool acquire 480ms",
      "level": "warn"
    },
    {
      "role": "consequence",
      "line": 17,
      "text": "ERROR pool exhausted, queue=18",
      "level": "error"
    }
  ],
  "routine_summary": {
    "total_lines": 1200000,
    "template_count": 15,
    "routine_lines": 1199968,
    "top_templates": [
      {
        "id": "t1",
        "pattern": "INFO request id=<id> path=<path>",
        "count": 450000,
        "sample_line": 12
      }
    ]
  },
  "synopsis": "root_cause@L10: ... → trigger@L9: ...",
  "meta": {
    "original_lines": 1200000,
    "original_bytes": 108000000,
    "estimated_original_tokens": 27000000,
    "estimated_capsule_tokens": 3200,
    "processing_ms": 84,
    "llm_used": false
  }
}
```
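For TypeScript consumers, the JSON above can be described with interfaces. This is a sketch inferred only from the sample capsule: `ContextCapsuleSketch`, `EvidenceItem`, `citesRealLine`, and `allEvidenceCited` are hypothetical names, and the package's own exported types may differ (roles and levels are typed as plain strings because the sample shows only a few values).

```ts
// Hypothetical types inferred from the sample capsule; not the package's exported types.
interface EvidenceItem {
  role: string;  // e.g. "root_cause", "trigger", "consequence"
  line: number;  // 1-based line number in the original input
  text: string;  // the cited (possibly redacted) log line
  level: string; // detected level, e.g. "error", "warn"
}

interface TopTemplate {
  id: string;
  pattern: string;
  count: number;
  sample_line: number;
}

interface ContextCapsuleSketch {
  schema_version: string;
  service?: string; // assumed optional, since the service option has no default
  window: string;
  compression: number;
  evidence: EvidenceItem[];
  routine_summary: {
    total_lines: number;
    template_count: number;
    routine_lines: number;
    top_templates: TopTemplate[];
  };
  synopsis: string;
  meta: {
    original_lines: number;
    original_bytes: number;
    estimated_original_tokens: number;
    estimated_capsule_tokens: number;
    processing_ms: number;
    llm_used: boolean;
  };
}

// True if the item cites a line that exists in an input of totalLines lines.
function citesRealLine(item: EvidenceItem, totalLines: number): boolean {
  return Number.isInteger(item.line) && item.line >= 1 && item.line <= totalLines;
}

// Check every evidence citation against the original line count in meta.
function allEvidenceCited(capsule: ContextCapsuleSketch): boolean {
  return capsule.evidence.every((e) => citesRealLine(e, capsule.meta.original_lines));
}
```

An agent can run a check like allEvidenceCited before trusting a capsule it received over the wire.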

API

| Export | Description |
| --- | --- |
| crunch(input, options?) | Async pipeline; optional LLM refinement when the gate approves |
| crunchSync(input, options?) | Same deterministic compression, synchronous, without the LLM |
| shouldUseLlm(stats, llmOptions?) | Inspect the LLM gate decision |
| createOpenAiCompleter(config) | Builds an OpenAI-compatible complete function (no extra deps) |
| estimateTokens / estimateJsonTokens | Rough token estimates (~4 chars/token) |
| parseLines, toTemplate, redactPii | Lower-level building blocks |
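The ~4 chars/token heuristic is simple enough to reproduce when budgeting outside the library. A sketch under that assumption: `estimateTokensSketch` and `estimateJsonTokensSketch` are hypothetical stand-ins for estimateTokens and estimateJsonTokens, and rounding up is an assumption. It agrees with the sample capsule's meta, where 108,000,000 original bytes map to 27,000,000 estimated tokens (one byte per character, as with ASCII logs).

```ts
// Hypothetical stand-ins for estimateTokens / estimateJsonTokens (~4 chars per token, rounded up).
const CHARS_PER_TOKEN = 4;

function estimateTokensFromChars(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

function estimateTokensSketch(text: string): number {
  return estimateTokensFromChars(text.length);
}

// JSON estimate: serialize compactly, then apply the same heuristic.
function estimateJsonTokensSketch(value: object): number {
  return estimateTokensSketch(JSON.stringify(value));
}

console.log(estimateTokensFromChars(108_000_000)); // 27000000, as in the sample meta
```

Real tokenizers vary by model and content, so treat these numbers as budgeting estimates rather than exact counts.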

Input types

```ts
type CrunchInput = string | Buffer | readonly string[];
```
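Because evidence cites 1-based line numbers, every input form has to be reduced to one list of lines first. A sketch assuming UTF-8 bytes and splitting on \n or \r\n; `toLinesSketch` and `splitLines` are hypothetical helpers, and the package's own parseLines may behave differently. It accepts Uint8Array, which also covers Buffer because Buffer is a Uint8Array subclass.

```ts
// Hypothetical normalizer; the real parseLines may differ.
type CrunchInputSketch = string | Uint8Array | readonly string[];

function toLinesSketch(input: CrunchInputSketch): string[] {
  if (typeof input === "string") return splitLines(input);
  if (input instanceof Uint8Array) return splitLines(new TextDecoder("utf-8").decode(input));
  return input.slice(); // already split into lines; copy so callers can't mutate the input
}

function splitLines(text: string): string[] {
  const lines = text.split(/\r?\n/);
  // A trailing newline would otherwise yield an empty final "line".
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return lines;
}

const lines = toLinesSketch("INFO start\r\nWARN slow\nERROR down\n");
// lines[2] is "ERROR down", which evidence would cite as line 3
```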

Options

| Option | Default | Description |
| --- | --- | --- |
| service | — | Label on the capsule |
| maxEvidence | 32 | Maximum number of cited lines |
| targetTokens | 4000 | Shrink evidence until the capsule fits this budget |
| redactPii | true | Redact emails, IPs, JWTs, API keys, and UUIDs |
| includeTopTemplates | 5 | Number of top routine patterns in the summary |
| llm | — | { complete, maxInputTokens?, maxOutputTokens?, force?, minCompressionForLlm? } |
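targetTokens is described only as "shrink evidence until capsule fits". One plausible reading, sketched under loud assumptions: evidence is capped at maxEvidence, then the lowest-scoring item is dropped until the serialized capsule's estimated size is at or under the target. The `score` field, `fitToBudgetSketch`, and the `buildCapsule` callback are hypothetical; the library's actual scoring and stopping rules are not documented here.

```ts
// Hypothetical budget fitter; contextcrunch's real shrink logic may differ.
interface ScoredEvidence {
  line: number;  // 1-based line number
  text: string;
  score: number; // assumed signal score: higher means more important
}

// Assumed ~4 chars/token heuristic, as in the estimateTokens description.
const estimate = (value: object): number => Math.ceil(JSON.stringify(value).length / 4);

function fitToBudgetSketch(
  evidence: readonly ScoredEvidence[],
  buildCapsule: (kept: ScoredEvidence[]) => object,
  targetTokens = 4000,
  maxEvidence = 32,
): ScoredEvidence[] {
  // Highest-scoring items first, capped at maxEvidence.
  let kept = evidence.slice().sort((a, b) => b.score - a.score).slice(0, maxEvidence);
  // Drop the weakest item until the capsule fits or nothing is left.
  while (kept.length > 0 && estimate(buildCapsule(kept)) > targetTokens) {
    kept = kept.slice(0, -1);
  }
  // Restore input order so citations read top to bottom.
  return kept.sort((a, b) => a.line - b.line);
}

const items: ScoredEvidence[] = [
  { line: 3, text: "ERROR pool exhausted", score: 0.9 },
  { line: 1, text: "INFO request id=<id>", score: 0.1 },
  { line: 2, text: "WARN pool acquire 480ms", score: 0.5 },
];
const kept = fitToBudgetSketch(items, (ev) => ({ evidence: ev }), 4000, 2);
// kept cites lines 2 and 3: the two highest scores, back in input order
```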

LLM gate

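LLM calls are off by default: crunchSync never calls a model, and crunch does so only when you pass the llm option and the gate approves (force: true overrides the gate). The gate enables refinement when:

Input has ≥ 200 lines
The deterministic compression ratio already exceeds the threshold (default 50×)
The input has multiple errors and many templates (an ambiguous incident)
The prompt fits within maxInputTokens (default 2000)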
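The gate can be pictured as a pure predicate over pipeline statistics. In this sketch every condition must hold, which the list above does not state explicitly. The stats shape, `gateSketch`, the mapping of the 50× threshold to minCompressionForLlm, and the cut-offs for "multiple errors" (2) and "many templates" (10) are all assumptions; only the 200-line, 50×, and 2000-token defaults come from this README. For the real decision, call shouldUseLlm(stats, llmOptions?).

```ts
// Hypothetical gate; use the package's shouldUseLlm for the real decision.
interface GateStats {
  lineCount: number;        // lines in the input
  compressionRatio: number; // deterministic compression already achieved
  errorCount: number;       // lines detected at error level
  templateCount: number;    // distinct templates mined
  promptTokens: number;     // estimated size of the refinement prompt
}

interface GateOptions {
  force?: boolean;               // bypass the checks
  minCompressionForLlm?: number; // assumed to hold the 50x threshold
  maxInputTokens?: number;       // prompt budget, default 2000
}

// Assumed cut-offs for "multiple errors" and "many templates" (not documented).
const MIN_ERRORS = 2;
const MIN_TEMPLATES = 10;

function gateSketch(stats: GateStats, opts: GateOptions = {}): boolean {
  if (opts.force) return true;
  const minCompression = opts.minCompressionForLlm ?? 50;
  const maxInputTokens = opts.maxInputTokens ?? 2000;
  return (
    stats.lineCount >= 200 &&
    stats.compressionRatio > minCompression &&
    stats.errorCount >= MIN_ERRORS &&
    stats.templateCount >= MIN_TEMPLATES &&
    stats.promptTokens <= maxInputTokens
  );
}
```

Keeping the gate a pure function of numbers makes it cheap to evaluate before any network call and easy to log alongside the capsule.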

The model may only relabel roles for lines already in evidence and add a short hypothesis — never fabricate log lines.
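The same no-fabrication rule can be enforced on the caller's side when you plug in your own provider. A sketch, assuming the model's reply has already been parsed into `{ line, role }` proposals (that shape and `applyRelabelsSketch` are hypothetical): a proposal is applied only if it targets a line already in evidence, so the model can change a label but never introduce a line.

```ts
// Hypothetical guard; the package applies its own rule internally.
interface Evidence {
  role: string;
  line: number; // 1-based line number from the original input
  text: string;
}

interface RelabelProposal {
  line: number;
  role: string;
}

function applyRelabelsSketch(
  evidence: readonly Evidence[],
  proposals: readonly RelabelProposal[],
): Evidence[] {
  const cited = new Set(evidence.map((e) => e.line));
  const newRoles = new Map<number, string>();
  for (const p of proposals) {
    // Ignore proposals that point at lines outside the existing evidence.
    if (cited.has(p.line)) newRoles.set(p.line, p.role);
  }
  // Copy each item, swapping only the role; line numbers and text stay untouched.
  return evidence.map((e) => ({ ...e, role: newRoles.get(e.line) ?? e.role }));
}
```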

Custom LLM provider

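```ts
import { readFile } from "node:fs/promises";
import { crunch } from "contextcrunch";

const logs = await readFile("app.log", "utf8");

const capsule = await crunch(logs, {
  llm: {
    // Receives the system prompt, user prompt, and output budget; returns the model's text.
    complete: async ({ system, user, maxOutputTokens }) => {
      // Call Anthropic, Bedrock, a local Ollama server, etc. here.
      return "...";
    },
  },
});
```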
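As one concrete provider, the completer below targets a local Ollama server through its /api/chat endpoint using the global fetch (Node 18+). It is a sketch: `createOllamaCompleterSketch`, the default model name, and the base URL are assumptions, and the `{ system, user, maxOutputTokens }` argument shape is taken from the example above. The fetch function is injectable so the completer can be tested without a server.

```ts
// Hypothetical completer for a local Ollama server; not part of contextcrunch.
interface CompleteArgs {
  system: string;
  user: string;
  maxOutputTokens?: number;
}

type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

function createOllamaCompleterSketch(
  model = "llama3.1",                 // assumed model name
  baseUrl = "http://localhost:11434", // Ollama's default address
  fetchImpl: FetchLike = fetch,
) {
  return async ({ system, user, maxOutputTokens }: CompleteArgs): Promise<string> => {
    const res = await fetchImpl(`${baseUrl}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        stream: false, // one JSON response instead of a stream of chunks
        messages: [
          { role: "system", content: system },
          { role: "user", content: user },
        ],
        // num_predict caps generated tokens; omitted when no budget is given
        options: maxOutputTokens ? { num_predict: maxOutputTokens } : undefined,
      }),
    });
    if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
    const data = (await res.json()) as { message?: { content?: string } };
    return data.message?.content ?? "";
  };
}
```

Pass the result as llm.complete; the gate still decides whether it is ever called.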

Development

```sh
npm install
npm run build
npm test
```
