semdiff

v0.1.2

Published

16 days ago

Meaning-aware diff engine: surfaces substantive prose changes and suppresses cosmetic ones. Library + CLI.


benzbri

diff semantic-diff meaning-aware prose text-diff structured-diff llm classifier anthropic claude

semdiff

Meaning-aware diff engine and CLI that surfaces substantive changes in prose, not cosmetic edits.

semdiff answers a question that a line-based diff cannot: did the meaning change? It ignores reflowed whitespace, renumbered clauses, punctuation normalization, and synonym swaps that carry no new obligation, and it flags the edits that actually alter substance — a tightened threshold, a new exemption, a shifted deadline, an added requirement.

It is a standalone, domain-neutral library and CLI. It has no backend and no network dependencies of its own beyond the LLM provider you configure. The name is deliberately generic: semdiff is useful to anyone diffing prose where meaning matters more than characters — contracts, policies, terms of service, documentation, or regulations.

> [!NOTE]
> semdiff originated as the engine behind a sustainability-regulation change tracker, but it is built and packaged to stand on its own. See adr/0001 for the scope boundary between this engine and any application that consumes it.

Why not just `diff`?

A character- or line-based diff is precise but semantically blind. Given two revisions of a paragraph, it reports that bytes changed, not whether the obligation changed. In a legal or policy setting that produces two failure modes that are both expensive:

- Noise. Cosmetic edits (formatting, renumbering, citation-style changes) light up as diffs and bury the one change that matters.
- Missed substance. A reworded sentence that quietly narrows an exemption looks like a small token-level edit and gets dismissed.

semdiff classifies each aligned change as substantive or cosmetic, and for substantive changes describes what changed, with a confidence signal and a pointer back to the exact spans involved.
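To make the noise problem concrete, here is a minimal sketch of a cosmetic-only check. It is not semdiff's implementation: `normalizeCosmetic` and `isCosmeticOnly` are hypothetical helpers, and the set of edits treated as noise (reflowed whitespace, a leading clause number, curly versus straight quotes) is an assumption chosen for illustration.

```typescript
// Hypothetical helper, not part of semdiff: strip differences that carry no meaning.
function normalizeCosmetic(text: string): string {
  return text
    .replace(/[\u2018\u2019]/g, "'")              // curly single quotes to straight
    .replace(/[\u201C\u201D]/g, '"')              // curly double quotes to straight
    .replace(/^\s*\(?\d+(\.\d+)*[.)]\s+/, "")     // leading clause number such as "3.1." or "(4)"
    .replace(/\s+/g, " ")                         // collapse reflowed whitespace
    .trim();
}

// True when the two strings differ byte-for-byte but match after normalization.
function isCosmeticOnly(before: string, after: string): boolean {
  return before !== after && normalizeCosmetic(before) === normalizeCosmetic(after);
}
```

A line-based diff flags a renumbered, reflowed clause as changed; a check like this would let it through as cosmetic while still flagging "30 days" becoming "14 days". Rewordings that preserve or alter meaning are beyond a rule like this, which is where a classifier comes in.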

What it is not

- It does not interpret the text or give legal advice. It reports what changed between two texts; it does not tell you what the change means for you.
- It is not a generic web scraper or an ingestion pipeline. It diffs two inputs you hand it.
- It is not a bare LLM call that accepts whatever nondeterminism comes with it. The quality and determinism layer (caching, schema validation, confidence flags, an eval harness) is the point; see adr/0005.

Status

Implemented (v0, pre-1.0). The pipeline (segment → align → classify → structured diff) works end to end and is held to a per-file coverage gate (95% line / 90% branch). The default classifier calls the Anthropic API (set ANTHROPIC_API_KEY); alternatively, inject your own Classifier, optionally wrapped with withCache so identical changes are classified once (ADR-0004). The eval harness (adr/0005) scores classifier accuracy (npm run eval); curated result snapshots are in eval/RESULTS.md. Architecture decisions live in adr/; the working agreement for contributors (human and AI) is in CLAUDE.md.

Install

npm install semdiff

The published package ships compiled ESM with bundled type declarations and has zero runtime dependencies beyond the LLM provider you configure (ADR-0009). It runs on Node ≥ 20, both locally (CLI) and on AWS Lambda (library).

Usage

As a library:

import { diff } from "semdiff";

// ANTHROPIC_API_KEY in the environment, or inject your own Classifier.
const result = await diff(before, after);
for (const change of result.changes) {
  console.log(change.type, change.classification, change.description ?? "");
}
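Once you have the changes, a common next step is to separate the substantive ones you can trust from those that deserve a second look. The sketch below assumes a change shape that mirrors only the fields used in the snippet above plus a numeric `confidence`; `ChangeLike`, `triage`, and the 0.8 threshold are hypothetical, and the real result type may differ.

```typescript
// Assumed shape for illustration; not semdiff's exported type.
interface ChangeLike {
  type: string;
  classification: "substantive" | "cosmetic";
  confidence?: number;
  description?: string;
}

// Keep substantive changes and split them by a confidence threshold.
function triage(changes: ChangeLike[], minConfidence = 0.8) {
  const substantive = changes.filter((c) => c.classification === "substantive");
  return {
    confident: substantive.filter((c) => (c.confidence ?? 0) >= minConfidence),
    needsReview: substantive.filter((c) => (c.confidence ?? 0) < minConfidence),
  };
}
```

Treating a missing confidence as 0 is a deliberately cautious choice here: anything the sketch cannot score goes to review.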

As a CLI (installed globally, or via npx):

npx semdiff before.txt after.txt                      # structured diff as JSON
npx semdiff before.txt after.txt --granularity clause

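From a checkout of this repo you can run the source directly without building (node src/cli.ts before.txt after.txt), provided your Node strips TypeScript types natively: this is on by default from Node 23.6, and Node 22.6 and later can enable it with --experimental-strip-types.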

Changed content — insertions, deletions, and modifications — is classified by the model; identical, cosmetic, and relocated (moved) content is classified locally and needs no API key.

Configuration

The only thing you need to configure for semdiff is the LLM provider. The common case is zero code: set your key in the environment and the defaults handle the rest.

1. Your API key (the only required setup)

export ANTHROPIC_API_KEY=sk-ant-...      # macOS/Linux
$env:ANTHROPIC_API_KEY = "sk-ant-..."    # PowerShell

Both the library and the CLI read ANTHROPIC_API_KEY automatically — no other setup is needed. The key is only used when a change actually has to reach the model; identical, cosmetic, and moved content never needs it.

2. Override the model or pass the key explicitly

The default model is claude-opus-4-8 (the latest capable Claude). Override it — or supply the key in code instead of the environment — per call:

import { diff } from "semdiff";

const result = await diff(before, after, {
  modelId: "claude-sonnet-4-6",          // any Anthropic model id; default: claude-opus-4-8
});

To pass the key in code (e.g. from your own secret store rather than the environment), construct the default classifier explicitly and inject it:

import { diff, createDefaultClassifier } from "semdiff";

const classifier = createDefaultClassifier({
  apiKey: mySecret,                       // default: process.env.ANTHROPIC_API_KEY
  modelId: "claude-opus-4-8",             // optional
  timeoutMs: 60000,                       // optional; per-call timeout (ADR-0012)
  maxRetries: 2,                          // optional; retries on 429/5xx/network/timeout (ADR-0012)
});
const result = await diff(before, after, { classifier });

The modelId you pass is also stamped into the result's provenance, so a diff always records which model produced it (ADR-0004).

Each model call has a timeout and retries transient failures (429, 5xx, network errors, timeouts) with exponential backoff, honouring Retry-After; non-transient errors (400, auth) fail fast. Tune with timeoutMs / maxRetries, or set maxRetries: 0 for a single attempt (ADR-0012).
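The retry policy described above can be sketched as follows. This is an illustration of bounded retry with exponential backoff, not semdiff's code: `HttpFailure`, `withRetry`, the 500 ms base delay, and the injectable `sleep` are all assumptions made for the example.

```typescript
// Hypothetical error shape: an HTTP status plus an optional Retry-After in seconds.
interface HttpFailure { status: number; retryAfterSeconds?: number }

function asHttpFailure(err: unknown): HttpFailure | undefined {
  if (typeof err === "object" && err !== null && typeof (err as HttpFailure).status === "number") {
    return err as HttpFailure;
  }
  return undefined;
}

// 429 and 5xx are transient; anything without a status is treated as a network error or timeout.
function isTransient(err: unknown): boolean {
  const failure = asHttpFailure(err);
  return failure === undefined || failure.status === 429 || failure.status >= 500;
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Non-transient errors (400, auth) and exhausted budgets fail immediately.
      if (attempt >= maxRetries || !isTransient(err)) throw err;
      const retryAfter = asHttpFailure(err)?.retryAfterSeconds;
      // Honour Retry-After when present; otherwise double the delay each attempt.
      const delayMs = retryAfter !== undefined ? retryAfter * 1000 : baseDelayMs * 2 ** attempt;
      await sleep(delayMs);
    }
  }
}
```

With `maxRetries` set to 0 the loop makes exactly one attempt, which matches the single-attempt behaviour described above. Passing `sleep` as a parameter keeps the sketch testable without real delays.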

3. Use a different provider entirely

semdiff depends on a small Classifier interface, not on Anthropic. To use another provider (OpenAI, a local model, a mock for tests), implement classify and inject it — the engine keeps its zero-dependency runtime and never constructs the default classifier:

import { diff, type Classifier } from "semdiff";

const classifier: Classifier = {
  async classify(pair) {
    // pair: { type, a, b, spanA, spanB }  →  call your provider here
    return { classification: "substantive", confidence: 0.9, description: "…" };
  },
};
const result = await diff(before, after, { classifier });

Wrap any classifier with withCache so identical changes are classified once (ADR-0004):

import { diff, createDefaultClassifier, withCache } from "semdiff";

const classifier = withCache(createDefaultClassifier({}), {
  modelId: "claude-opus-4-8",
  promptVersion: "0",
});
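For intuition about what a caching wrapper of this kind does, here is a minimal in-memory sketch. It is not the withCache implementation: `PairLike`, `VerdictLike`, `ClassifierLike`, and `cachedClassifier` are local stand-ins, and keying on the model id and prompt version is an assumption about why those options exist (a verdict from one model or prompt should plausibly not be reused for another).

```typescript
// Local stand-ins for illustration; the real semdiff types may differ.
interface PairLike { type: string; a?: string; b?: string }
interface VerdictLike { classification: "substantive" | "cosmetic"; confidence: number; description?: string }
interface ClassifierLike { classify(pair: PairLike): Promise<VerdictLike> }

function cachedClassifier(
  inner: ClassifierLike,
  opts: { modelId: string; promptVersion: string },
): ClassifierLike {
  const cache = new Map<string, Promise<VerdictLike>>();
  return {
    classify(pair: PairLike): Promise<VerdictLike> {
      // Identical content under the same model and prompt version maps to the same key.
      const key = JSON.stringify([opts.modelId, opts.promptVersion, pair.type, pair.a ?? null, pair.b ?? null]);
      let hit = cache.get(key);
      if (hit === undefined) {
        hit = inner.classify(pair);
        cache.set(key, hit);
        // Evict failures so a transient error is not served from the cache.
        hit.catch(() => cache.delete(key));
      }
      return hit;
    },
  };
}
```

Caching the promise rather than the resolved value means two concurrent requests for the same change share one model call.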

Design at a glance

input A ─┐
         ├─▶ segment ─▶ align ─▶ classify ─▶ structured diff
input B ─┘             (cheap,    (LLM, gated   (substantive vs
                        local)     on change)    cosmetic + spans)

1. Segment both texts into comparable units (clauses / sentences).
2. Align units across the two versions with a cheap, deterministic local pass (no LLM): exact and near-exact matches are settled here.
3. Classify the genuinely changed units with the LLM (modifications, insertions, and deletions alike; ADR-0011), returning a structured, schema-validated verdict. Unchanged, trivially changed, and relocated (moved) units never reach the model, which bounds cost and nondeterminism.
4. Emit a stable, versioned structured diff (JSON); the CLI prints that JSON, and any human-readable rendering is a pure function of it (ADR-0006).
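The gating idea in these steps can be sketched in a few lines. This is a heavily simplified illustration, not semdiff's pipeline: `segment`, `squash`, and `sketchDiff` are hypothetical, segmentation is a naive sentence split, alignment is purely positional (so there is no move detection), and "trivially changed" is reduced to whitespace differences.

```typescript
type Verdict = { classification: "substantive" | "cosmetic"; description?: string };
type Unit = string | undefined;
type Classify = (a: Unit, b: Unit) => Promise<Verdict>;
type Change = Verdict & { a: Unit; b: Unit };

// Segment: naive split into sentences ending in ., ! or ?
function segment(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Collapse whitespace so reflowed units compare equal.
const squash = (s: string): string => s.replace(/\s+/g, " ").trim();

async function sketchDiff(before: string, after: string, classify: Classify): Promise<Change[]> {
  const a = segment(before);
  const b = segment(after);
  const changes: Change[] = [];
  // Align: pair units by position only, to keep the sketch short.
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const ua: Unit = a[i];
    const ub: Unit = b[i];
    if (ua !== undefined && ub !== undefined) {
      if (ua === ub) continue; // identical: nothing to report
      if (squash(ua) === squash(ub)) {
        changes.push({ a: ua, b: ub, classification: "cosmetic" }); // settled locally
        continue;
      }
    }
    // Only genuine modifications, insertions, and deletions reach the classifier.
    const verdict = await classify(ua, ub);
    changes.push({ a: ua, b: ub, ...verdict });
  }
  return changes;
}
```

Because the classifier is a parameter, the same function runs against a real provider or a mock. The point of the sketch is the gate: the number of classifier calls tracks the number of genuine changes, not the size of the document.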

The full reasoning is in the ADRs:

| ADR | Decision |
| --- | --- |
| 0001 | Standalone, domain-neutral engine separate from any application |
| 0002 | TypeScript / Node, distributed as both a library and a CLI |
| 0003 | Segment → align → classify pipeline |
| 0004 | LLM-backed classification, gated and structured |
| 0005 | The eval + determinism layer is the core contribution |
| 0006 | Stable structured diff schema as the public contract |
| 0007 | Spans are half-open character offsets into the literal input |
| 0008 | Vitest with a per-file coverage gate |
| 0009 | The default classifier calls the Anthropic API over fetch |
| 0010 | Move detection by content match (deterministic, cosmetic) |
| 0011 | Classify one-sided changes (insertions/deletions) through the model |
| 0012 | Default classifier resilience: per-call timeout and bounded retry with backoff |
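ADR-0007's half-open spans are worth a quick illustration, since they determine how you map a reported change back onto the input. The sketch below assumes a span is a `{ start, end }` pair with `start` inclusive and `end` exclusive, and that offsets are JavaScript string indices; the field names and the `Span`, `spanText`, and `spanOf` helpers are hypothetical.

```typescript
// Half-open span [start, end): start is inclusive, end is exclusive.
interface Span { start: number; end: number }

// Recover the exact text a span points at in the literal input.
function spanText(input: string, span: Span): string {
  return input.slice(span.start, span.end);
}

// Build the span of the first occurrence of needle, if any.
function spanOf(input: string, needle: string): Span | undefined {
  const start = input.indexOf(needle);
  return start === -1 ? undefined : { start, end: start + needle.length };
}
```

Half-open offsets line up with `String.prototype.slice`, make the length simply `end - start`, and let adjacent spans share a boundary without overlapping.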
