@mcarvin/smart-diff

v2.1.0

Published

7 days ago

Summarizes a git diff using any LLM provider supported by the Vercel AI SDK (OpenAI, Anthropic, Google, Bedrock, Mistral, Cohere, Groq, xAI, DeepSeek, or any OpenAI-compatible gateway).

0High
0Medium
0Low

mcarvin

ai ai-sdk anthropic bedrock cohere deepseek diff gemini git google groq llm mistral openai openai-compatible vercel-ai-sdk xai

smart-diff

TypeScript library that turns a git revision range into a Markdown summary using any LLM provider supported by the Vercel AI SDK — OpenAI, Anthropic, Google Gemini, Amazon Bedrock, Mistral, Cohere, Groq, xAI, DeepSeek, or any OpenAI-compatible gateway. It uses simple-git to read the repo, respects path includes/excludes and commit message include/exclude regexes, and sends commits, paths, structured diff stats, and unified diff text to the model.

Requirements

Node.js 20+
An LLM provider credential (see Provider configuration)
Git on the PATH

Installation

npm install @mcarvin/smart-diff

@ai-sdk/openai and @ai-sdk/openai-compatible ship as direct dependencies. Every other provider (@ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/amazon-bedrock, @ai-sdk/mistral, @ai-sdk/cohere, @ai-sdk/groq, @ai-sdk/xai, @ai-sdk/deepseek) is declared as an optional peer and only needs to be installed when you actually use that provider. If the package is missing, smart-diff throws a clear error telling you which one to install.

Provider configuration

smart-diff is "configured" when isLlmProviderConfigured() returns true — i.e. at least one supported provider can be resolved from env vars — or you pass your own llmModelProvider factory. Otherwise summarizeGitDiff / generateSummary throw with LLM_GATEWAY_REQUIRED_MESSAGE.

Selecting a provider

LLM_PROVIDER explicitly selects a provider. When unset, the resolver auto-detects in this order: LLM_BASE_URL/OPENAI_BASE_URL → openai-compatible, OPENAI_API_KEY/LLM_API_KEY → openai, then ANTHROPIC_API_KEY, GOOGLE_GENERATIVE_AI_API_KEY (or GOOGLE_API_KEY), MISTRAL_API_KEY, COHERE_API_KEY, GROQ_API_KEY, XAI_API_KEY, DEEPSEEK_API_KEY, and finally OPENAI_DEFAULT_HEADERS/LLM_DEFAULT_HEADERS → openai.

| Provider (LLM_PROVIDER) | Package | Credential env vars | Default model | |---|---|---|---| | openai | @ai-sdk/openai | OPENAI_API_KEY or LLM_API_KEY | gpt-4o-mini | | openai-compatible | @ai-sdk/openai-compatible | LLM_BASE_URL or OPENAI_BASE_URL (required); OPENAI_API_KEY/LLM_API_KEY or custom headers | gpt-4o-mini | | anthropic | @ai-sdk/anthropic | ANTHROPIC_API_KEY | claude-3-5-haiku-latest | | google | @ai-sdk/google | GOOGLE_GENERATIVE_AI_API_KEY or GOOGLE_API_KEY | gemini-2.0-flash | | bedrock | @ai-sdk/amazon-bedrock | Standard AWS credential chain (env / profile / role) | anthropic.claude-3-5-haiku-20241022-v1:0 | | mistral | @ai-sdk/mistral | MISTRAL_API_KEY | mistral-small-latest | | cohere | @ai-sdk/cohere | COHERE_API_KEY | command-r-08-2024 | | groq | @ai-sdk/groq | GROQ_API_KEY | llama-3.1-8b-instant | | xai | @ai-sdk/xai | XAI_API_KEY | grok-2-latest | | deepseek | @ai-sdk/deepseek | DEEPSEEK_API_KEY | deepseek-chat |

LLM_* wins over OPENAI_* where both exist.

Common env vars

| Variable | Purpose | |---|---| | LLM_PROVIDER | Explicit provider id from the table above. | | LLM_MODEL | Overrides the per-provider default model id. | | OPENAI_BASE_URL / LLM_BASE_URL | Base URL for an OpenAI-compatible gateway; presence alone auto-selects the openai-compatible provider. | | OPENAI_DEFAULT_HEADERS / LLM_DEFAULT_HEADERS | JSON object of extra headers merged onto OpenAI / OpenAI-compatible requests (e.g. RBAC tokens, raw Authorization). LLM_* overrides OPENAI_* key-by-key. | | LLM_PROVIDER_NAME | Display name used when openai-compatible is active (defaults to openai-compatible). | | OPENAI_MAX_DIFF_CHARS / LLM_MAX_DIFF_CHARS | Max size of unified diff text sent to the model (default ~120k characters). | | OPENAI_MAX_TOKENS / LLM_MAX_TOKENS | Max completion tokens (default 4000). |

Example: native OpenAI

$env:OPENAI_API_KEY = "sk-..."
# Optional: $env:LLM_MODEL = "gpt-4o"

Example: Anthropic Claude

$env:ANTHROPIC_API_KEY = "sk-ant-..."
$env:LLM_MODEL = "claude-3-5-sonnet-latest"   # optional override

Example: company-managed OpenAI-compatible gateway

$env:OPENAI_BASE_URL = "https://llm-gateway.example.com"
$env:OPENAI_DEFAULT_HEADERS = '{"x-company-rbac":"your-rbac-token-here","Authorization":"Bearer sk-your-api-key-here"}'
# LLM_PROVIDER is auto-detected as "openai-compatible" because LLM_BASE_URL/OPENAI_BASE_URL is set.

Example: Google Gemini

$env:GOOGLE_GENERATIVE_AI_API_KEY = "..."
$env:LLM_MODEL = "gemini-2.0-flash"

Usage

`summarizeGitDiff`

import { summarizeGitDiff } from '@mcarvin/smart-diff';

const markdown = await summarizeGitDiff({
  from: 'origin/main',
  to: 'HEAD',
  cwd: '/path/to/repo', // optional; default process.cwd()
  includeFolders: ['src'],
  excludeFolders: ['node_modules', 'dist'],
  commitMessageExcludeRegexes: ['^\\[bot\\]'],
  commitMessageIncludeRegexes: ['^feat:'], // optional; OR across patterns
  teamName: 'Platform',
  systemPrompt: undefined,   // optional; overrides DEFAULT_GIT_DIFF_SYSTEM_PROMPT
  provider: 'anthropic',     // optional; overrides LLM_PROVIDER env + auto-detection
  model: 'claude-3-5-sonnet-latest', // optional
  maxDiffChars: 120_000,     // optional; also see LLM_MAX_DIFF_CHARS
});

| Option | Description | |--------|-------------| | from / to | Git refs for the range; to defaults to HEAD. | | cwd / git | Working tree for simple-git, or inject your own SimpleGit instance. | | includeFolders | Limit diff to these paths relative to repo root (omit for full repo minus excludes). | | excludeFolders | Excluded paths (git :(exclude) pathspecs), e.g. node_modules. | | commitMessageIncludeRegexes | If any pattern is non-empty, only commits whose full message matches at least one pattern are kept (after excludes). Case-insensitive. | | commitMessageExcludeRegexes | Drop commits whose message matches any of these patterns. | | teamName | Adds a Team: line to the user payload for the model. | | systemPrompt | Replaces the default system prompt. | | provider | LlmProviderId — wins over LLM_PROVIDER env and auto-detection. | | model | Chat model id; overrides LLM_MODEL and the provider default. | | maxDiffChars | Caps unified diff size for the request. | | contextLines | Number of context lines around each change (git diff -U<n>). Lower values (1 or 0) are the single biggest token saver on modification-heavy diffs. | | ignoreWhitespace | Passes -w / --ignore-all-space to git diff so pure-whitespace hunks don't consume tokens. Also applies to --numstat / --name-status so counts stay consistent. | | stripDiffPreamble | Removes low-value lines from the unified diff (diff --git, index, mode changes, similarity/rename/copy metadata). --- a/…, +++ b/…, and @@ hunk headers are kept. | | maxHunkLines | Caps the body of each hunk; anything past the limit is replaced with a single elision marker. The @@ header and DiffSummary totals are preserved. | | excludeDefaultNoise | Merges the built-in DEFAULT_NOISE_EXCLUDES list (lockfiles, dist, build, out, coverage, node_modules, __snapshots__) into excludeFolders. | | llmModelProvider | () => Promise<LanguageModel> — bypass env-based resolution entirely; hand-wire a Vercel AI SDK LanguageModel (required in tests or custom setups). |

Reducing tokens

For most repos, the cheapest wins are:

await summarizeGitDiff({
  from: 'origin/main',
  contextLines: 1,          // -U1 cuts 30-60% of tokens on typical diffs
  ignoreWhitespace: true,   // drop pure-whitespace hunks entirely
  stripDiffPreamble: true,  // kill `index`/`mode`/`similarity` lines
  maxHunkLines: 400,        // truncate monster hunks but keep the @@ header
  excludeDefaultNoise: true // skip lockfiles, dist/, coverage/, node_modules/
});

These options only reshape the unified diff text — the structured DiffSummary still reports true file counts and line totals, so the model always sees the full change inventory.

Injecting your own `LanguageModel`

If you want full control — for example, to configure retries, middlewares, or hit an in-process mock — pass llmModelProvider:

import { summarizeGitDiff } from '@mcarvin/smart-diff';
import { createAnthropic } from '@ai-sdk/anthropic';

const md = await summarizeGitDiff({
  from: 'origin/main',
  llmModelProvider: async () =>
    createAnthropic({ apiKey: process.env.MY_ANTHROPIC_KEY })(
      'claude-3-5-sonnet-latest',
    ),
});

Diff shape: single range vs per-commit

Single unified diff for from..to when no commit-message filters apply and the filtered commit list matches the full log for that range.
Concatenated per-commit patches (<hash>^!) when you use include/exclude regexes or when the filtered commit list differs in length from the full range (so the diff reflects only the commits that remain).

Lower-level API

The package also exports helpers for building a custom pipeline on top of the same git and LLM behavior:

Git: createGitClient, getRepoRoot, getCommits, getDiff, getDiffSummary, getChangedFiles, filterCommitsByMessageRegexes, buildDiffPathspecs, buildDiffShapingGitArgs, shapeUnifiedDiff, DEFAULT_NOISE_EXCLUDES
AI: generateSummary, resolveLlmMaxDiffChars, truncateUnifiedDiffForLlm
Provider resolution: resolveLanguageModel, detectLlmProvider, isLlmProviderConfigured, defaultModelForProvider, resolveLlmBaseUrl, parseLlmDefaultHeadersFromEnv
Constants / types: DEFAULT_GIT_DIFF_SYSTEM_PROMPT, LLM_GATEWAY_REQUIRED_MESSAGE, LlmProviderId, LlmModelProvider, ResolveLanguageModelOptions, GenerateSummaryInput, SummarizeFlags

Migrating from 1.x → 2.x

v2 replaces the direct openai SDK dependency with the Vercel AI SDK. If you only rely on env-var configuration, your setup keeps working — OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_DEFAULT_HEADERS, LLM_* equivalents, OPENAI_MAX_DIFF_CHARS, and OPENAI_MAX_TOKENS are all still honored.

Breaking changes:

Removed openAiClientProvider option on summarizeGitDiff/generateSummary. Use llmModelProvider: () => Promise<LanguageModel> returning a Vercel AI SDK model instead.
Removed OpenAiLikeClient and createOpenAiLikeClient exports, along with shouldUseLlmGateway. Use isLlmProviderConfigured() / resolveLanguageModel() instead.
openai npm package is no longer a dependency. Remove it from your own package.json if you only depended on it transitively via smart-diff.

Used By

This package is used by:

sf-git-ai-meta-insights — Salesforce metadata wrapper compatible with Salesforce DX projects

License

MIT — see LICENSE.md.