@prompt-diff/core
v0.1.2
Core engine for Prompt-Diff — providers, runDiff, runSuite, YAML eval config.
Prompt-Diff
One prompt, many models — compare quality, speed, and cost.
Prompt-Diff runs one prompt against many LLMs and lines up answers, latency, tokens, and cost in a CLI (npm package **@prompt-diff/cli**, command **prompt-diff** when installed globally) and a Next.js web UI, so you can compare providers with evidence instead of juggling tabs and copy-paste.
npx @prompt-diff/cli "Explain the CAP theorem in one paragraph" --models claude,ollama
Works on macOS, Linux, and Windows with Node.js 18+.
Table of contents
- Why Prompt-Diff?
- Features
- Quick start
- Providers
- Configuration
- Eval suites (YAML)
- Web UI
- CLI usage
- Architecture
- Contributing
- npm packages (blog)
- License
Why Prompt-Diff?
Picking the right model shouldn't mean mentally mapping which output came from where. Prompt-Diff keeps every model's answer and metrics in one place so you can decide with data.
Tip: Use the CLI in CI and scripts (--output json). Use the web app when you want a polished compare view, YAML test suites, and judge-backed rubrics, without restarting the server when you change models.
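For the CI use case mentioned in the tip, a script can read the JSON report and act on it. The TypeScript sketch below picks the fastest result. Only the results[].latencyMs path is taken from this README (the jq example under CLI usage); the provider field name and the sample values are assumptions for illustration, not the documented output schema.

```typescript
// Hypothetical report shape: only results[].latencyMs appears in the README.
interface DiffResult {
  provider: string; // assumed field name
  latencyMs: number; // taken from the README's jq example
}

interface DiffReport {
  results: DiffResult[];
}

// Return the result with the lowest latency, or undefined for an empty report.
function fastest(report: DiffReport): DiffResult | undefined {
  return report.results.reduce<DiffResult | undefined>(
    (best, r) => (best === undefined || r.latencyMs < best.latencyMs ? r : best),
    undefined,
  );
}

// Made-up sample standing in for the text printed with --output json.
const sampleJson =
  '{"results":[{"provider":"claude","latencyMs":820},{"provider":"ollama","latencyMs":1430}]}';
const report = JSON.parse(sampleJson) as DiffReport;
const winner = fastest(report);
```

In a CI job, the same function could gate a build on a latency budget by comparing winner.latencyMs with a threshold of your choosing.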
Features
| Feature | Description |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| Side-by-side compare | Same prompt, every enabled model — outputs, errors, and metrics in one grid. |
| YAML eval suites | Prompt templates × variable matrices × assertions (contains, latency, cost, llm-rubric). |
| Live suite logs | Streamed run log in the web UI so you see each LLM and judge call as it happens. |
| OpenAI model list | With an API key, the UI loads chat models from OpenAI's /v1/models (plus presets & "Other"). |
| Secrets & judge | Web settings for secret variables, Anthropic/Ollama judge, and YAML import/export. |
| CLI + core library | npx @prompt-diff/cli (or npm i -g @prompt-diff/cli then prompt-diff); @prompt-diff/core for programmatic diffs and suites. |
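The "prompt templates × variable matrices" idea from the table can be pictured as a cross product: every template is rendered once per row of variables. The sketch below is a minimal illustration in TypeScript. The {{name}} placeholder syntax and the helper names render and expand are assumptions, not the format Prompt-Diff documents.

```typescript
// Hypothetical helpers: the {{name}} placeholder syntax is an assumption.
type Vars = Record<string, string>;

// Replace each {{key}} with its value; placeholders without a value are left untouched.
function render(template: string, vars: Vars): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : undefined;
    return value ?? match;
  });
}

// Cross every template with every row of variables (templates × rows prompts).
function expand(templates: string[], rows: Vars[]): string[] {
  const out: string[] = [];
  for (const template of templates) {
    for (const row of rows) out.push(render(template, row));
  }
  return out;
}

const prompts = expand(
  ["Summarize {{topic}} in one sentence.", "List three facts about {{topic}}."],
  [{ topic: "LoRA" }, { topic: "the CAP theorem" }],
);
```

Two templates and two rows give four prompts, and each prompt would then be sent to every selected model.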
Quick start
CLI — zero install
ANTHROPIC_API_KEY=sk-... npx @prompt-diff/cli "What is LoRA?"
npx @prompt-diff/cli "Review this function" --file ./utils.py --models claude,ollama
# Average latency over 5 runs
npx @prompt-diff/cli "Summarize this" --runs 5 --output json
Web UI — hosted
Open https://prompt-diff-oss.vercel.app/. Add API keys under Settings in the browser; test suites live at /suite.
Web UI — local dev
git clone https://github.com/darkrishabh/prompt-diff
cd prompt-diff
npm install
npm run dev
Then open http://localhost:3000 (or 3001 if 3000 is busy).
Note: Suite streaming and eval need a Node deployment (not output: 'export'). The suite API sets a long maxDuration for hosts like Vercel; very heavy runs may still need a higher limit or a long-lived server.
Deploying on Vercel
These settings are required; without them you get a plain NOT_FOUND on *.vercel.app:
- Root Directory → set to packages/web (not . and not empty).
- Build Command → leave empty (uses packages/web/vercel.json: npm run build) or set explicitly to npm run build. Do not use next build alone, because it skips compiling @prompt-diff/core.
- Install → the default npm install from the repository root is correct for npm workspaces.
- Include files outside Root Directory → leave enabled so packages/core is visible during the build.
packages/web/next.config.ts sets outputFileTracingRoot to the monorepo root so API routes bundle correctly.
Providers
Cloud APIs
| Provider | Env var | Notes |
| -------------- | -------------------------------------- | ------------------------------- |
| Claude | ANTHROPIC_API_KEY | Haiku, Sonnet, Opus |
| OpenAI | OPENAI_API_KEY | Full list in UI when key is set |
| Groq | GROQ_API_KEY | Very fast inference |
| OpenRouter | OPENROUTER_API_KEY | Many models, one key |
| Together | TOGETHER_API_KEY | Open-weight models |
| NVIDIA NIM | NVIDIA_NIM_API_KEY | NIM endpoints |
| Perplexity | PERPLEXITY_API_KEY | Search-grounded |
| Minimax | MINIMAX_API_KEY + MINIMAX_GROUP_ID | API + group ID |
| Custom | — | Any OpenAI-compatible base URL |
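A quick way to see which cloud providers are usable is to check the environment against the table above. The sketch below does that in TypeScript. The variable names come from the table; the helper configuredProviders and the use of the table's display names as keys are assumptions for illustration, not part of the package's API.

```typescript
// Env var names come from the provider table; the helper itself is illustrative.
const REQUIRED_ENV: Record<string, string[]> = {
  Claude: ["ANTHROPIC_API_KEY"],
  OpenAI: ["OPENAI_API_KEY"],
  Groq: ["GROQ_API_KEY"],
  OpenRouter: ["OPENROUTER_API_KEY"],
  Together: ["TOGETHER_API_KEY"],
  "NVIDIA NIM": ["NVIDIA_NIM_API_KEY"],
  Perplexity: ["PERPLEXITY_API_KEY"],
  Minimax: ["MINIMAX_API_KEY", "MINIMAX_GROUP_ID"],
};

type Env = Record<string, string | undefined>;

// A provider counts as configured only when every variable it needs is set and non-blank.
function configuredProviders(env: Env): string[] {
  return Object.keys(REQUIRED_ENV).filter((provider) =>
    (REQUIRED_ENV[provider] ?? []).every((key) => (env[key] ?? "").trim() !== ""),
  );
}

// Placeholder values; Minimax is skipped here because MINIMAX_GROUP_ID is missing.
const ready = configuredProviders({
  ANTHROPIC_API_KEY: "sk-ant-...",
  MINIMAX_API_KEY: "...",
});
```

In Node.js, passing process.env as the argument checks the real environment.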
Local & CLI
| Provider | Requirements |
| -------------- | ------------------------------------------------------------------------ |
| Ollama | ollama.ai — local tags discovered via /api/models |
| Claude CLI | @anthropic-ai/claude-code on PATH |
| Codex CLI | @openai/codex on PATH |
| LM Studio | OpenAI-compatible server (e.g. localhost:1234) via Custom |
Configuration
Copy .env.example to .env.local for the web app, or export vars in your shell for the CLI.
ANTHROPIC_API_KEY=sk-ant-...
OLLAMA_BASE_URL=http://localhost:11434 # optional
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
OPENROUTER_API_KEY=sk-or-...
TOGETHER_API_KEY=...
NVIDIA_NIM_API_KEY=nvapi-...
PERPLEXITY_API_KEY=pplx-...
MINIMAX_API_KEY=...
MINIMAX_GROUP_ID=...
Eval suites (YAML)
Define prompt templates, test rows (vars), and assertions: contains, not-contains, latency, cost, and llm-rubric (needs a judge — Claude when a key is available, or --judge ollama / none).
Full example: [examples/prompt-diff.yaml](examples/prompt-diff.yaml)
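To make the deterministic assertion types concrete, here is a minimal TypeScript sketch of how contains, not-contains, latency, and cost checks could be evaluated against one model output. The assertion type names come from this README; every field name (value, maxMs, maxUsd, text, latencyMs, costUsd) and the sample numbers are assumptions, and the real engine may differ, for example in case sensitivity. llm-rubric is left out because it needs a judge model.

```typescript
// Assertion type names come from the README; all field names are assumptions.
type Check =
  | { type: "contains"; value: string }
  | { type: "not-contains"; value: string }
  | { type: "latency"; maxMs: number }
  | { type: "cost"; maxUsd: number };

interface ModelOutput {
  text: string;
  latencyMs: number;
  costUsd: number;
}

// Evaluate one deterministic assertion against one model output.
function passes(check: Check, out: ModelOutput): boolean {
  switch (check.type) {
    case "contains":
      return out.text.includes(check.value);
    case "not-contains":
      return !out.text.includes(check.value);
    case "latency":
      return out.latencyMs <= check.maxMs;
    case "cost":
      return out.costUsd <= check.maxUsd;
  }
}

// Made-up output and thresholds, for illustration only.
const sample: ModelOutput = { text: "LoRA adds low-rank adapters.", latencyMs: 900, costUsd: 0.002 };
const checks: Check[] = [
  { type: "contains", value: "low-rank" },
  { type: "not-contains", value: "I cannot" },
  { type: "latency", maxMs: 2000 },
  { type: "cost", maxUsd: 0.001 },
];
const outcomes = checks.map((c) => passes(c, sample));
```

With these sample values the first three checks pass and the cost check fails, since 0.002 exceeds the 0.001 budget.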
npx @prompt-diff/cli run --config examples/prompt-diff.yaml --models claude,ollama,minimax
npx @prompt-diff/cli run --config examples/prompt-diff.yaml --output json --fail-on-error
npx @prompt-diff/cli run --config examples/prompt-diff.yaml --judge none
With a global install (npm i -g @prompt-diff/cli), use prompt-diff instead of npx @prompt-diff/cli.
The web app runs the same engine at POST /api/suite with SSE live logs when stream: true.
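A client that consumes the streamed log has to split the SSE text into individual messages. The sketch below shows generic SSE framing in TypeScript (blank-line separators, event: and data: fields, comment lines starting with a colon). It works on a complete text buffer rather than on network chunks, and the JSON payloads in the sample are made up; this README does not document the actual event names or payload shape of /api/suite.

```typescript
interface SseMessage {
  event: string; // "message" when no event: field is present
  data: string;
}

// Parse a complete SSE text buffer into messages (id/retry fields and comments are ignored).
function parseSse(buffer: string): SseMessage[] {
  const messages: SseMessage[] = [];
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Multiple data: lines in one block are joined with newlines.
    if (data.length > 0) messages.push({ event, data: data.join("\n") });
  }
  return messages;
}

// Made-up stream text; real payloads from the suite API may look different.
const streamText =
  'data: {"line":"calling claude"}\n\n: keep-alive\n\nevent: done\ndata: {"ok":true}\n\n';
const parsed = parseSse(streamText);
```

A production client would also need to buffer partial chunks until a blank line arrives, which this sketch leaves out.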
Web UI
| Capability | Description |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| Run workspace | Prompt card, colored model chips, + add model, Run, then Responses / Compare & evaluate / History |
| Responses | Grid (wrapping cards, 4+ models), Side-by-side (horizontal scroll), or Diff (line-level LCS between two outputs) |
| Model cards | Provider label, model ID, highlight pills (fastest / slowest / cheapest / best rated), 3-column metrics, markdown body, star rating + Copy |
| Quick comparison | Sticky footer mini-bars for latency, output tokens, and cost; Full compare jumps to the evaluate tab |
| History | Last runs stored in localStorage; click an entry to reload prompt + results |
| Test suites | /suite — YAML editor, run target banner, judge summary, live log, matrix results, recent runs (last 15, browser localStorage) |
| Settings | Models, secrets, judge, YAML import/export — stored in localStorage |
| API routes | /api/diff, /api/suite, /api/models (Ollama GET, OpenAI POST) |
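The Diff view in the table is described as a line-level LCS between two outputs. As a rough illustration of that technique, the TypeScript sketch below builds a longest-common-subsequence table over lines and walks it to label each line as same, removed, or added. The names diffLines and DiffOp are hypothetical, and the web UI's actual implementation may differ.

```typescript
type DiffOp = { kind: "same" | "removed" | "added"; line: string };

// Line-level diff of two texts using a longest-common-subsequence table.
function diffLines(a: string, b: string): DiffOp[] {
  const x = a.split("\n");
  const y = b.split("\n");
  // lcs[i][j] = length of the LCS of x[i..] and y[j..]
  const lcs: number[][] = Array.from({ length: x.length + 1 }, () =>
    new Array<number>(y.length + 1).fill(0),
  );
  for (let i = x.length - 1; i >= 0; i--) {
    for (let j = y.length - 1; j >= 0; j--) {
      lcs[i][j] =
        x[i] === y[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the top-left, emitting one operation per line.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < x.length && j < y.length) {
    if (x[i] === y[j]) {
      ops.push({ kind: "same", line: x[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "removed", line: x[i++] });
    } else {
      ops.push({ kind: "added", line: y[j++] });
    }
  }
  while (i < x.length) ops.push({ kind: "removed", line: x[i++] });
  while (j < y.length) ops.push({ kind: "added", line: y[j++] });
  return ops;
}

const diffOps = diffLines("alpha\nbeta\ngamma", "alpha\ndelta\ngamma");
```

For the sample inputs, the result keeps alpha and gamma, marks beta as removed, and marks delta as added. The table costs time and memory proportional to the product of the two line counts, which is usually acceptable for model outputs of a few hundred lines.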
CLI usage
The on-disk command is **prompt-diff** after npm i -g @prompt-diff/cli; with npx, call **npx @prompt-diff/cli** instead.
Usage: prompt-diff <prompt> [options]

Arguments:
  prompt             Prompt to send to all providers

Options:
  --file <path>      Append file contents to the prompt
  --models <list>    Comma-separated providers (default: "claude,ollama")
  --runs <n>         Runs for latency averaging (default: 1)
  --output <format>  pretty | json (default: "pretty")
  -V, --version      Show version
  -h, --help         Show help

prompt-diff "Implement binary search in Python" --models claude,ollama
prompt-diff "Hello" --models groq,claude --runs 10 --output json | jq '.results[].latencyMs'
prompt-diff "Find bugs" --file ./server.ts
prompt-diff "Explain recursion" --models claude-cli,codex
Architecture
flowchart LR
  subgraph clients [Clients]
    CLI[CLI / Ink]
    WEB[Next.js UI]
  end
  subgraph pkg [packages]
    CORE["@prompt-diff/core\nrunDiff · runSuite · providers"]
    API[API routes]
  end
  CLI --> CORE
  WEB --> API
  API --> CORE
  CORE --> P1[Claude / Ollama / OpenAI-compat …]
| Package | Role |
| ----------------------------------- | ------------------------------------------------------------------ |
| packages/core | Provider interface, runDiff, runSuite, YAML parsing, pricing |
| packages/cli (@prompt-diff/cli) | Commander + terminal UI; **prompt-diff** binary |
| packages/web | Next.js App Router, streaming suite API, model discovery proxy |
Adding a provider is on the order of tens of lines: implement Provider in core and wire it in the web API (and CLI config if needed). OpenAICompatibleProvider covers most REST APIs; subprocess adapters cover local CLIs.
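To give a feel for the size of a provider, here is a minimal TypeScript sketch of the pattern: a small interface, two toy implementations, and a fan-out that sends one prompt to every provider while recording output, error, and latency per provider. Everything in it is hypothetical. The real Provider interface in @prompt-diff/core is not shown in this README and may have different names and fields; runAll is a stand-in, not runDiff.

```typescript
// Hypothetical interface: the real Provider type in @prompt-diff/core may differ.
interface Provider {
  id: string;
  complete(prompt: string): Promise<string>;
}

interface RunRow {
  id: string;
  output?: string;
  error?: string;
  latencyMs: number;
}

// Toy provider used in place of a real API client.
const echo = (id: string): Provider => ({
  id,
  complete: async (prompt: string) => `${id}: ${prompt}`,
});

// Provider that always fails, to show that one error does not sink the whole run.
const broken: Provider = {
  id: "broken",
  complete: async () => {
    throw new Error("simulated outage");
  },
};

// Send one prompt to every provider concurrently and collect one row per provider.
async function runAll(providers: Provider[], prompt: string): Promise<RunRow[]> {
  return Promise.all(
    providers.map(async (p): Promise<RunRow> => {
      const started = Date.now();
      try {
        const output = await p.complete(prompt);
        return { id: p.id, output, latencyMs: Date.now() - started };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        return { id: p.id, error, latencyMs: Date.now() - started };
      }
    }),
  );
}
```

Calling runAll([echo("model-a"), broken], "Hello") resolves to two rows: one with an output and one with an error message. Catching errors per provider keeps a single failing model from hiding the other results.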
Contributing
git clone https://github.com/darkrishabh/prompt-diff.git
cd prompt-diff
npm install
npm run dev # turbo: CLI watch + Next dev
npm run build
npm run type-check
If your local origin still uses the old repository name:
git remote set-url origin https://github.com/darkrishabh/prompt-diff.git
Ideas that move the needle: new providers (Gemini, Bedrock, Azure OpenAI), richer diff UX, terminal markdown, tighter CI eval stories.
License
MIT — see LICENSE.
Built by @darkrishabh
