yardstiq (v0.3.0)
Compare AI model outputs side-by-side in your terminal. One prompt, multiple models, real-time streaming, performance stats, and an AI judge — all in a single command.
npx yardstiq "Explain quicksort in 3 sentences" -m claude-sonnet -m gpt-4o
yardstiq — comparing 2 models
Prompt: Explain quicksort in 3 sentences
Models: Claude Sonnet vs GPT-4o
┌──────────────────────────────────┬──────────────────────────────────┐
│ Claude Sonnet ✓ │ GPT-4o ✓ │
│ │ │
│ Quicksort is a divide-and- │ Quicksort works by selecting a │
│ conquer sorting algorithm that │ "pivot" element and partitioning │
│ works by selecting a "pivot"... │ the array into two halves... │
└──────────────────────────────────┴──────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ Performance │
│ │
│ Model Time TTFT Tokens Tok/sec Cost │
│ Claude Sonnet ⚡ 1.24s 432ms 18→86 69.4 t/s $0.0013 │
│ GPT-4o 1.89s 612ms 18→91 48.1 t/s $0.0010 │
│ │
│ Total cost: $0.0023 │
└────────────────────────────────────────────────────────────────────────┘
Features
- Side-by-side streaming — Watch model outputs appear in parallel, in real time
- 40+ models — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, Grok, and more
- Performance stats — Time, TTFT, token counts, throughput, and cost per model
- AI judge — Let an AI evaluate which response is best with scored verdicts
- Multiple export formats — JSON, Markdown, and self-contained HTML reports
- Benchmarks — Run YAML-defined prompt suites across models with aggregate scoring
- History — Save and revisit past comparisons
- Local models — Compare against Ollama models with zero API cost
- Flexible auth — AI Gateway for one-key access, or individual provider keys
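
The Benchmarks feature above runs YAML-defined prompt suites. A minimal sketch of what such a suite file might contain follows; all field names here are illustrative assumptions, not the documented schema (run yardstiq bench --help or check the repo for the real format):

```yaml
# Hypothetical suite file -- field names are illustrative, not the documented schema
name: sorting-basics
models:
  - claude-sonnet
  - gpt-4o
prompts:
  - id: quicksort
    prompt: "Explain quicksort in 3 sentences"
  - id: mergesort
    prompt: "Explain mergesort in 3 sentences"
judge:
  model: claude-sonnet
  criteria: "Accuracy and concision"
```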
Install
# npm
npm install -g yardstiq
# pnpm
pnpm add -g yardstiq
# npx (no install)
npx yardstiq "your prompt" -m claude-sonnet -m gpt-4o
From source
git clone https://github.com/stanleycyang/aidiff.git
cd aidiff
pnpm install
pnpm build
node dist/index.js --help
Setup
yardstiq needs API keys to call models. Choose one or both options:
Option A: AI Gateway (recommended)
One key for 40+ models from every provider through the Vercel AI Gateway — no markup on token pricing.
export AI_GATEWAY_API_KEY=your_gateway_key
Get your key at vercel.com/ai-gateway.
Option B: Individual provider keys
Set keys for the providers you want to use:
export ANTHROPIC_API_KEY=sk-ant-... # Claude models
export OPENAI_API_KEY=sk-... # GPT models
export GOOGLE_GENERATIVE_AI_API_KEY=... # Gemini models
Tip: If you have AI_GATEWAY_API_KEY set, yardstiq will fall back to the gateway when a direct provider key is missing. You can mix both approaches.
You can also store keys persistently:
yardstiq config set gateway-key your_key
yardstiq config set anthropic-key sk-ant-...
yardstiq config set openai-key sk-...
yardstiq config set google-key your_key
Local models (Ollama)
No API key needed. Just have Ollama running:
yardstiq "hello" -m local:llama3.2 -m local:mistral
Usage
Basic comparison
yardstiq "Write a Python fibonacci function" -m claude-sonnet -m gpt-4o
Compare 3+ models
yardstiq "Explain monads simply" -m claude-sonnet -m gpt-4o -m gemini-flash
Use any model via AI Gateway
With AI_GATEWAY_API_KEY set, use provider/model format to access any model:
yardstiq "Hello" -m anthropic/claude-sonnet-4.6 -m openai/gpt-4o -m xai/grok-3
Pipe from stdin
echo "Explain the CAP theorem" | yardstiq -m claude-sonnet -m gpt-4o
cat prompt.txt | yardstiq -m claude-haiku -m gpt-4o-mini
Read prompt from file
yardstiq -f ./prompt.txt -m claude-sonnet -m gpt-4o
Add a system prompt
yardstiq "Review this code" -s "You are an expert code reviewer" -m claude-sonnet -m gpt-4o
AI judge
Let an AI evaluate which response is better:
yardstiq "Write a sorting algorithm" -m claude-sonnet -m gpt-4o --judge
Use a specific model as judge with custom criteria:
yardstiq "Explain DNS" -m claude-sonnet -m gpt-4o \
--judge --judge-model gpt-4.1 \
  --judge-criteria "Focus on accuracy and beginner-friendliness"
Export results
# JSON (for scripting)
yardstiq "hello" -m claude-sonnet -m gpt-4o --json > results.json
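
The JSON export can then be post-processed in scripts. The schema below is a guess for illustration only (field names like results, timeMs, and cost are assumptions), so inspect your own results.json before relying on it:

```shell
# Hypothetical schema -- the field names below are assumptions; inspect the
# actual output of --json before scripting against it.
cat > results.json <<'EOF'
{"prompt": "hello",
 "results": [
   {"model": "claude-sonnet", "timeMs": 1240, "cost": 0.0013},
   {"model": "gpt-4o", "timeMs": 1890, "cost": 0.0010}
 ]}
EOF

# Print each model's latency and cost; python3 is used for JSON parsing
# so the snippet has no jq dependency.
python3 - <<'EOF'
import json

with open("results.json") as f:
    data = json.load(f)
for r in data["results"]:
    print(f"{r['model']}: {r['timeMs']} ms, ${r['cost']}")
EOF
```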
# Markdown
yardstiq "hello" -m claude-sonnet -m gpt-4o --markdown > comparison.md
# HTML (self-contained, dark theme)
yardstiq "hello" -m claude-sonnet -m gpt-4o --html > comparison.html
Save and review later
yardstiq "Explain quicksort" -m claude-sonnet -m gpt-4o --save quicksort
yardstiq history list
yardstiq history show quicksort
Tune parameters
# -t = temperature, --max-tokens = max output length, --timeout = seconds per model
yardstiq "Be creative" -m claude-sonnet -m gpt-4o \
  -t 0.8 \
  --max-tokens 4096 \
  --timeout 120
Disable streaming
yardstiq "hello" -m claude-sonnet -m gpt-4o --no-stream
Models
Run yardstiq models to see all 40 built-in models with pricing and access status.
| Provider | Models | Aliases |
|----------|--------|---------|
| Anthropic | Claude Sonnet 4.6, Haiku 4.5, Opus 4.6, 3.5 Sonnet | claude-sonnet, claude-haiku, claude-opus, claude-3.5-sonnet |
| OpenAI | GPT-4o, 4o Mini, 4.1, 4.1 Mini/Nano, 5, 5 Mini/Nano, o3-mini, Codex Mini | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini, gpt-5-nano, o3-mini, codex-mini |
| Google | Gemini 2.5 Pro/Flash/Flash Lite, 3 Flash/Pro | gemini-pro, gemini-flash, gemini-flash-lite, gemini-3-flash, gemini-3-pro |
| DeepSeek | V3.2, R1 | deepseek, deepseek-r1 |
| Mistral | Large 3, Magistral Medium/Small, Codestral | mistral-large, magistral-medium, magistral-small, codestral |
| Meta | Llama 4 Maverick/Scout, 3.3 70B | llama-4-maverick, llama-4-scout, llama-3.3-70b |
| xAI | Grok 3 | grok-3 |
| Amazon | Nova Pro, Nova Lite | nova-pro, nova-lite |
| Cohere | Command A | command-a |
| Alibaba | Qwen 3.5 Flash/Plus | qwen3.5-flash, qwen3.5-plus |
| Moonshot | Kimi K2, K2.5 | kimi-k2, kimi-k2.5 |
| MiniMax | M2.5 | minimax-m2.5 |
Status key: ✓ key = direct API key configured, ✓ gw = available via AI Gateway, ✗ = no access
Model formats
| Format | Example | Description |
|--------|---------|-------------|
| Alias | claude-sonnet | Built-in shorthand for popular models |
| Gateway | openai/gpt-5.2 | Any model via AI Gateway (provider/model) |
| Local | local:llama3.2 | Ollama models |
CLI Reference
Usage: yardstiq [options] [command] [prompt...]
Compare AI model outputs side-by-side in your terminal
Arguments:
prompt The prompt to send to all models
Options:
-V, --version output the version number
-m, --model <models...> Models to compare (at least 2)
-s, --system <message> System prompt for all models
-f, --file <path> Read prompt from file
-t, --temperature <n> Temperature (default: 0)
--max-tokens <n> Max tokens per response (default: 2048)
--judge Use AI judge to evaluate responses
--judge-model <model> Model for judging (default: "claude-sonnet")
--judge-criteria <text> Custom judging criteria
--no-stream Disable streaming
--json Output as JSON
--markdown Output as Markdown
--html Output as HTML
--save [name] Save results to history
--timeout <seconds> Timeout per model (default: 60)
-v, --verbose Show debug info
-h, --help display help for command
Commands:
models List available models and pricing
history [action] [name] Browse saved comparisons
config <action> [key] [value] Manage configuration
bench [options] <file> Run a benchmark suite
Development
git clone https://github.com/stanleycyang/aidiff.git
cd aidiff
pnpm install
pnpm build # Build with tsup
pnpm dev # Watch mode
pnpm test # Run tests
pnpm test:coverage # Run tests with 100% coverage enforcement
pnpm typecheck # Type check
pnpm lint # Lint with Biome
Contributing
- Fork the repo
- Create a feature branch (git checkout -b feat/my-feature)
- Make your changes with tests
- Ensure pnpm test:coverage passes at 100%
- Submit a pull request
