
yardstiq

v0.3.0


Side-by-side AI model comparison CLI

Downloads

384


yardstiq

Compare AI model outputs side-by-side in your terminal. One prompt, multiple models, real-time streaming, performance stats, and an AI judge — all in a single command.

npx yardstiq "Explain quicksort in 3 sentences" -m claude-sonnet -m gpt-4o
 yardstiq — comparing 2 models

 Prompt: Explain quicksort in 3 sentences
 Models: Claude Sonnet vs GPT-4o

 ┌──────────────────────────────────┬──────────────────────────────────┐
 │ Claude Sonnet ✓                  │ GPT-4o ✓                         │
 │                                  │                                  │
 │ Quicksort is a divide-and-       │ Quicksort works by selecting a   │
 │ conquer sorting algorithm that   │ "pivot" element and partitioning │
 │ works by selecting a "pivot"...  │ the array into two halves...     │
 └──────────────────────────────────┴──────────────────────────────────┘

 ┌────────────────────────────────────────────────────────────────────────┐
 │ Performance                                                           │
 │                                                                       │
 │ Model              Time     TTFT     Tokens     Tok/sec   Cost        │
 │ Claude Sonnet ⚡   1.24s    432ms    18→86      69.4 t/s  $0.0013     │
 │ GPT-4o             1.89s    612ms    18→91      48.1 t/s  $0.0010     │
 │                                                                       │
 │ Total cost: $0.0023                                                   │
 └────────────────────────────────────────────────────────────────────────┘
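The derived columns in the performance panel can be reproduced from the raw numbers. The formulas below are inferred from the sample run above and are an assumption, not documented behavior: throughput appears to be output tokens divided by total wall time, and the total is the sum of the per-model costs.

```python
# Reconstruct the derived columns from the sample run above.
# Assumption: Tok/sec = output tokens / total time, where "Tokens" is input→output.
runs = [
    {"model": "Claude Sonnet", "time_s": 1.24, "tokens_out": 86, "cost": 0.0013},
    {"model": "GPT-4o", "time_s": 1.89, "tokens_out": 91, "cost": 0.0010},
]

for run in runs:
    run["tok_per_sec"] = round(run["tokens_out"] / run["time_s"], 1)

total_cost = round(sum(r["cost"] for r in runs), 4)

print(runs[0]["tok_per_sec"])  # 69.4, matching the table
print(runs[1]["tok_per_sec"])  # 48.1
print(total_cost)              # 0.0023
```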

Features

  • Side-by-side streaming — Watch model outputs appear in parallel, in real time
  • 40+ models — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, Grok, and more
  • Performance stats — Time, TTFT, token counts, throughput, and cost per model
  • AI judge — Let an AI evaluate which response is best with scored verdicts
  • Multiple export formats — JSON, Markdown, and self-contained HTML reports
  • Benchmarks — Run YAML-defined prompt suites across models with aggregate scoring
  • History — Save and revisit past comparisons
  • Local models — Compare against Ollama models with zero API cost
  • Flexible auth — AI Gateway for one-key access, or individual provider keys

Install

# npm
npm install -g yardstiq

# pnpm
pnpm add -g yardstiq

# npx (no install)
npx yardstiq "your prompt" -m claude-sonnet -m gpt-4o

From source

git clone https://github.com/stanleycyang/aidiff.git
cd aidiff
pnpm install
pnpm build
node dist/index.js --help

Setup

yardstiq needs API keys to call models. Choose one or both options:

Option A: AI Gateway (recommended)

One key for 40+ models from every provider through the Vercel AI Gateway — no markup on token pricing.

export AI_GATEWAY_API_KEY=your_gateway_key

Get your key at vercel.com/ai-gateway.

Option B: Individual provider keys

Set keys for the providers you want to use:

export ANTHROPIC_API_KEY=sk-ant-...      # Claude models
export OPENAI_API_KEY=sk-...             # GPT models
export GOOGLE_GENERATIVE_AI_API_KEY=...  # Gemini models

Tip: If you have AI_GATEWAY_API_KEY set, yardstiq will fall back to the gateway when a direct provider key is missing. You can mix both approaches.

You can also store keys persistently:

yardstiq config set gateway-key your_key
yardstiq config set anthropic-key sk-ant-...
yardstiq config set openai-key sk-...
yardstiq config set google-key your_key

Local models (Ollama)

No API key needed. Just have Ollama running:

yardstiq "hello" -m local:llama3.2 -m local:mistral

Usage

Basic comparison

yardstiq "Write a Python fibonacci function" -m claude-sonnet -m gpt-4o

Compare 3+ models

yardstiq "Explain monads simply" -m claude-sonnet -m gpt-4o -m gemini-flash

Use any model via AI Gateway

With AI_GATEWAY_API_KEY set, use provider/model format to access any model:

yardstiq "Hello" -m anthropic/claude-sonnet-4.6 -m openai/gpt-4o -m xai/grok-3

Pipe from stdin

echo "Explain the CAP theorem" | yardstiq -m claude-sonnet -m gpt-4o
cat prompt.txt | yardstiq -m claude-haiku -m gpt-4o-mini

Read prompt from file

yardstiq -f ./prompt.txt -m claude-sonnet -m gpt-4o

Add a system prompt

yardstiq "Review this code" -s "You are an expert code reviewer" -m claude-sonnet -m gpt-4o

AI judge

Let an AI evaluate which response is better:

yardstiq "Write a sorting algorithm" -m claude-sonnet -m gpt-4o --judge

Use a specific model as judge with custom criteria:

yardstiq "Explain DNS" -m claude-sonnet -m gpt-4o \
  --judge --judge-model gpt-4.1 \
  --judge-criteria "Focus on accuracy and beginner-friendliness"

Export results

# JSON (for scripting)
yardstiq "hello" -m claude-sonnet -m gpt-4o --json > results.json

# Markdown
yardstiq "hello" -m claude-sonnet -m gpt-4o --markdown > comparison.md

# HTML (self-contained, dark theme)
yardstiq "hello" -m claude-sonnet -m gpt-4o --html > comparison.html
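Since the JSON export is meant for scripting, here is a minimal post-processing sketch. The field names (`results`, `model`, `cost`) and the sample data are hypothetical; inspect your own results.json for the actual schema before relying on any path.

```python
import json

# Hypothetical shape for --json output; the real schema may differ.
sample = json.loads("""
{
  "prompt": "hello",
  "results": [
    {"model": "claude-sonnet", "text": "Hi there!", "cost": 0.0004},
    {"model": "gpt-4o", "text": "Hello!", "cost": 0.0003}
  ]
}
""")

# Example task: find the cheapest response in a saved comparison.
cheapest = min(sample["results"], key=lambda r: r["cost"])
print(cheapest["model"])  # gpt-4o
```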

Save and review later

yardstiq "Explain quicksort" -m claude-sonnet -m gpt-4o --save quicksort
yardstiq history list
yardstiq history show quicksort

Tune parameters

# -t = temperature, --max-tokens = max output length, --timeout = seconds per model
yardstiq "Be creative" -m claude-sonnet -m gpt-4o \
  -t 0.8 \
  --max-tokens 4096 \
  --timeout 120

Disable streaming

yardstiq "hello" -m claude-sonnet -m gpt-4o --no-stream

Models

Run yardstiq models to see the 40+ built-in models with pricing and access status.

| Provider | Models | Aliases |
|----------|--------|---------|
| Anthropic | Claude Sonnet 4.6, Haiku 4.5, Opus 4.6, 3.5 Sonnet | claude-sonnet, claude-haiku, claude-opus, claude-3.5-sonnet |
| OpenAI | GPT-4o, 4o Mini, 4.1, 4.1 Mini/Nano, 5, 5 Mini/Nano, o3-mini, Codex Mini | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini, gpt-5-nano, o3-mini, codex-mini |
| Google | Gemini 2.5 Pro/Flash/Flash Lite, 3 Flash/Pro | gemini-pro, gemini-flash, gemini-flash-lite, gemini-3-flash, gemini-3-pro |
| DeepSeek | V3.2, R1 | deepseek, deepseek-r1 |
| Mistral | Large 3, Magistral Medium/Small, Codestral | mistral-large, magistral-medium, magistral-small, codestral |
| Meta | Llama 4 Maverick/Scout, 3.3 70B | llama-4-maverick, llama-4-scout, llama-3.3-70b |
| xAI | Grok 3 | grok-3 |
| Amazon | Nova Pro, Nova Lite | nova-pro, nova-lite |
| Cohere | Command A | command-a |
| Alibaba | Qwen 3.5 Flash/Plus | qwen3.5-flash, qwen3.5-plus |
| Moonshot | Kimi K2, K2.5 | kimi-k2, kimi-k2.5 |
| MiniMax | M2.5 | minimax-m2.5 |

Status key: ✓ key = direct API key configured, ✓ gw = available via AI Gateway, otherwise no access.

Model formats

| Format | Example | Description |
|--------|---------|-------------|
| Alias | claude-sonnet | Built-in shorthand for popular models |
| Gateway | openai/gpt-5.2 | Any model via AI Gateway (provider/model) |
| Local | local:llama3.2 | Ollama models |
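The three formats are easy to tell apart mechanically, which is handy in wrapper scripts that route model specs. `classify_model` below is a hypothetical helper based on the table above, not part of yardstiq:

```python
def classify_model(spec: str) -> str:
    """Classify a model spec per the formats table:
    local:<name> -> local (Ollama), provider/model -> gateway, else alias."""
    if spec.startswith("local:"):
        return "local"
    if "/" in spec:
        return "gateway"
    return "alias"

for spec in ("claude-sonnet", "openai/gpt-5.2", "local:llama3.2"):
    print(spec, "->", classify_model(spec))
# claude-sonnet -> alias
# openai/gpt-5.2 -> gateway
# local:llama3.2 -> local
```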

CLI Reference

Usage: yardstiq [options] [command] [prompt...]

Compare AI model outputs side-by-side in your terminal

Arguments:
  prompt                         The prompt to send to all models

Options:
  -V, --version                  output the version number
  -m, --model <models...>        Models to compare (at least 2)
  -s, --system <message>         System prompt for all models
  -f, --file <path>              Read prompt from file
  -t, --temperature <n>          Temperature (default: 0)
  --max-tokens <n>               Max tokens per response (default: 2048)
  --judge                        Use AI judge to evaluate responses
  --judge-model <model>          Model for judging (default: "claude-sonnet")
  --judge-criteria <text>        Custom judging criteria
  --no-stream                    Disable streaming
  --json                         Output as JSON
  --markdown                     Output as Markdown
  --html                         Output as HTML
  --save [name]                  Save results to history
  --timeout <seconds>            Timeout per model (default: 60)
  -v, --verbose                  Show debug info
  -h, --help                     display help for command

Commands:
  models                         List available models and pricing
  history [action] [name]        Browse saved comparisons
  config <action> [key] [value]  Manage configuration
  bench [options] <file>         Run a benchmark suite

Development

git clone https://github.com/stanleycyang/aidiff.git
cd aidiff
pnpm install
pnpm build           # Build with tsup
pnpm dev             # Watch mode
pnpm test            # Run tests
pnpm test:coverage   # Run tests with 100% coverage enforcement
pnpm typecheck       # Type check
pnpm lint            # Lint with Biome

Contributing

  1. Fork the repo
  2. Create a feature branch (git checkout -b feat/my-feature)
  3. Make your changes with tests
  4. Ensure pnpm test:coverage passes at 100%
  5. Submit a pull request

License

MIT