Sniffbench

A benchmark suite for coding agents. Think pytest, but for evaluating AI assistants.

What is this?

When you change your AI coding setup—switching models, adjusting prompts, adding MCP servers, or trying new tools—you're flying blind. Did it actually get better? Worse? Hard to say without data.

Sniffbench gives you that data. It runs your coding agent through evaluation tasks, captures your configuration, and measures what matters.

Quick Start

# Install globally
npm install -g sniffbench

# Or clone and build
git clone https://github.com/answerlayer/sniffbench.git
cd sniffbench && npm install && npm run build && npm link

# Check it's working
sniff --help
sniff doctor

Core Workflow

1. Run a Comprehension Interview

sniff interview

This runs your agent through comprehension questions about your codebase. You grade each answer (1-10) to establish baselines.

Every interview automatically:

  • Creates a run with a unique ID
  • Captures your agent configuration (version, model, MCP servers, tools)
  • Auto-links to matching variants (if registered)

# With an optional label for easy reference
sniff interview --run "baseline"

2. Register Variants for A/B Testing

Before making configuration changes, snapshot your current setup:

sniff variant register "control" --description "Stock Claude Code config"

Make your changes (add an MCP server, update CLAUDE.md, etc.), then register the new config:

sniff variant register "with-linear" --description "Added Linear MCP server"

3. Compare Results

# Compare two runs
sniff compare baseline after-changes

# Or by run ID
sniff compare run-1734567890-abc123 run-1734567891-def456

The output shows both a config diff (what changed) and a metrics diff (whether it helped):

Configuration Changes:
  MCP: Linear: none → stdio
  Allowed Tools: none → 1 tools

Case Comparison:
  comp-001: Tokens 10,959 → 8,234 (-25%) ✓
  comp-002: Grade 7/10 → 9/10 ↑

Aggregate Summary:
  Total tokens: 45,000 → 38,000 ↓ -15.6%
  Total cost: $0.52 → $0.44 ↓ -15.4%

Commands

sniff interview                 # Run comprehension interview
sniff variant register <name>   # Snapshot current config
sniff compare <run1> <run2>     # Compare two runs
sniff closed-issues scan        # Find repo issues to use as cases
sniff closed-issues run         # Evaluate agent on real issues

See COMMANDS.md for the full reference.

What Gets Captured

Agent Configuration (Automatic)

Every run captures:

| Field | Source | Example |
|-------|--------|---------|
| Agent name | CLI detection | claude-code |
| Version | claude --version | 2.0.55 |
| Model | API response | claude-sonnet-4-20250514 |
| CLAUDE.md hash | File hash | 8b28a4e5... |
| MCP servers | ~/.claude.json | Linear(stdio) |
| Allowed tools | ~/.claude.json | Bash(osgrep:*) |
| Permission mode | Settings | default |
| Thinking mode | Settings | enabled |
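
In code, the captured configuration might look roughly like the sketch below. The field names are assumptions derived from the table above, not sniffbench's documented schema.

// Hypothetical shape of a captured agent configuration (field names
// are assumptions based on the table above, not the exact schema).
interface AgentConfig {
  agentName: string;       // e.g. "claude-code" (CLI detection)
  version: string;         // e.g. "2.0.55" (claude --version)
  model: string;           // e.g. "claude-sonnet-4-20250514" (API response)
  claudeMdHash: string;    // e.g. "8b28a4e5..." (hash of CLAUDE.md)
  mcpServers: string[];    // e.g. ["Linear(stdio)"] (from ~/.claude.json)
  allowedTools: string[];  // e.g. ["Bash(osgrep:*)"] (from ~/.claude.json)
  permissionMode: string;  // e.g. "default" (settings)
  thinkingMode: string;    // e.g. "enabled" (settings)
}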

Behavior Metrics (Per Case)

| Metric | What it measures |
|--------|------------------|
| totalTokens | Total tokens used |
| inputTokens | Input/prompt tokens |
| cacheReadTokens | Tokens read from cache |
| cacheWriteTokens | Tokens written to cache |
| toolCount | Number of tool calls |
| readCount | Number of file reads |
| costUsd | Estimated API cost |
| explorationRatio | Read vs write tool ratio |
| cacheHitRatio | Cache efficiency |
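
A matching per-case record could be pictured as below; property names are taken from the metric names above, and the caseId field is a guess based on the case IDs shown in the compare output, so treat this as illustrative rather than the stored format.

// Hypothetical per-case metrics record (names taken from the table
// above; the exact schema written by sniffbench may differ).
interface CaseMetrics {
  caseId: string;            // e.g. "comp-001" (assumed field)
  totalTokens: number;       // total tokens used
  inputTokens: number;       // input/prompt tokens
  cacheReadTokens: number;   // tokens read from cache
  cacheWriteTokens: number;  // tokens written to cache
  toolCount: number;         // number of tool calls
  readCount: number;         // number of file reads
  costUsd: number;           // estimated API cost
  explorationRatio: number;  // read vs write tool ratio
  cacheHitRatio: number;     // cache efficiency
}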

Variant System

Variants enable scientific A/B testing of agent configurations.

Why Variants?

Without variants, you're comparing runs but don't know why one performed differently. Variants let you:

  1. Document what changed: "Added Linear MCP", "Updated CLAUDE.md prompts"
  2. Auto-link runs: Runs automatically link to matching variants (sketched below)
  3. Compare configs: See exactly what's different between setups
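
Conceptually, auto-linking amounts to matching a run's captured configuration against each registered variant's snapshot. The sketch below is an assumption about the idea, not sniffbench's actual implementation; the record shape and hashing approach are illustrative.

// Conceptual sketch of variant auto-linking (not sniffbench's real code):
// a run links to a variant when its captured config matches the
// variant's registered snapshot.
import { createHash } from "node:crypto";

interface VariantRecord {
  name: string;                     // e.g. "with-linear"
  description: string;              // e.g. "Added Linear MCP server"
  config: Record<string, unknown>;  // snapshot captured at registration time
}

// Hash a config snapshot so two identical setups compare equal
// (assumes consistent key ordering when the snapshot is serialized).
function configHash(config: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify(config)).digest("hex");
}

function findMatchingVariant(
  runConfig: Record<string, unknown>,
  variants: VariantRecord[],
): VariantRecord | undefined {
  return variants.find((v) => configHash(v.config) === configHash(runConfig));
}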

Sandboxed Variant Execution

For true isolation, variants can be packaged as Docker containers with your configuration baked in:

# Register and build a container image
sniff variant register "control" -d "Stock config" --build

# Make changes to CLAUDE.md, add MCP servers, etc...

# Register the treatment variant
sniff variant register "with-osgrep" -d "Added semantic search" --build

# Run interview in sandboxed container
sniff interview --use-variant control
sniff interview --use-variant with-osgrep

# Compare the runs
sniff compare <control-run> <osgrep-run>

Each container includes:

  • Claude Code (same version as your host)
  • Your CLAUDE.md baked in
  • Tool permissions configured
  • Complete isolation from host config

Requirements: Docker must be installed for sandboxed execution.

Workflow Example (Without Containers)

# 1. Register your baseline config
sniff variant register "control" -d "Stock Claude Code"

# 2. Run some interviews
sniff interview --run "control-test-1"
sniff interview --run "control-test-2"

# 3. Make changes to your setup
# ... add MCP server, update CLAUDE.md, etc ...

# 4. Register the new config
sniff variant register "treatment" -d "Added semantic search"

# 5. Run more interviews (auto-links to "treatment")
sniff interview --run "treatment-test-1"

# 6. Compare!
sniff compare control-test-1 treatment-test-1

Storage

All data is stored in .sniffbench/ in your project root:

.sniffbench/
├── runs.json       # All runs with results and config
├── variants.json   # Registered variants
└── baselines.json  # Legacy format (auto-migrated)
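
For a feel of what ends up in runs.json, here is an illustrative entry built from the IDs, labels, and metrics shown earlier in this README; the real file format is defined by sniffbench and may differ from this assumption.

// Assumed shape of an entry in .sniffbench/runs.json; check the actual
// file in your project for the real format.
const exampleRun = {
  id: "run-1734567890-abc123",  // unique run ID (format from the compare example)
  label: "baseline",            // optional label from `sniff interview --run "baseline"`
  variant: "control",           // auto-linked variant, if one matches
  config: { /* captured agent configuration; see "What Gets Captured" */ },
  cases: [
    { id: "comp-001", grade: 7, totalTokens: 10959 },
  ],
};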

Case Types

| Type | Description | Status |
|------|-------------|--------|
| Comprehension | Questions about codebase architecture | ✅ Ready |
| Bootstrap | Common tasks (fix linting, rename symbols) | 🚧 In Progress |
| Closed Issues | Real issues from your repo's history | ✅ Ready |

What We Measure

Sniffbench evaluates agents on behaviors that matter for real-world development:

  1. Comprehension - Does the agent understand the codebase?
  2. Efficiency - Does it explore without wasting tokens?
  3. Accuracy - Are its answers correct and complete?
  4. Consistency - Does it perform reliably across runs?

See VALUES.md for our full evaluation philosophy.

Contributing

We welcome contributions! Areas that need work:

  • Agent wrappers - Integrate with Cursor, Aider, or other coding agents
  • Bootstrap cases - Detection and validation for common tasks
  • LLM-judge - Automated answer quality evaluation
  • Documentation - Examples, tutorials, case studies

See CONTRIBUTING.md to get started.

License

MIT - see LICENSE