npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

claimcheck

v0.2.0

Published

Does a Dafny lemma actually mean what a natural language requirement says? Dafny can verify proofs, but it can't verify meaning. Claimcheck fills that gap.

Readme

claimcheck

Does a Dafny lemma actually mean what a natural language requirement says? Dafny can verify proofs, but it can't verify meaning. Claimcheck fills that gap.

Someone else (Claude Code, a human, any agent) writes the lemmas and claims "requirement X is covered by lemma Y." Claimcheck verifies that claim via a round-trip: informalize the lemma back to English (without seeing the requirement), then compare.

Blog post: claimcheck: Narrowing the Gap between Proof and Intent.

Installation

npm install
npm link

Modes

Two-pass mode (default)

Structural separation — different models for informalization and comparison:

1. Extract mapped lemmas from claims .dfy file
2. Batch informalize all lemmas (haiku) — does NOT see requirements
3. Batch compare back-translations against requirements (sonnet)
4. Report: confirmed / disputed

Single-prompt mode (--single-prompt)

Prompt-level separation — one model does both passes sequentially:

1. Extract mapped lemmas from claims .dfy file
2. For each mapping: single LLM call with two-pass prompt
   a. Pass 1: informalize the lemma (before seeing the NL requirement)
   b. Pass 2: compare, check vacuity, flag surprising restrictions
3. Report with richer verdicts: JUSTIFIED / PARTIALLY_JUSTIFIED / NOT_JUSTIFIED / VACUOUS

Claude Code benchmark (eval/bench-cc.js)

Same single-prompt approach piped through claude -p — no structural or prompt-level separation (the model sees everything at once). Useful for comparing whether look-ahead matters.

Usage

File mode (extract lemmas from .dfy)

# Two-pass audit (default)
claimcheck \
  -m test/integration/mappings/counter.json \
  --dfy test/integration/claims/counter.dfy \
  -d counter

# With explicit module (needed for --verify with module-based .dfy files)
claimcheck \
  -m test/integration/mappings/counter.json \
  --dfy test/integration/claims/counter.dfy \
  --module CounterDomain -d counter --verify

# Single-prompt audit
claimcheck \
  -m test/integration/mappings/counter.json \
  --dfy test/integration/claims/counter.dfy \
  -d counter --single-prompt --json

Stdin mode (pure JSON-in/JSON-out)

Pre-extract your claims and pipe them in:

echo '{
  "claims": [
    {
      "requirement": "The counter value is always non-negative",
      "lemmaName": "CounterNonNegative",
      "dafnyCode": "lemma CounterNonNegative(m: int)\n  requires Inv(m)\n  ensures m >= 0\n{}"
    }
  ],
  "domain": "counter"
}' | claimcheck --stdin

Library usage

import { claimcheck } from 'claimcheck';

const { results, tokenUsage } = await claimcheck({
  claims: [
    {
      requirement: 'The counter value is always non-negative',
      lemmaName: 'CounterNonNegative',
      dafnyCode: 'lemma CounterNonNegative(m: int)\n  requires Inv(m)\n  ensures m >= 0\n{}',
    },
  ],
  domain: 'counter',
});

Tests and benchmarks

# All test projects
node test/integration/run-all.js

# Single test project
node test/integration/run-all.js counter

# Run benchmarks
node eval/bench.js --runs 3 --label two-pass
node eval/bench.js --runs 3 --label single-prompt --single-prompt
node eval/bench-cc.js --runs 1 --label cc-sonnet

# Run a single domain or lemma
node eval/bench-cc.js --runs 1 --label test --domain counter
node eval/bench-cc.js --runs 1 --label test --domain counter --lemma CounterNonNegative

# Compare results
node eval/compare.js two-pass single-prompt
node eval/compare.js two-pass cc-sonnet

Options

| Flag | Description | |------|-------------| | -m, --mapping <path> | Path to mapping file (JSON: [{requirement, lemmaName}, ...]) | | --dfy <path> | Path to claims .dfy file (contains the lemmas) | | --module <name> | Dafny module name (optional; needed for --verify with module-based files) | | -d, --domain <name> | Human-readable domain name (default: from --module or .dfy filename) | | --json | Output JSON instead of markdown | | --single-prompt | Use single-prompt claimcheck mode (one call per pair) | | --model <id> | Model for single-prompt mode (default: sonnet) | | --verify | Also run dafny verify on each lemma | | --informalize-model <id> | Model for back-translation in two-pass mode (default: haiku) | | --compare-model <id> | Model for comparison in two-pass mode (default: sonnet) | | --stdin | Read JSON from stdin (pure claimcheck, no file extraction) | | -v, --verbose | Verbose API/verification logging |

Output

For each mapping entry, one of:

| Status | Meaning | |--------|---------| | confirmed | Round-trip passed — lemma faithfully expresses the requirement | | disputed | Round-trip failed — discrepancy between lemma meaning and requirement | | verify-failed | Dafny verification failed (only with --verify flag) | | error | Lemma not found in source |

In single-prompt mode, disputed results include richer detail: verdict category, vacuity analysis, and surprising restrictions.

Test Projects

| Project | Claims file | Domain module | |---------|------------|---------------| | counter | test/integration/claims/counter.dfy | CounterDomain | | kanban | test/integration/claims/kanban.dfy | KanbanDomain | | colorwheel | test/integration/claims/colorwheel.dfy | ColorWheelDomain | | canon | test/integration/claims/canon.dfy | CanonDomain | | delegation-auth | test/integration/claims/delegation-auth.dfy | DelegationAuthDomain |

Each claims file includes its domain from ../dafny-replay/. Mappings in test/integration/mappings/.

Benchmark Results

Accuracy across 5 domains (counter, kanban, colorwheel, canon, delegation-auth) with 36 requirement-lemma pairs, including 8 deliberately bogus lemmas (tautologies, weakened postconditions, narrowed scope):

| Variant | Accuracy | Time/run | API calls/run | |---------|----------|----------|---------------| | Two-pass (default) | 96.3% (104/108) | ~108s | 2 (batch informalize + batch compare) | | Single-prompt | 86.1% (31/36) | ~353s | 36 (one per pair) | | Claude Code (bench-cc) | 69.4% (25/36) | ~693s | 36 (one claude -p per pair) |

Two-pass had 3 runs; single-prompt and Claude Code had 1 run each.

Key takeaways:

  • Structural separation (two-pass) is both the most accurate and fastest
  • The informalize-without-seeing-requirement step prevents anchoring bias
  • Batching into 2 API calls vs 36 individual calls gives a ~3x speed advantage
  • Claude Code's general-purpose system prompt and lack of structured output hurt both accuracy and speed

Requirements

  • Node.js 18+
  • ANTHROPIC_API_KEY environment variable
  • dafny in PATH (only for --verify)
  • dafny-replay cloned as a sibling directory (required for tests and benchmarks — the claims .dfy files include domain files from ../../dafny-replay/)