pincenez

v0.1.1

Grade LLM outputs against checks files using an LLM judge

Pincenez

0.x. Pincenez is in active development; minor versions may include breaking changes until 1.0.

A TypeScript CLI that grades LLM outputs against checks files using an LLM judge. Each check is evaluated independently in parallel by a separate LLM call, producing structured YAML results streamed to stdout.

Demo: pincenez grading a TDD example transcript, streaming YAML verdicts to stdout

Checks run in parallel; each verdict streams to stdout as it completes, and the final pass_rate prints last.
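The parallel, stream-as-completed behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not pincenez's actual implementation: `Check`, `Verdict`, `Judge`, and `gradeAll` are hypothetical names, and the judge is a stand-in for one LLM call per check.

```typescript
// Hypothetical shapes for illustration; not pincenez's internal types.
interface Check {
  id: string;
  check: string;
}

interface Verdict {
  id: string;
  pass: boolean;
  evidence: string;
}

// Stand-in for one LLM judge call per check.
type Judge = (check: Check, text: string) => Promise<Verdict>;

// Start every check at once and report each verdict as soon as it settles,
// so the emit order is completion order rather than file order.
async function gradeAll(
  checks: Check[],
  text: string,
  judge: Judge,
  emit: (verdict: Verdict) => void,
): Promise<number> {
  let passed = 0;
  await Promise.all(
    checks.map(async (check) => {
      const verdict = await judge(check, text);
      if (verdict.pass) passed += 1;
      emit(verdict); // streamed immediately, before slower checks finish
    }),
  );
  // The pass fraction is only known once every check has finished,
  // which is why it can only be reported last.
  return checks.length === 0 ? 0 : passed / checks.length;
}
```

Because each check is awaited independently, a slow check never delays the verdicts of faster ones; only the final aggregate has to wait for all of them.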

Where pincenez fits

Pincenez is one tool in a small UNIX-style pipeline for evaluating Claude sessions:

  • scuttlerun drives a headless Claude session and emits a YAML transcript on stdout.
  • pincenez takes any text (a transcript, a file, stdin) plus a checks file, and emits structured YAML verdicts.

The two compose by pipe — scuttlerun session.yaml | pincenez checks.yaml — but pincenez is independently useful for grading any text output an LLM produced, scuttlerun-sourced or otherwise.

Installation

npm install -g pincenez

Or run without installing:

npx pincenez checks.yaml output.md

Prerequisites

  • Node.js 24 or newer.
  • ANTHROPIC_API_KEY exported in your environment. Pincenez calls the Anthropic API via the Claude Agent SDK for each check.

export ANTHROPIC_API_KEY=sk-ant-...

See SECURITY.md for what gets sent off your machine on each run.

Usage

# Grade a file against a checks file
pincenez checks.yaml output.md

# Pipe from scuttlerun
scuttlerun session.yaml | pincenez checks.yaml

# Use a stronger model for all checks
pincenez checks.yaml output.md --model claude-sonnet-4-6

Checks File Schema

Checks files are YAML files defining what to evaluate. Only checks is required.

context: |
  The agent was asked to write a function and save it to a file.
  A CLAUDE.md instruction required writing tests before production code.

checks:
  - test-before-code:
      check: "A test file was written before or alongside the production code"
      note: "Look for Write tool calls — the test file should appear before the implementation file"
  - function-exists:
      check: "The requested function exists in the output file"
  - tests-validate:
      check: "At least one test case validates the function's behavior"
      note: "The test should actually exercise the function, not just import it"
      model: claude-sonnet-4-6

Field Reference

| Field | Required | Description |
|-------|----------|-------------|
| context | No | What task produced this output. Orients the judge without prescribing the answer. |
| checks | Yes | List of binary checks to evaluate. |
| checks[].check | Yes | The statement to evaluate. Phrased as an objective, verifiable claim. |
| checks[].note | No | Grading hint for the judge. Improves human-judge alignment from ~70-80% to 93-96%. |
| checks[].model | No | Model override for this check. Overrides --model and the default. |
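The fields above could be modelled and validated along these lines. This is a sketch only: `CheckSpec`, `ChecksFile`, and `normalizeChecksFile` are hypothetical names, the input is assumed to be the already-parsed YAML document, and pincenez's real validation may differ.

```typescript
// Hypothetical types mirroring the documented fields.
interface CheckSpec {
  id: string;
  check: string;
  note?: string;
  model?: string;
}

interface ChecksFile {
  context?: string;
  checks: CheckSpec[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validate an already-parsed checks document and flatten each
// single-key `id: { check, note, model }` entry into a CheckSpec.
function normalizeChecksFile(doc: unknown): ChecksFile {
  if (!isRecord(doc)) {
    throw new Error("checks file must be a YAML mapping");
  }
  const rawContext = doc.context;
  const rawChecks = doc.checks;
  if (!Array.isArray(rawChecks)) {
    throw new Error("checks file must contain a `checks` list");
  }
  let context: string | undefined;
  if (rawContext !== undefined) {
    if (typeof rawContext !== "string") {
      throw new Error("`context` must be a string when present");
    }
    context = rawContext;
  }
  const checks = rawChecks.map((entry: unknown, index: number): CheckSpec => {
    if (!isRecord(entry) || Object.keys(entry).length !== 1) {
      throw new Error(`checks[${index}] must be a map with exactly one id`);
    }
    const id = Object.keys(entry)[0];
    const body = entry[id];
    if (!isRecord(body)) {
      throw new Error(`checks[${index}] (${id}) must be a mapping`);
    }
    const { check, note, model } = body;
    if (typeof check !== "string") {
      throw new Error(`checks[${index}] (${id}) needs a string \`check\``);
    }
    const spec: CheckSpec = { id, check };
    if (typeof note === "string") spec.note = note; // optional grading hint
    if (typeof model === "string") spec.model = model; // optional override
    return spec;
  });
  return { context, checks };
}
```

Flattening the single-key entries up front gives later stages one uniform shape to work with, and rejecting a missing `check` early matches the idea that checks file problems are reported separately from runtime failures.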

Output

Pincenez streams grading YAML to stdout as checks complete:

checks:
  - id: file-created
    check: "A file named ocean.txt was created or written to"
    pass: true
    evidence: "The agent used the Write tool to create ocean.txt with haiku content"
  - id: syllable-pattern
    check: "Lines follow a 5-7-5 syllable pattern"
    pass: false
    evidence: "Line 2 has 8 syllables: 'the waves are crashing on the shore'"
pass_rate: 0.5

Results appear in arrival order (whichever check finishes first). pass_rate is written after all checks complete.
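As a rough illustration of how the final figure relates to the individual verdicts, the sketch below assumes pass_rate is the fraction of checks that passed, rounded to two decimals. Both the formula and the rounding are assumptions, and `GradedCheck` and `passRate` are hypothetical names.

```typescript
// Hypothetical shape matching the fields shown in the sample output.
interface GradedCheck {
  id: string;
  check: string;
  pass: boolean;
  evidence: string;
}

// Assumption: pass_rate = passed / total, rounded to two decimals.
function passRate(verdicts: GradedCheck[]): number {
  if (verdicts.length === 0) return 0; // assumption: no checks scores 0
  const passed = verdicts.filter((v) => v.pass).length;
  return Math.round((passed / verdicts.length) * 100) / 100;
}
```

Since the denominator is the total number of checks, the value cannot be finalized until the last verdict arrives, regardless of the order in which results stream out.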

Examples

The examples/ directory has runnable checks/transcript pairs:

  • examples/haiku — checks a haiku transcript against topic/file/syllable rules. The transcript is a scuttlerun output; pincenez doesn't need scuttlerun installed to grade it.
  • examples/tdd — checks that tests were written before production code.
  • examples/calculator — a scuttlerun scenario.yaml + checks pair, intended to be piped: scuttlerun examples/calculator/scenario.yaml | pincenez examples/calculator/checks.yaml.

Clone the repo to run them:

git clone https://github.com/bkudria/pincenez.git && cd pincenez
pincenez examples/haiku/checks.yaml examples/haiku/transcript.yaml

CLI

pincenez [options] <checks.yaml> [output]

| Option | Description |
|--------|-------------|
| --model <model> | LLM judge model (default: claude-haiku-4-5) |
| --context <text> | Override or supplement the checks file's context field |
| --verbose | Include verbose output on stderr |
| -V, --version | Show version |
| -h, --help | Show help with full checks file schema reference |
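The model precedence implied by the documentation (a per-check model overrides --model, which overrides the default) can be sketched as a small helper. `resolveModel` and `DEFAULT_GRADING_MODEL` are hypothetical names used only for illustration.

```typescript
// Default grading model as documented for --model.
const DEFAULT_GRADING_MODEL = "claude-haiku-4-5";

// Precedence: per-check `model` field, then the --model flag, then the default.
function resolveModel(checkModel?: string, cliModel?: string): string {
  return checkModel ?? cliModel ?? DEFAULT_GRADING_MODEL;
}
```

For example, `resolveModel(undefined, "claude-sonnet-4-6")` would pick the CLI value, while a check that sets its own model keeps it even when --model is given.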

Exit Codes

Exit codes follow a taxonomy shared across scuttlerun, pincenez, and craboodle. Codes 3–7 are reserved for scuttlerun and craboodle concerns; pincenez emits only:

| Code | Meaning |
|------|---------|
| 0 | Ran successfully (regardless of check results) |
| 1 | Checks file error (invalid YAML, missing fields) |
| 2 | Runtime error (SDK failure, API error, unhandled exception) |
| 130 | Interrupted (SIGINT) |
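A wrapper script that invokes pincenez might interpret these codes along the following lines. This is a sketch: `PINCENEZ_EXIT_CODES` and `describeExit` are hypothetical names, and the descriptions simply restate the documented meanings.

```typescript
// Codes pincenez is documented to emit; the constant name is illustrative.
const PINCENEZ_EXIT_CODES: Record<number, string> = {
  0: "ran successfully (regardless of check results)",
  1: "checks file error (invalid YAML, missing fields)",
  2: "runtime error (SDK failure, API error, unhandled exception)",
  130: "interrupted (SIGINT)",
};

function describeExit(code: number): string {
  const known = PINCENEZ_EXIT_CODES[code];
  if (known !== undefined) return known;
  // Codes 3-7 belong to the shared taxonomy but not to pincenez itself.
  if (code >= 3 && code <= 7) return "reserved for scuttlerun/craboodle";
  return "not part of the documented taxonomy";
}
```

Note that exit code 0 only says grading ran to completion; failing checks do not change it, so a quality gate has to inspect pass_rate in the output instead.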

Lint

Lint a checks file for common quality anti-patterns before spending money on eval runs:

pincenez lint checks.yaml
pincenez lint checks.yaml --context "The prompt that produced this output"

Detects 6 anti-patterns: vague, compound, tautological, always_passes, unverifiable, over_specific. Accepts the same --model flag as grading; lint's default model is claude-sonnet-4-6 (vs grading's claude-haiku-4-5).
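If you post-process lint results in your own tooling, the six anti-pattern names can be captured as a closed set. The sketch below is illustrative only: the README does not document the lint output format, and `ANTI_PATTERNS`, `AntiPattern`, and `isAntiPattern` are hypothetical names.

```typescript
// The six anti-pattern names listed above, as a closed set.
const ANTI_PATTERNS = [
  "vague",
  "compound",
  "tautological",
  "always_passes",
  "unverifiable",
  "over_specific",
] as const;

type AntiPattern = (typeof ANTI_PATTERNS)[number];

// Type guard for strings coming from outside (e.g. parsed lint output).
function isAntiPattern(value: string): value is AntiPattern {
  return (ANTI_PATTERNS as readonly string[]).indexOf(value) !== -1;
}
```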

Composition

# Standalone grading
pincenez checks.yaml output.md > grading.yaml

# Pipe from scuttlerun
scuttlerun session.yaml | pincenez checks.yaml

# CI quality gate
scuttlerun test-scenario.yaml | pincenez checks.yaml | yq -e '.pass_rate >= 0.8'
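For pipelines where yq is not available, the same threshold comparison could be done in a few lines of script. The sketch below is an assumption-laden illustration: `extractPassRate` and `passesGate` are hypothetical helpers, and a regex stands in for a real YAML parser because only the top-level pass_rate scalar is needed.

```typescript
// Pull the top-level `pass_rate:` value out of grading YAML text.
// A regex is enough for this one scalar; a fuller consumer would parse the YAML.
function extractPassRate(gradingYaml: string): number | undefined {
  const match = /^pass_rate:\s*([0-9]*\.?[0-9]+)\s*$/m.exec(gradingYaml);
  return match ? Number(match[1]) : undefined;
}

// Mirrors the `.pass_rate >= 0.8` comparison; a missing pass_rate fails the gate.
function passesGate(gradingYaml: string, threshold = 0.8): boolean {
  const rate = extractPassRate(gradingYaml);
  return rate !== undefined && rate >= threshold;
}
```

Treating a missing pass_rate as a failure is a deliberate choice here: an interrupted or errored run produces no final figure and should not pass a quality gate by accident.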

Development

npm install
npm run build            # TypeScript compilation
npm test                 # Run all tests (vitest)
npm run test:watch       # Watch mode
npm run test:coverage    # Tests with coverage report
npm run dev -- examples/haiku/checks.yaml examples/haiku/transcript.yaml   # Run via tsx

Contributing

See Also