passmuster

v0.1.0

Published

19 days ago

Make LLM output pass muster. A tiny, model-agnostic verify-and-retry loop: generate → check (schema / predicate / LLM-judge) → retry with feedback until it passes.


abdulmunimjemal

llm validation retry guardrails llm-as-judge structured-output zod standard-schema ai agents self-correction openai anthropic

passmuster

Make LLM output pass muster. A tiny, model-agnostic verify-and-retry loop: generate → check → retry with feedback until the output passes — or you're out of attempts. Zero runtime dependencies.

LLM output is often almost right — valid JSON but a missing field, a plausible answer that ignores the question, a plan with a TODO left in. passmuster lets you declare what "good" means as composable checks and makes the model keep trying — feeding each failure back so it can self-correct.

  attempt 1: FAIL -> schema
  attempt 2: FAIL -> no-todos, actionable
  attempt 3: PASS

  ok: true | usedAttempts: 3
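
The generate → check → retry cycle behind a trace like the one above can be sketched in a few lines. This is an illustration of the idea only, not passmuster's source: `verifyAndRetry`, `SimpleCheck`, `Failure`, and `LoopResult` are hypothetical names, and candidates are plain strings to keep the types simple.

```typescript
// Hypothetical sketch of a verify-and-retry loop; not passmuster's implementation.
type CheckOutcome = true | string; // true = pass, string = reason for failure

interface SimpleCheck {
  name: string;
  run: (value: string) => CheckOutcome | Promise<CheckOutcome>;
}

interface Failure {
  name: string;
  reason: string;
}

interface LoopResult {
  ok: boolean;
  value: string; // passing candidate, or the last candidate if none passed
  usedAttempts: number;
  failuresPerAttempt: Failure[][];
}

async function verifyAndRetry(
  generate: (feedback: Failure[] | undefined) => string | Promise<string>,
  checks: SimpleCheck[],
  maxAttempts = 3,
): Promise<LoopResult> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  const failuresPerAttempt: Failure[][] = [];

  // The first attempt has no feedback; later attempts receive the previous failures.
  let value = await generate(undefined);
  for (let attempt = 1; ; attempt++) {
    const failures: Failure[] = [];
    for (const check of checks) {
      const outcome = await check.run(value);
      if (outcome !== true) failures.push({ name: check.name, reason: outcome });
    }
    failuresPerAttempt.push(failures);

    if (failures.length === 0) {
      return { ok: true, value, usedAttempts: attempt, failuresPerAttempt };
    }
    if (attempt >= maxAttempts) {
      return { ok: false, value, usedAttempts: attempt, failuresPerAttempt };
    }
    value = await generate(failures);
  }
}
```

Every check runs on every attempt in this sketch, so the feedback for the next attempt lists all failures at once rather than one at a time.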

Why not instructor / guardrails?

instructor does schema extraction + retry. Great when "valid against a schema" is all you need.
Guardrails is a heavier policy framework (RAIL specs, PII/topic validators).
passmuster is the small piece in between: schema, arbitrary predicate, and LLM-as-judge checks are all the same composable thing, and they all drive one retry-with-feedback loop. Bring your own model — passmuster never calls a provider itself, so it works with any SDK and is trivial to test offline.
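
Because the model is just a function you pass in, an offline test can swap it for a scripted fake. The helper below is a minimal sketch of such a fake; `scriptedModel` and `ScriptedModel` are hypothetical names, not part of passmuster.

```typescript
// Hypothetical test helper: replays canned responses in order and records every prompt.
interface ScriptedModel {
  complete: (prompt: string) => Promise<string>;
  prompts: string[];
}

function scriptedModel(responses: string[]): ScriptedModel {
  const prompts: string[] = [];
  let next = 0;
  return {
    prompts,
    complete: async (prompt: string): Promise<string> => {
      prompts.push(prompt);
      const response = responses[next];
      if (response === undefined) {
        // Fail loudly if the code under test calls the model more often than scripted.
        throw new Error(`scriptedModel: no response scripted for call ${next + 1}`);
      }
      next += 1;
      return response;
    },
  };
}
```

In a test, pass the fake's `complete` wherever your code would call the real model, then inspect `prompts` to confirm that the retry prompt actually carried the feedback.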

Install

npm install passmuster      # zero runtime dependencies

Requires Node ≥ 18. Schema checks work with any Standard Schema library (Zod, Valibot, ArkType) — none is bundled.
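
Standard Schema is a small shared interface: a schema exposes a `~standard` property whose `validate` function returns either a value or a list of issues. The sketch below hand-rolls a schema of that shape to show why no validation library needs to be bundled. The interface is a trimmed-down local copy (the real spec has more fields), the vendor name is made up, and whether schemaCheck accepts a hand-written schema like this one is an assumption rather than something this README states.

```typescript
// Trimmed-down local copy of the Standard Schema v1 shape (the real spec has more fields).
interface Issue {
  message: string;
  path?: ReadonlyArray<PropertyKey>;
}

type ValidationResult<T> =
  | { value: T; issues?: undefined }
  | { issues: ReadonlyArray<Issue> };

interface StandardSchemaLike<T> {
  readonly "~standard": {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (value: unknown) => ValidationResult<T> | Promise<ValidationResult<T>>;
  };
}

interface Plan {
  title: string;
  steps: string[];
}

// Hand-rolled schema for a plan with a non-empty title and at least one step.
const PlanSchema: StandardSchemaLike<Plan> = {
  "~standard": {
    version: 1,
    vendor: "example", // made-up vendor name
    validate: (value: unknown): ValidationResult<Plan> => {
      const issues: Issue[] = [];
      const obj: Record<string, unknown> =
        typeof value === "object" && value !== null ? (value as Record<string, unknown>) : {};
      const title = obj["title"];
      const steps = obj["steps"];

      if (typeof title !== "string" || title.length === 0) {
        issues.push({ message: "title must be a non-empty string", path: ["title"] });
      }
      if (!Array.isArray(steps) || steps.length === 0 || !steps.every((s) => typeof s === "string")) {
        issues.push({ message: "steps must be a non-empty array of strings", path: ["steps"] });
      }

      if (issues.length > 0) return { issues };
      return { value: { title: title as string, steps: steps as string[] } };
    },
  },
};
```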

Quick start

import { passMuster, schemaCheck, check, judge } from "passmuster";
import { z } from "zod";

const Plan = z.object({ title: z.string().min(1), steps: z.array(z.string()).min(1) });

const result = await passMuster({
  // Called once per attempt. After the first, `feedback` carries the failures.
  generate: async ({ feedback }) => {
    const res = await myModel.complete(buildPrompt(task, feedback?.text));
    return JSON.parse(res);
  },
  checks: [
    schemaCheck(Plan),
    check("no-todos", (p) => !JSON.stringify(p).includes("TODO") || "remove TODO placeholders"),
    judge("actionable", {
      ask: (p) => `Are these steps concrete and actionable? Reply PASS or FAIL: reason.\n${JSON.stringify(p)}`,
      complete: (prompt) => myModel.complete(prompt),
    }),
  ],
  maxAttempts: 3,
});

if (result.ok) use(result.value);
else console.warn("gave up", result.attempts.at(-1)?.failures);

The three kinds of check

Every check returns true to pass, or a reason (a string or an object) to fail. All three kinds are interchangeable in the checks array.

// 1. Schema — any Standard Schema (Zod/Valibot/ArkType)
schemaCheck(Plan);

// 2. Predicate — any (async) function over the value
check("under-budget", (text) => text.length <= 2000 || `too long: ${text.length}`);

// 3. LLM-as-judge — bring your own model
judge("grounded", {
  ask: (answer) => `Is this answer grounded in the source? PASS/FAIL.\n${answer}`,
  complete: (prompt) => anthropic.complete(prompt),   // placeholder: wrap your own provider call here
  // interpret: (resp) => ...   // optional; defaults to PASS/FAIL parsing
});
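
A custom interpret function might look like the sketch below. It assumes that interpret receives the judge's raw text reply and returns true or a reason string, following the check convention described above; the exact signature passmuster expects is not shown here, and `interpretVerdict` is a hypothetical name.

```typescript
// Hypothetical interpreter: maps a judge's free-text reply to `true` or a failure reason.
function interpretVerdict(response: string): true | string {
  const text = response.trim();

  // Accept "PASS" as a whole word at the start, in any letter case.
  if (/^PASS\b/i.test(text)) return true;

  // For "FAIL: reason", strip the verdict and separator and keep the reason.
  const match = /^FAIL\b[\s:.\-]*([\s\S]*)$/i.exec(text);
  if (match) {
    const reason = (match[1] ?? "").trim();
    return reason.length > 0 ? reason : "judge replied FAIL without a reason";
  }

  // Anything else counts as a failure, so an off-format reply never passes silently.
  return `unrecognized judge reply: ${text.slice(0, 80)}`;
}
```

Treating unparseable replies as failures is a deliberately conservative choice: a judge that drifts off format costs a retry instead of letting a bad output through.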

Retry with feedback

When an attempt fails, passmuster collects that attempt's failures and passes them to the next generate call as feedback:

generate: async ({ attempt, feedback }) => {
  const prompt = feedback
    ? `${basePrompt}\n\n${feedback.text}`   // pre-formatted; or use feedback.failures
    : basePrompt;
  return parse(await model.complete(prompt));
},

feedback.text reads like:

The previous output failed these checks:
- [no-todos] remove TODO placeholders
- [actionable] FAIL: step 2 is vague
Fix every issue above and produce a corrected output.
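
For a custom loop, text in this format is easy to produce from a list of failures. The sketch below is a stand-in that reproduces the layout shown above. It is not the library's buildFeedback, and `formatFeedback` and the `CheckFailure` shape are assumptions.

```typescript
// Hypothetical failure shape: the check's name plus the reason it gave.
interface CheckFailure {
  name: string;
  reason: string;
}

// Renders failures in the same layout as the sample feedback text.
function formatFeedback(failures: CheckFailure[]): string {
  const lines = failures.map((f) => `- [${f.name}] ${f.reason}`);
  return [
    "The previous output failed these checks:",
    ...lines,
    "Fix every issue above and produce a corrected output.",
  ].join("\n");
}
```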

API

`passMuster(options) → Promise<PassMusterResult<T>>`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| generate | (args: { attempt, feedback? }) => T \| Promise<T> | - | Produce a candidate. |
| checks | Check<T>[] | - | Verifiers every candidate must pass. |
| maxAttempts | number | 3 | Max generate attempts. |
| stopOnFirstFailure | boolean | false | Stop checking at the first failure each attempt. |
| throwOnFail | boolean | false | Throw PassMusterError instead of returning { ok: false }. |
| onAttempt | (attempt) => void | - | Called after each attempt (logging). |

Returns { ok, value, attempts, usedAttempts } — value is the passing candidate, or the last attempt's value if none passed.
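
Since value is populated even when no attempt passed, callers should branch on ok before trusting it. The sketch below summarizes a result of this shape. `ResultLike`, `AttemptRecord`, and `summarize` are local, hypothetical names, and the fields inside each attempt beyond failures are assumptions.

```typescript
// Local approximation of the result shape; fields inside each attempt are assumptions.
interface AttemptRecord<T> {
  value: T;
  failures: { name: string; reason: string }[];
}

interface ResultLike<T> {
  ok: boolean;
  value: T; // passing candidate, or the last candidate when ok is false
  attempts: AttemptRecord<T>[];
  usedAttempts: number;
}

// Builds a one-line summary, naming the checks that still failed on the last attempt.
function summarize<T>(result: ResultLike<T>): string {
  if (result.ok) return `passed after ${result.usedAttempts} attempt(s)`;
  const last = result.attempts[result.attempts.length - 1];
  const names = last ? last.failures.map((f) => f.name).join(", ") : "unknown";
  return `gave up after ${result.usedAttempts} attempt(s); still failing: ${names}`;
}
```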

Helpers

check(name, fn) — predicate check.
schemaCheck(schema, name?) — validate against a Standard Schema.
judge(name, { ask, complete, interpret? }) — LLM-as-judge check.
buildFeedback(failures) / toMessage(result) — feedback formatting (exposed for custom loops).
PassMusterError — thrown when throwOnFail and no attempt passed; carries .attempts.
