passmuster
v0.1.0
Make LLM output pass muster. A tiny, model-agnostic verify-and-retry loop:
generate → check (schema, predicate, or LLM-as-judge) → retry with feedback until
the output passes or you run out of attempts. Zero runtime dependencies.
LLM output is often almost right: valid JSON with a missing field, a plausible
answer that ignores the question, a plan with a TODO left in. passmuster lets
you declare what "good" means as composable checks and makes the model keep
trying, feeding each failure back so it can self-correct.
```text
attempt 1: FAIL -> schema
attempt 2: FAIL -> no-todos, actionable
attempt 3: PASS
ok: true | usedAttempts: 3
```

Why not instructor / guardrails?
- instructor does schema extraction + retry. Great when "valid against a schema" is all you need.
- Guardrails is a heavier policy framework (RAIL specs, PII/topic validators).
- passmuster is the small piece in between: schema, arbitrary-predicate, and LLM-as-judge checks are all the same composable kind of check, and they all drive one retry-with-feedback loop. Bring your own model: passmuster never calls a provider itself, so it works with any SDK and is trivial to test offline.
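Because the model is just a function you supply, the whole loop can run offline. Below is a minimal, self-contained approximation of the generate → check → retry-with-feedback cycle, driven by a scripted fake model. It is not passmuster's source: `runLoop`, `scriptedModel`, `Check`, and the `{ name, reason }` failure shape are hypothetical names chosen for illustration.

```ts
// Minimal verify-and-retry loop sketch (NOT passmuster's source; names are hypothetical).
type Failure = { name: string; reason: unknown };
// A check passes only when run() returns true; any other value is the failure reason.
type Check<T> = { name: string; run: (value: T) => unknown | Promise<unknown> };
type Attempt<T> = { attempt: number; value: T; failures: Failure[] };

async function runLoop<T>(opts: {
  generate: (args: { attempt: number; feedback?: string }) => T | Promise<T>;
  checks: Check<T>[];
  maxAttempts?: number;
}): Promise<{ ok: boolean; value: T | undefined; attempts: Attempt<T>[]; usedAttempts: number }> {
  const maxAttempts = opts.maxAttempts ?? 3;
  const attempts: Attempt<T>[] = [];
  let feedback: string | undefined; // undefined on the first attempt

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await opts.generate({ attempt, feedback });
    const failures: Failure[] = [];
    for (const c of opts.checks) {
      const outcome = await c.run(value);
      if (outcome !== true) failures.push({ name: c.name, reason: outcome });
    }
    attempts.push({ attempt, value, failures });
    if (failures.length === 0) return { ok: true, value, attempts, usedAttempts: attempt };
    // Feed every failure back into the next generate() call.
    feedback = failures.map((f) => `- [${f.name}] ${String(f.reason)}`).join("\n");
  }
  // Nothing passed: report the last candidate, as the README describes for `value`.
  const last = attempts[attempts.length - 1];
  return { ok: false, value: last?.value, attempts, usedAttempts: attempts.length };
}

// A fake "model" that replays canned replies in order (repeating the last one).
function scriptedModel(replies: string[]): (prompt: string) => Promise<string> {
  let i = 0;
  return async () => replies[Math.min(i++, replies.length - 1)];
}

const complete = scriptedModel(["", "plan: TODO", "plan: ship it"]);
runLoop({
  generate: ({ feedback }) => complete(feedback ? `Write a plan.\nFix:\n${feedback}` : "Write a plan."),
  checks: [
    { name: "non-empty", run: (s: string) => s.length > 0 || "output was empty" },
    { name: "no-todos", run: (s: string) => !s.includes("TODO") || "remove TODO placeholders" },
  ],
}).then((r) => console.log(r.ok, r.usedAttempts, r.value)); // true 3 plan: ship it
```

Swapping `scriptedModel` for a real SDK call changes nothing else in the loop, which is what makes this style easy to unit-test.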
Install
```sh
npm install passmuster  # zero runtime dependencies
```

Requires Node ≥ 18. Schema checks work with any Standard Schema library (Zod, Valibot, ArkType); none is bundled.
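Standard Schema is a small shared interface: a validator exposes a `~standard` property whose `validate` function returns either `{ value }` or `{ issues }`. The sketch below hand-writes a schema against a trimmed-down local copy of that shape (no library needed) and shows one plausible way a schema check could turn `issues` into a failure reason. `StandardSchemaLike`, `Title`, and `schemaToCheck` are hypothetical and are not passmuster's `schemaCheck`.

```ts
// Illustrative, trimmed-down copy of the Standard Schema v1 shape (not imported from any package).
type Issue = { readonly message: string; readonly path?: ReadonlyArray<PropertyKey | { readonly key: PropertyKey }> };
type Result<O> = { readonly value: O; readonly issues?: undefined } | { readonly issues: ReadonlyArray<Issue> };

interface StandardSchemaLike<O> {
  readonly "~standard": {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (value: unknown) => Result<O> | Promise<Result<O>>;
  };
}

// A hand-written schema: accepts only non-empty strings.
const Title: StandardSchemaLike<string> = {
  "~standard": {
    version: 1,
    vendor: "handwritten",
    validate: (value) =>
      typeof value === "string" && value.length > 0
        ? { value }
        : { issues: [{ message: "title must be a non-empty string" }] },
  },
};

// Hypothetical adapter: true on success, otherwise the joined issue messages become the reason.
async function schemaToCheck<O>(schema: StandardSchemaLike<O>, value: unknown): Promise<true | string> {
  const result = await schema["~standard"].validate(value);
  if (result.issues) return result.issues.map((issue) => issue.message).join("; ");
  return true;
}

schemaToCheck(Title, "Launch plan").then(console.log); // true
schemaToCheck(Title, 42).then(console.log); // title must be a non-empty string
```

Because the check only touches `~standard.validate`, any library that implements the interface can be dropped in without passmuster depending on it.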
Quick start
```ts
import { passMuster, schemaCheck, check, judge } from "passmuster";
import { z } from "zod";

// myModel, buildPrompt, task and use() are placeholders for your own code.
const Plan = z.object({ title: z.string().min(1), steps: z.array(z.string()).min(1) });

const result = await passMuster({
  // Called once per attempt. After the first, `feedback` carries the failures.
  generate: async ({ feedback }) => {
    const res = await myModel.complete(buildPrompt(task, feedback?.text));
    return JSON.parse(res);
  },
  checks: [
    schemaCheck(Plan),
    check("no-todos", (p) => !JSON.stringify(p).includes("TODO") || "remove TODO placeholders"),
    judge("actionable", {
      ask: (p) => `Are these steps concrete and actionable? Reply PASS or FAIL: reason.\n${JSON.stringify(p)}`,
      complete: (prompt) => myModel.complete(prompt),
    }),
  ],
  maxAttempts: 3,
});

if (result.ok) use(result.value);
else console.warn("gave up", result.attempts.at(-1)?.failures);
```

The three kinds of check
All return true to pass, or a reason (string/object) to fail. They're
interchangeable in the checks array.
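To make the contract concrete, here is a small sketch under stated assumptions: `toFailure` treats only the literal `true` as a pass (a bare `false` is assumed to fail with a generic reason), and `parseVerdict` is one plausible PASS/FAIL parser of the kind a judge's default `interpret` might use. `Outcome`, `toFailure`, `parseVerdict`, and `underBudget` are hypothetical helpers, not part of passmuster's API, and the library's real parsing may differ (for example in case handling).

```ts
// Hypothetical helpers illustrating the check contract (not passmuster's API).
type Outcome = boolean | string | Record<string, unknown>;

// Only the literal `true` passes. Assumption: a bare `false` fails with a generic reason.
function toFailure(name: string, outcome: Outcome): { name: string; reason: unknown } | null {
  if (outcome === true) return null;
  return { name, reason: outcome === false ? "check failed" : outcome };
}

// One plausible PASS/FAIL parser for judge replies like "PASS" or "FAIL: step 2 is vague".
// Anything that does not start with the word PASS (case-insensitive) counts as a failure.
function parseVerdict(reply: string): true | string {
  const text = reply.trim();
  if (/^PASS\b/i.test(text)) return true;
  return text || "empty judge reply";
}

// The predicate example from this section, written as a plain function.
const underBudget = (text: string): Outcome => text.length <= 2000 || `too long: ${text.length}`;

console.log(parseVerdict("PASS")); // true
console.log(parseVerdict("FAIL: step 2 is vague")); // FAIL: step 2 is vague
console.log(toFailure("under-budget", underBudget("x".repeat(2500))));
```

Keeping the judge's whole reply as the reason means the model sees the judge's explanation verbatim on the next attempt.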
```ts
// 1. Schema: any Standard Schema (Zod/Valibot/ArkType)
schemaCheck(Plan);

// 2. Predicate: any (async) function over the value
check("under-budget", (text) => text.length <= 2000 || `too long: ${text.length}`);

// 3. LLM-as-judge: bring your own model
judge("grounded", {
  ask: (answer) => `Is this answer grounded in the source? PASS/FAIL.\n${answer}`,
  complete: (prompt) => anthropic.complete(prompt), // your own wrapper around the provider SDK
  // interpret: (resp) => ...  // optional; defaults to PASS/FAIL parsing
});
```

Retry with feedback
When an attempt fails, passmuster collects every failure and passes it to the
next generate call as feedback:
```ts
// Inside the passMuster options; basePrompt, model and parse are your own code.
generate: async ({ attempt, feedback }) => {
  const prompt = feedback
    ? `${basePrompt}\n\n${feedback.text}` // pre-formatted; or use feedback.failures
    : basePrompt;
  return parse(await model.complete(prompt));
},
```

feedback.text reads like:
```text
The previous output failed these checks:
- [no-todos] remove TODO placeholders
- [actionable] FAIL: step 2 is vague
Fix every issue above and produce a corrected output.
```

API
passMuster(options) → Promise<PassMusterResult<T>>
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| generate | (args: { attempt, feedback? }) => T \| Promise<T> | — | Produce a candidate. |
| checks | Check<T>[] | — | Verifiers every candidate must pass. |
| maxAttempts | number | 3 | Max generate attempts. |
| stopOnFirstFailure | boolean | false | Stop checking at the first failure on each attempt. |
| throwOnFail | boolean | false | Throw PassMusterError instead of returning { ok: false }. |
| onAttempt | (attempt) => void | — | Called after each attempt (logging). |
Returns `{ ok, value, attempts, usedAttempts }`, where `value` is the passing
candidate, or the last attempt's value if none passed.
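The `stopOnFirstFailure` option trades complete feedback for fewer check calls, which matters when a later check is a paid LLM judge. The sketch below illustrates the difference, assuming checks run one after another in array order; `collectFailures` and `SimpleCheck` are hypothetical and are not passmuster's internals.

```ts
// Hypothetical check shape: run() returning true passes; anything else is the failure reason.
type SimpleCheck<T> = { name: string; run: (value: T) => unknown | Promise<unknown> };

// Runs checks in order; with stopOnFirstFailure, later checks are skipped after a failure.
async function collectFailures<T>(
  value: T,
  checks: SimpleCheck<T>[],
  stopOnFirstFailure = false,
): Promise<{ name: string; reason: unknown }[]> {
  const failures: { name: string; reason: unknown }[] = [];
  for (const c of checks) {
    const outcome = await c.run(value);
    if (outcome === true) continue;
    failures.push({ name: c.name, reason: outcome });
    if (stopOnFirstFailure) break; // skip the remaining (possibly expensive) checks
  }
  return failures;
}

let judgeCalls = 0;
const planChecks: SimpleCheck<string>[] = [
  { name: "no-todos", run: (p) => !p.includes("TODO") || "remove TODO placeholders" },
  { name: "actionable", run: async () => { judgeCalls++; return "FAIL: step 2 is vague"; } },
];
collectFailures("step 1: TODO", planChecks, true).then((f) => console.log(f.length, judgeCalls)); // 1 0
```

With the default (`false`), the same input would report both failures and call the judge once, giving the model more to fix per retry.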
Helpers
- `check(name, fn)`: predicate check.
- `schemaCheck(schema, name?)`: validate against a Standard Schema.
- `judge(name, { ask, complete, interpret? })`: LLM-as-judge check.
- `buildFeedback(failures)` / `toMessage(result)`: feedback formatting (exposed for custom loops).
- `PassMusterError`: thrown when `throwOnFail` is set and no attempt passed; carries `.attempts`.
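The exact types of `buildFeedback(failures)` are not spelled out here, so the following is only a sketch of a formatter that reproduces the feedback.text layout from the Retry with feedback section, assuming each failure carries a check `name` and a `reason`. `formatFeedback` and `reasonText` are hypothetical stand-ins, not the library's functions.

```ts
type FailureLike = { name: string; reason: unknown };

// Render a reason as text; non-string reasons (e.g. schema issues) are JSON-encoded.
function reasonText(reason: unknown): string {
  return typeof reason === "string" ? reason : JSON.stringify(reason) ?? String(reason);
}

// Hypothetical formatter mirroring the feedback.text layout shown in this README.
function formatFeedback(failures: FailureLike[]): string {
  return [
    "The previous output failed these checks:",
    ...failures.map((f) => `- [${f.name}] ${reasonText(f.reason)}`),
    "Fix every issue above and produce a corrected output.",
  ].join("\n");
}

console.log(
  formatFeedback([
    { name: "no-todos", reason: "remove TODO placeholders" },
    { name: "actionable", reason: "FAIL: step 2 is vague" },
  ]),
);
```

JSON-encoding object reasons is an assumption; it keeps structured reasons readable inside a plain-text prompt.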
License
MIT © Abdulmunim Jemal
